Retraction Watch

Tracking retractions as a window into the scientific process

Newly released AI software writes papers for you — what could go wrong?

This week, we received a press release that caught our attention: A company is releasing software it claims will write manuscripts using researchers’ data. 

The program, dubbed “Manuscript Writer,” uses artificial intelligence (AI) to generate papers, according to the company that created it, sciNote LLC. A spokesperson explained the software generates a first draft the scientist should revise, and won’t write the Discussion, “the most creative and original part of the scientific article.” But can it provide any coherent text?  

According to the release from sciNote, Manuscript Writer (an add-on to the company’s Electronic Lab Notebook, or ELN):

…has the ability to significantly simplify the process of preparing scientific manuscripts by using the technological advances in machine learning and artificial intelligence. Recognizing the importance of timely publication of scientific findings by the global scientific community, the add-on aims to significantly reduce the time taken to prepare initial content. By drawing upon data contained within the ELN and references that are accessible in open access journals, to provide a structured draft for the author to then edit and develop further.

A spokesperson told us the company’s ELN is the first to generate scientific articles.

What about avoiding the problems that sometimes befall paper mills, such as plagiarism? We asked someone with experience investigating paper mills, Charles Seife at New York University. Seife said he couldn’t speak from experience about Manuscript Writer, as he doesn’t have it on his computer, but it seems “dodgy”:

I could certainly imagine a useful system of some kind that would take lab notes and attempt to fit data, protocols, and notes into various templates; help format references; even create an outline of what information goes where in a paper. This system is promising more than that, though, suggesting that the software would provide a “first draft.” To me, this suggests that, unless the scientist’s already entered a substantial amount of prose, the program’s going to get it from somewhere else… which is problematic, to say the least.

The terms of service say explicitly that the draft will be generated not just from the data stored by the user but from “relevant keywords and open access references.” Obviously, an AI isn’t capable of understanding and digesting prose the way a human is, so it’s hard for me to see how it’s going to be able to create any sort of derivative work based on open-access references that isn’t plagiaristic or incoherent (or most likely both.)

Seife added:

So, having not used the program myself, I can’t say for sure, but I’d be willing to bet money that it’s scraping prose from references’ introductions, jumbling it up in some fashion, and plunking it down for the researcher to use in his own introduction. This ain’t a good idea, for obvious reasons.

So, yes, this business model concerns me. If it, in fact helps automate the process of stealing other people’s prose, lightly massaging it, and using it as one’s own, then it is a terrible thing.

We brought Seife’s concerns to the spokesperson, who told us:

Manuscript Writer will generate materials and methods section of the manuscript based on the scientist’s project and experiment data, protocols and notes in sciNote…In addition to that, Manuscript Writer will generate an introduction based on relevant keywords and DOI numbers that the scientist selected and entered. Manuscript Writer will pull information from selected references, and based on the relevant keywords it will look for additional relevant open access references and include them in the draft as well. The scientists will get an introduction in which every sentence or paragraph comes with a citation and all references are added to the list of references (another part of the manuscript generated by Manuscript Writer).

The program checks for plagiarism, the spokesperson noted:

After every paragraph that is included in the introduction, the scientist sees the number of the reference and a percentage (e.g. 100%) which shows the scientist that a particular paragraph is cited from the specified reference and is 100% the same text. This information cannot be overlooked, because it is part of the text and additionally notifies the scientist that she/he should edit it…It is then their responsibility to edit and proofread the text. As it would be in every other case when writing manuscripts.

We also notify the scientist to edit the received text at the point when they receive the draft. The main benefit is that Manuscript Writer can include interesting paragraphs, related to the subject at hand, to the introduction and give the scientist a head start while writing.

The spokesperson added:

Manuscript Writer’s purpose is not to write the finalized text instead of the scientist, [its] purpose is to empower the scientist. Which is why it cannot write the discussion section, which is the most creative and original part of the scientific article and greatly depends on the scientist’s style and way of thinking. Every scientist adds their own expertise and knowledge to the entire text.

David Moher of the Ottawa Hospital Research Institute said the program also raises a different concern:

The product appears to be geared to maintain the publication mill – publish or perish. Many universities and research institutes are trying to move away from this model. Today, there are many avenues to make research accessible, such as Open Science Framework and a host of preprint servers. Most importantly, research needs context and I’m not sure this tool can or should be providing the necessary human involvement in generating research reports.

Seife added he doesn’t know of other companies offering a similar product, but others have experimented with computer-generated prose. For instance, one company (Automated Insights) generates news stories about sports and corporate earnings, which often have specific structures, he said:

but I don’t think that a scientific paper is as easy to tackle.

Written by Alison McCook

November 9th, 2017 at 10:57 am

Comments
  • Oliver C. Schultheiss November 9, 2017 at 11:42 am

    Yes!!! Finally!!!

    When I was a grad student, back in the 90ies, when the word crawled out of the MS-DOS cave and into the gradual sunshine of Windows, there was a rumor about a legendary software called publish.exe (today you’d call it an app). You were supposed to just feed in the data, and it would do all the rest for you. This presumably also included exploiting all kinds of undisclosed flexibility in data analysis, such as excluding outliers (defined as any data point that made the result lie outside p < .05) or dropping inconvenient control conditions.

    Now the good times have finally arrived! And I can quit my efforts teaching grad students how to write papers! Beaches of Spain, here I come!

    • Oliver C. Schultheiss November 9, 2017 at 3:05 pm

      …it should have been world, not word. Sorry for the typo. I couldn’t curb my enthusiasm.

  • John H Noble Jr November 9, 2017 at 1:03 pm

    Oliver,
    Wait up on the beaches of Spain if you want to avoid the grad students. They will no longer need your teaching or any of the research courses, now that AI has taken on responsibility for producing research.

    “Originality” starts with creative construction by AI for analysis.

  • Ciaran November 9, 2017 at 1:20 pm

    Funnily enough, the plagiarism problem in AI-generated content has already been “solved”, in that there is an easy way to force generators to pass obvious similarity tests:

    https://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8574

  • conor dolan November 13, 2017 at 8:00 am

    Perhaps the AI program (app) can do Retraction Watch comments too? I enjoy reading them!

  • Gary November 14, 2017 at 10:46 am

    “The program, dubbed “Manuscript Writer,” uses artificial intelligence (AI) to generate papers”
    Or you could go the old fashioned route and employ a paper mill…

  • Thom Engel November 15, 2017 at 9:10 am

    I sent this to several friends of mine. One of them opined that, “What we need is software that makes it more difficult to write technical articles, to cut down on the chaff from people who don’t know what they’re saying.” I couldn’t have said it better.
