Last month, a millipede expert in Denmark received an email notifying him that one of his publications had been mentioned in a new manuscript on Preprints.org. But when the researcher, Henrik Enghoff, downloaded the paper, he learned that it cited his work for something off-topic.
Stranger still, the authors of the now-withdrawn preprint, a group of researchers in China and Africa, also referenced two papers by Enghoff that he knew he hadn’t written. It turned out they didn’t exist.
“I’ve never had anything like this happen before,” Enghoff, a professor at the Natural History Museum of Denmark, in Copenhagen, told Retraction Watch.
Flabbergasted, Enghoff reached out to David Richard Nash at the University of Copenhagen. A few months prior, Nash had been experimenting with OpenAI’s ChatGPT, an artificial-intelligence chatbot, to see if it could be used to find scientific literature. He asked the bot to provide him with recent references on the butterfly species he works with. “It came back with 10 plausible-looking papers,” only one of which existed, Nash told Retraction Watch.
After learning of Enghoff’s case, Nash emailed Preprints.org, a free preprint server owned by the scientific publisher MDPI. He explained that he had looked up five random references in the preprint and found that all of them were fictitious. He also hinted that generative artificial intelligence such as ChatGPT could have been at work, adding:
I suggest that you contact the authors directly and ask for an explanation (and hopefully a retraction and apology to the affected “authors” of these fake references), and also review your policies regarding accepting AI-generated textxs [sic] and references.
In correspondence shared with Retraction Watch, Lloyd Shu, preprints editor at MDPI, apologized for “the problems caused by this preprint” and said:
We will withdraw it immediately and add the authors of this preprint to our blacklist.
Neither Shu nor the preprint authors responded to our requests for comments.
Ioana Craciun, scientific officer at MDPI, confirmed that Preprints.org had “taken down a preprint that appears to have been generated using AI.”
According to Preprints.org’s website:
All preprints undergo a short screening before being uploaded online. This process takes less than 24 hours in most cases. Screening includes checks for basic scientific content, author background, and compliance with ethical standards. It is carried out by the Preprints.org staff, with the support of active researchers and the Preprints.org advisory board.
Craciun explained that, because preprints do not undergo peer review, “it may be more likely for AI-generated content to be posted on preprint servers without detection.”
She added:
This case serves as a signal for us. It is not only a challenge for Preprints.org, but for all preprint servers, publishers, universities, and the scientific community as a whole. We will review our screening process to identify areas for improvement. While we can pay attention to the extremely low similarity rate of AI-generated content, we cannot rely solely on that. We hope that more powerful tools will be developed to assist in the detection of AI-generated content, and closer inspection of references may be necessary, as demonstrated in this case.
While the title of the withdrawn preprint, “From Beneficial Arthropods to Soil-Dwelling Organisms: A Review on Millipedes,” is somewhat strange, its abstract appears cogent. (Preprints.org has removed the manuscript, but we have made it available here.) The main text is suspicious in places, however, including broken sentences such as these:
Another study by Enghoff [32] carried out on the distribution of millipedes in Southeast Asia, is known to have high levels of species richness and endemism. The distribution and ecology of millipedes and other arthropods, and notes that millipedes are found on all continents except Antarctica [48].
“If it had gone into a journal, I would definitely be very worried,” Nash said. “Going onto a preprint server, I have to admit that slightly confirms some of my prejudice against preprint servers.”
Note: The interview with Enghoff was conducted in Danish and translated by the writer, a native speaker.
Yep, that’s what LLMs do – they produce word-by-word statistical mashups that look like a thought-out product but are not. They don’t handle coherent chunks of text like literature citations; they invent new ones as they go along.
So if you’re running a journal or preprint server, I suggest you write a little script that looks up the literature citations from submitted manuscripts in real databases (PubMed is a good source for biomedical subjects, but heck, you’re a publisher or editor – you should know which repositories are appropriate for your journal). If you can’t find them, take a serious look at the submitted manuscript before making it public.
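A minimal sketch of what such a script could look like, using PubMed’s public NCBI E-utilities `esearch` endpoint: query each cited title and flag citations that return zero hits for manual review. The endpoint and its parameters are real; the function names and the zero-hits heuristic are this sketch’s own assumptions (a genuine paper can be missing from PubMed, so a zero-hit result is only a flag, not proof of fabrication).

```python
# Sketch: flag citations whose titles return no hits in PubMed.
# Uses the public NCBI E-utilities esearch endpoint; only stdlib needed.
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(title: str) -> str:
    """Build an esearch URL restricted to the article-title field."""
    params = {
        "db": "pubmed",
        "term": f"{title}[Title]",
        "retmode": "json",
    }
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def hit_count(esearch_json: str) -> int:
    """Number of matching PubMed records, parsed from the esearch JSON reply."""
    return int(json.loads(esearch_json)["esearchresult"]["count"])


def looks_fabricated(title: str) -> bool:
    """True if PubMed returns zero hits for the cited title (flag for review)."""
    with urllib.request.urlopen(build_query(title), timeout=10) as resp:
        return hit_count(resp.read().decode()) == 0
```

In practice you would extract titles from the manuscript’s reference list first, and for non-biomedical fields swap in whichever database fits the journal (Crossref, for instance, covers DOIs across disciplines).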
I’m fairly certain reference 76 is fake; the issue’s page numbers don’t go that high. I’m sure there are more.
The journal editor stated that they blacklisted the authors. Is that enough? With hundreds of journals to choose from, what’s to prevent the authors from submitting the same paper to another publisher (or possibly to a different MDPI journal)? Or generating a continuous stream of AI papers for submission elsewhere?
Does the editorial board of the journal have an ethical obligation to contact the host institution(s) of the authors to inform them that their employees, to all appearances, have tried to perpetrate scientific fraud?
Thoughts?
Do you think these institutions will care?