
As a hospital librarian, Jessica Waite is typically successful at tracking down elusive articles for clinicians at Royal Hallamshire Hospital in England. So when a colleague couldn’t locate two references in a paper and asked for help, she grew suspicious.
“These were recent references, which usually we have no problem finding,” Waite told us. “I looked at the issues of the journals where the article should have been, and there were completely different articles, so I immediately thought that the articles we had been asked to find were not real.”
The references were from an article exploring mental health integration after bowel diversion surgery published in Digestive Diseases and Sciences (DDS), a Springer Nature title.
Acting on her hunch, Waite searched for the full list of references and discovered that 12 of the 14 didn’t exist. Such cases frequently result from hallucinations generated by large language models such as ChatGPT.
Author Marie Atallah says she submitted the references in error, an oversight she attributes in part to a brain injury. She provided us with two more reference lists: the first contained additional nonexistent sources, while a final version included real references.
“I was really shocked that an article published in a Springer journal and which presumably had gone through a peer or editorial review could have such a high number of references to hallucinated articles,” said Waite, a clinical/outreach librarian at Royal Hallamshire Hospital. “I can see how one or two may slip through the net, but this was a preposterous number.”
Waite is among many people who’ve contacted us about fake references in research papers. In December, we wrote about a paper in the Journal of Academic Ethics that included fake references, and in July, we reported on a paper rejected for use of AI that was later published in World of Media with hardly any changes.
Waite contacted Springer Nature about her concerns, and the publisher said it would investigate the matter, according to an email we have seen.
Michael Stacey, head of communications for Springer Nature journals, confirmed the publisher was alerted to concerns about the paper’s references earlier this year.
“We take all concerns about papers we have published extremely seriously, and are now looking into the matter carefully following an established process and in line with the Committee on Publication Ethics (COPE) best practice,” Stacey told us in an email.
Hallucinated references are an area the publisher is “actively exploring,” added Chris Graf, research integrity director for Springer Nature. Last April, the publisher announced the launch of a new in-house AI tool to check submissions for irrelevant references.
“This is more complex than it may at first appear, as references can be detailed by authors in a variety of different ways, often do not include DOIs, and simple tools to identify hallucinated references can produce false positives,” Graf told us by email.
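The false-positive problem Graf describes can be illustrated with a toy sketch. This is a hypothetical example, not the publisher’s actual tool: the `titles_match` and `lookup_crossref` functions below are illustrative names, and the matching threshold is an arbitrary choice. The point is that a cited title and the database’s indexed title for the same real article often differ in trivial ways, so exact matching flags genuine citations as fake, and fuzzy matching is needed to avoid false positives.

```python
# Illustrative sketch only: a naive reference check and why it
# produces false positives on real citations.
import difflib
import json
import urllib.parse
import urllib.request


def titles_match(cited: str, indexed: str, threshold: float = 0.9) -> bool:
    """Fuzzy-compare a cited title with a database title, ignoring
    case and surrounding whitespace. Exact equality would wrongly
    reject legitimate formatting variants."""
    a, b = cited.strip().lower(), indexed.strip().lower()
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold


def lookup_crossref(title: str) -> list:
    """Fetch candidate titles from the public Crossref works API.
    (Network call; shown for illustration, not invoked below.)"""
    url = ("https://api.crossref.org/works?rows=3&query.bibliographic="
           + urllib.parse.quote(title))
    with urllib.request.urlopen(url, timeout=10) as resp:
        items = json.load(resp)["message"]["items"]
    return [t for item in items for t in item.get("title", [])]


# A trivial punctuation and capitalization difference defeats
# exact matching but not fuzzy matching:
cited = "Psychological burden of ileostomy: a review"
indexed = "Psychological Burden of Ileostomy: A Review."
print(cited == indexed)              # False: a naive check cries "fake"
print(titles_match(cited, indexed))  # True: fuzzy matching recovers it
```

The inverse failure mode is just as real: a hallucinated reference is often assembled from plausible fragments of genuine titles and author names, so it can fuzzy-match an unrelated real record, which is why a threshold alone cannot settle the question.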
The DDS article explores the psychological burden of ileostomy, a surgical intervention for conditions like inflammatory bowel disease, colorectal cancer, and traumatic bowel injuries. Atallah, the sole author, is a psychologist who trains postdoctoral fellows at Sutter Health in San Francisco.
Atallah acknowledged the false references, but said they came from an incorrect draft version she accidentally submitted to the journal. Atallah told us she worked on the paper for five years and used published literature to support each claim. During drafts, she used various tools, including AI, “which we know carries the known risk of hallucinated references,” she told us.
Although her sources were collected and stored in a cloud-based reference manager, each round of drafting introduced hallucinated sources, which Atallah “either deleted or kept in to verify later, in case some of these were real but hard to find and provided additional context,” she told us. When Atallah submitted the paper to the journal, she sent a version with the hallucinated references, she said.
“This was unfortunately not caught on time by me, the reviewers, or the journal during their review process prior to the final publication,” Atallah told us. “The Journal eventually reached out to me when the error was caught, in which I was informed the appropriate discussions and steps to correct this were being undertaken.”
Atallah added the paper is deeply personal, as she underwent an unplanned ileostomy procedure in 2020 following a near-fatal car accident. During the accident, she sustained a severe traumatic brain injury, she said. That injury affects her frontal lobe, which “plays a central role in executive functioning, including planning, organization, decision-making, impulse control, and attention to detail,” she explained.
“As a result, I have had to be especially intentional about the systems and safeguards I use to manage complex academic and professional work,” she told us. “While this is in no way an excuse it may offer context for how I could have used a placeholder version of citations.”
Atallah sent us a new draft of her paper with an updated reference list, along with a PDF document of “the original sources” from which she says the work originated. Of the 21 sources on the new reference list, we could verify the existence of only four. Other entries listed journal names, titles, authors, or page numbers that differed from the PDF document purported to correspond to them. The PDF document included real articles, but most did not match Atallah’s reference list.
We reached out to Atallah again to inquire about the inconsistencies in the reference list and PDF document she provided us, and she sent us a third reference list for the paper. The latest list contains 25 sources and all references appear real, according to our check. Atallah did not explain what happened, but apologized “for the confusion once again.”
“All corrections consist of modifications to the references only,” she told us. “The article content and all claims remain unchanged.”
As for staff at Royal Hallamshire Hospital, Waite said the psychologist who requested help with the paper was disappointed the references couldn’t be found.
Fake references make Waite’s job as a librarian “so much harder,” she added.
“I expect this on the wider internet, but not in reputable scientific journals,” she told us. “This is a problem both for the information literacy teaching and the clinical literature searching I do to inform evidence-based practice in my hospital. It’s really worrying.”
Like Retraction Watch? You can make a tax-deductible contribution to support our work, follow us on X or Bluesky, like us on Facebook, follow us on LinkedIn, add us to your RSS reader, or subscribe to our daily digest. If you find a retraction that’s not in our database, you can let us know here. For comments or feedback, email us at [email protected].
“Hallucinations” is a polite word for lying. Based on the common-sense idea that it should not be that hard to code a program NOT to generate non-existent citations, a rational conclusion is that the common AI output is intended to lie, probably to create the output quickly… speed over truth. Sounds like AI is a human-produced product.
It’s not coded to generate citations at all. They are general-purpose LLMs which are trained to generate text that is similar, in various ways, to their training set. This process inherently generates fake references, because the set of potential reference-shaped bits of text is massively larger than the set of actual things to be referred to. And an LLM which never produces novel combinations of words, only pre-existing ones, is a plagiarism engine, which is not considered all that useful.
So far the evidence suggests that this is nearly impossible to prevent. As you say, speed may be an issue, but I see reason to believe the developers when they say that this is tough. “I am going to quote this specific paper to support this specific point” is a big ask for something that does not think.
I am aware of one valid use for LLM in reference lists: asking “how can I format this citation correctly?” and looking at the result carefully before putting it in. This was helpful in a recent paper that had to cite the grey literature heavily; it was hard to figure out the formats. I would recommend never, ever using them to actually generate citations. It’s one of the things they do worst, unless you are actually satisfied with a reference-shaped bit of text.