Librarian finds ‘preposterous number’ of fake references in paper from Springer Nature journal

Gunnar Ridderström/Pexels

As a hospital librarian, Jessica Waite is typically successful at tracking down elusive articles for clinicians at Royal Hallamshire Hospital in England. So when a colleague couldn’t locate two references in a paper and asked for help, the librarian grew suspicious.

“These were recent references, which usually we have no problem finding,” Waite told us. “I looked at the issues of the journals where the article should have been, and there were completely different articles, so I immediately thought that the articles we had been asked to find were not real.”

The references were from an article exploring mental health integration after bowel diversion surgery published in Digestive Diseases and Sciences (DDS), a Springer Nature title. 

Acting on her hunch, Waite searched for the full list of references and discovered that 12 of the 14 didn’t exist. Such cases frequently result from hallucinations generated by large language models such as ChatGPT.

Author Marie Atallah claims she submitted the references in error, an oversight she attributes in part to a brain injury. She provided us with two more reference lists: the first contained more nonexistent sources, and the final copy contained real references.

“I was really shocked that an article published in a Springer journal and which presumably had gone through a peer or editorial review could have such a high number of references to hallucinated articles,” said Waite, a clinical/outreach librarian at Royal Hallamshire Hospital. “I can see how one or two may slip through the net, but this was a preposterous number.”

Waite is among many people who’ve contacted us about fake references in research papers. In December, we wrote about a paper in the Journal of Academic Ethics that included fake references, and in July, we reported on a paper rejected for use of AI that was later published in World of Media with hardly any changes.  

Waite contacted Springer Nature about her concerns, and the publisher said it would investigate the matter, according to an email we have seen. 

Michael Stacey, head of communications for Springer Nature journals, confirmed the publisher was alerted to concerns about the paper’s references earlier this year. 

“We take all concerns about papers we have published extremely seriously, and are now looking into the matter carefully following an established process and in line with the Committee on Publication Ethics (COPE) best practice,” Stacey told us in an email. 

Hallucinated references are an area the publisher is “actively exploring,” added Chris Graf, research integrity director for Springer Nature. Last April, the publisher announced the launch of a new in-house AI tool to check submissions for irrelevant references.

“This is more complex than it may at first appear, as references can be detailed by authors in a variety of different ways, often do not include DOIs, and simple tools to identify hallucinated references can produce false positives,” Graf told us by email.  

The DDS article explores the psychological burden of ileostomy, a surgical intervention for conditions like inflammatory bowel disease, colorectal cancer, and traumatic bowel injuries. Atallah, the sole author, is a psychologist who trains postdoctoral fellows at Sutter Health in San Francisco. 

Atallah acknowledged the false references, but said they came from an incorrect draft version she accidentally submitted to the journal. Atallah told us she worked on the paper for five years and used published literature to support each claim. During drafts, she used various tools, including AI, “which we know carries the known risk of hallucinated references,” she told us.  

Although her sources were collected and stored in a cloud-based reference manager, each draft refinement led to hallucinated sources, which Atallah “either deleted or kept in to verify later, in case some of these were real but hard to find and provided additional context,” she told us. When Atallah submitted the paper to the journal, she sent a version with the hallucinated references, she said.

“This was unfortunately not caught on time by me, the reviewers, or the journal during their review process prior to the final publication,” Atallah told us. “The Journal eventually reached out to me when the error was caught, in which I was informed the appropriate discussions and steps to correct this were being undertaken.”

Atallah added the paper is deeply personal, as she underwent an unplanned ileostomy procedure in 2020 following a near-fatal car accident. During the accident, she sustained a severe traumatic brain injury, she said. That injury affects her frontal lobe, which “plays a central role in executive functioning, including planning, organization, decision-making, impulse control, and attention to detail,” she explained. 

“As a result, I have had to be especially intentional about the systems and safeguards I use to manage complex academic and professional work,” she told us. “While this is in no way an excuse it may offer context for how I could have used a placeholder version of citations.”

Atallah sent us a new draft of her paper with an updated reference list and attached PDFs of “the original sources” from which she says the work originated. Of the 21 sources on the new reference list, we could verify the existence of only four. Other references on the new list had journal names, titles, authors, or page numbers that differed from the PDF document purported to correspond to them. The PDF document included real articles, but most did not match Atallah’s reference list.

We reached out to Atallah again to inquire about the inconsistencies in the reference list and PDF document she provided us, and she sent us a third reference list for the paper. The latest list contains 25 sources and all references appear real, according to our check. Atallah did not explain what happened, but apologized “for the confusion once again.”  

“All corrections consist of modifications to the references only,” she told us. “The article content and all claims remain unchanged.”

As for staff at Royal Hallamshire Hospital, Waite said the psychologist who requested help with the paper was disappointed the references couldn’t be found. 

Fake references make Waite’s job as a librarian “so much harder,” she added.   

“I expect this on the wider internet, but not in reputable scientific journals,” she told us. “This is a problem both for the information literacy teaching and the clinical literature searching I do to inform evidence-based practice in my hospital. It’s really worrying.”


Like Retraction Watch? You can make a tax-deductible contribution to support our work, follow us on X or Bluesky, like us on Facebook, follow us on LinkedIn, add us to your RSS reader, or subscribe to our daily digest. If you find a retraction that’s not in our database, you can let us know here. For comments or feedback, email us at [email protected].



32 thoughts on “Librarian finds ‘preposterous number’ of fake references in paper from Springer Nature journal”

  1. “Hallucinations” is a polite word for lying. Based on the common-sense idea that it should not be that hard to code a program NOT to generate nonexistent citations, a rational conclusion is that common AI output is intended to lie, probably to create the output quickly: speed over truth. Sounds like AI is a human-produced product.

    1. It’s not coded to generate citations at all. They are general-purpose LLMs which are trained to generate text that is similar, in various ways, to their training set. This process inherently generates fake references, because the set of potential reference-shaped bits of text is massively larger than the set of actual things to be referred to. And an LLM which never produces novel combinations of words, only pre-existing ones, is a plagiarism engine, which is not considered all that useful.

      So far the evidence suggests that this is nearly impossible to prevent. As you say, speed may be an issue, but I see reason to believe the developers when they say that this is tough. “I am going to quote this specific paper to support this specific point” is a big ask for something that does not think.

      I am aware of one valid use for LLM in reference lists: asking “how can I format this citation correctly?” and looking at the result carefully before putting it in. This was helpful in a recent paper that had to cite the grey literature heavily; it was hard to figure out the formats. I would recommend never, ever using them to actually generate citations. It’s one of the things they do worst, unless you are actually satisfied with a reference-shaped bit of text.

    2. It’s not even lying because there’s no intentionality in an LLM’s output. It’s just a glorified autocomplete that spews out algorithmic guesswork. Using Harry Frankfurt’s definition, it’s a bullshit generator pure and simple.

      1. I love your name for it: bullshit generator. I thought the same thing when I read the article. Hallucinations?? Come on. I just don’t trust today’s scholars to always be honest, I guess.

        1. The debate among the philosophers on whether the characterization of LLMs as “bullshit generators” à la Frankfurt is accurate stricto sensu remains ongoing [1-10]. But I at least tend to agree that calling fake references “bullshit” is far more (accurately) descriptive than just saying they’re “hallucinations.”

          [1] https://doi.org/10.1007/s43681-025-00743-3
          [2] https://doi.org/10.1007/s13347-025-01013-0
          [3] https://doi.org/10.1007/s13347-025-00991-5
          [4] https://doi.org/10.1007/s13347-025-00977-3
          [5] https://doi.org/10.1007/s13347-025-00907-3
          [6] https://doi.org/10.1007/s10676-025-09869-8
          [7] https://doi.org/10.1007/s10676-025-09845-2
          [8] https://doi.org/10.1007/s10676-025-09828-3
          [9] https://doi.org/10.1007/s10676-024-09802-5
          [10] https://doi.org/10.1007/s10676-024-09775-5

        1. How can anything in the paper be supported if nothing changed except all the fake citations?
          You can’t just make stuff up and then find/make up citations for it. That’s for politics, not science.

          1. Because the info could still be correct. If I say plant leaves are generally green, and add a BS citation, it’s a bad citation. It does not mean plant leaves are actually purple.

          2. Well, a statement that is supported by a nonsense reference is not supported. Which means that the reliability of the authors’ statement cannot be verified.
            Generally, obvious truths do not need a reference; when you get into the non-obvious a fake reference is a red flag for unreliability.

      2. I view them as general purpose sentence-like object fabricators whose primary use case (in financial terms) is bullshit generation.

        (Other use cases tend to involve training on a restricted corpus for which they basically pull out the material relevant to a query considerably more effectively than a search with a regular expression, and the fabricated sentence-like objects may closely resemble actual sentences in the corpus.)

        All the outputs are fabrications, and equally valid in terms of the model used; any associated meanings are imparted by the reader based on their own world model. These things are not supposed to give correct formulations of propositions, merely highly plausible sequences of words.

        I’d agree that “bullshit generator” is close enough, in the current setting of the discussion, and certainly more helpful than “AI”. And perhaps clearer in some important respects than “stochastic parrot.”

        On the other hand, an automated proof verification system does what it says on the label … deterministically. And there are other similar tools using much older ideas.

        There is also a long history of researchers talking to their dogs and getting value from that. In the absence of a dog an LLM can do something similar.

        … I find I’m basically repeating Mary’s second sentence, at length, but then it’s worth repeating.

    3. I think even better would be placeholder citation lists that contain obviously fake authors, titles, publication dates (e.g., 1066), and journals. The in-text citations and references cited could be highlighted in yellow until intentionally corrected by the author. That would alert the authors and reviewers to the editorial changes still needed.

      This case sounds very strange, though. An author who used a personal clinical neuropathy as the subject of a research paper. And are the coauthors not involved in the review process before submission?

  2. I used an LLM to reformat my citations from MLA to Chicago style. It inserted references into my bibliography which I did not have in there and which did not exist. Thankfully I caught it long before publication, but not before a draft went out. Peer reviewers did not catch it because the arguments were solid and most references existed and were relevant. I don’t know what LLM she used but it is within the realm of possibility that she put in a draft with correct references and asked the AI to review or critique the draft and it replaced her actual references with fake ones.

    1. Why would you use a chatbot to make citations when there are already free tools specifically designed for that purpose, like MyBib?

  3. A “new in-house AI tool to check submissions for irrelevant references” and hallucinations: this reminds me of drug dealers with trap houses next to rehabilitation programs.

  4. If you use a hammer to butter toast, you have no right to complain about the result. Especially a cheap hammer. Incompetent authors who are desperate to generate paper-like objects using low-end freeware should be eliminated from the scientific literature, along with those who publish them. This is easily done: don’t cite the crap, and don’t subscribe to the crap merchants. Otherwise, you are an accessory to a scam.

  5. Never trust an author who cites hallucinated references, because he cited something he knows nothing about. Never trust a journal that publishes such articles, as it fails to do the most basic gatekeeping job.

  6. Academic journals should all have put a hard-line ban on LLMs being involved in drafting papers years ago. The only step those things should be involved in is number crunching and pattern recognition for the study’s data, aka the thing such tools are actually designed for.

  7. It’s interesting to see all the excuses that people come up with to try to explain their dishonesty. It really all boils down to “the dog ate my homework.”

  8. It is disappointing the in-house AI tool could not correctly flag these fake references. This is not a job for an LLM, in my opinion. A classification algorithm or neural net would be more suitable for this task. 1. Identify all citations in the paper using a classical ML model. 2. Query a database for potential matches for each citation and get a list of potential references for each. 3. Check these potential references against the references listed in the paper; if there is no match, flag the citation as unresolved so it can be manually checked by a human. It’s not a simple problem to solve, but I think something like this could achieve decent accuracy if implemented correctly.
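The matching step in the pipeline this commenter describes can be sketched in a few lines. This is a minimal illustration, assuming candidate records have already been fetched from a bibliographic database such as Crossref; the function names, field names, and similarity threshold here are hypothetical choices for the sketch, not any publisher’s actual tool:

```python
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Rough similarity between two titles after basic whitespace/case normalization."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

def check_reference(cited: dict, candidates: list, threshold: float = 0.85) -> str:
    """Return 'resolved' if some candidate record matches the cited reference
    on title similarity and at least one author surname; otherwise return
    'unresolved', meaning a human should verify the citation by hand."""
    cited_surnames = {name.split()[-1].lower() for name in cited["authors"]}
    for cand in candidates:
        if title_similarity(cited["title"], cand.get("title", "")) < threshold:
            continue
        cand_surnames = {name.split()[-1].lower() for name in cand.get("authors", [])}
        if cited_surnames & cand_surnames:
            return "resolved"
    return "unresolved"

# Example: a citation with a matching database record resolves;
# one with no plausible record is flagged for human review.
cited = {"title": "Attention Is All You Need", "authors": ["Ashish Vaswani"]}
hit = [{"title": "Attention Is All You Need",
        "authors": ["Ashish Vaswani", "Noam Shazeer"]}]
print(check_reference(cited, hit))  # resolved
print(check_reference(cited, []))   # unresolved
```

As Graf notes in the article, the hard part in practice is that authors format references inconsistently and often omit DOIs, so a fuzzy threshold plus human review of anything unresolved is safer than automatically rejecting apparent mismatches.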

  9. The core issue here is that the author could not have read her references. She could not have conducted background research with the intent of substantiating her findings. Forget about the LLM hallucination issues; they only matter in that the author’s intent to appear that she researched her topic without having done so failed.

  10. Another librarian here: I have also reported published articles to their publishers for false citations. The writer’s excuse that they were accidentally included in a rough draft rings entirely false. If they had not read a real publication in preparation to write their article, they had no business referencing anything… and putting in a false citation in the hopes that later on they would find a real source that said the same thing is putting the research cart before the horse. Do actual research and reading or bow out of the academic profession.

  11. This does not come as a surprise to me, and it is not just a question of the integrity of the author, it is also one of the integrity of the journals. Recently, Springer Nature have lauded their new SNAPP system, what they call a “next-generation peer review system.” See: https://www.springernature.com/gp/snapp. It is not delivering on its important responsibility of using academic expertise to evaluate academic work.

    As an international scholar, frequently solicited to review academic manuscripts, I was approached by Springer Nature journals to review papers with no relation to my field of expertise (an important component of the peer review process). These messages looked like spam, with strange language, an incorrect salutation, a no-reply email address, and no institutional affiliation provided for the editor. However they were generated, there was no connection between the subject of the papers and my expertise. The two I received in short succession were on subjects I could barely recognise, let alone review. I am aware of other colleagues receiving similar requests.

    It would appear that Springer are giving AI a free hand in the serious matter of academic review. They need humans with disciplinary expertise providing oversight. A young academic, hungry for the experience and prestige of reviewing, and under pressure to gain valuable experience, might agree to review a topic they didn’t really know.

    We mustn’t forget that academic research is about standing on other people’s shoulders. Each academic builds on the body of research of their discipline. When we review, we KNOW the work that the authors are citing. It is of OUR discipline. And, when they introduce new work, curious, we look it up!

    Scholarship is a human activity. It can be supported by machines for banal things like calculation and orthography. But, for thoughtful scholarship, humans cannot delegate the task of evaluation.

    Springer has much to answer for as they implement their review processes.

    (When I wrote to Springer to critique this process in strong terms, I received a completely unsatisfactory answer: instructions on how to decline the request.)

    1. Springer Nature publishes for profit because it is a publicly traded company. Its main responsibility is to its stockholders. This publisher will not care what you and I think.

  12. “This is more complex than it may at first appear”: how complex is it to retract, or at least slap an expression of concern on, the article now that you know about it? Also, if you’re providing Google Scholar links and the majority of them fail to link to the articles, perhaps that should be a clue? Pretty sure you could hire a monkey to click links, and it would apparently be more effective than whatever AI tools they’re shelling out for.
