One in 277 PubMed-indexed papers in 2026 shows fabricated references, says analysis

Figure from correspondence to The Lancet by Maxim Topaz and colleagues.

Fabricated citations in the biomedical literature have increased 12-fold in two years, according to an audit of nearly 2.5 million papers published as a letter to The Lancet today. 

The analysis of articles indexed in PubMed found that about one in 277 papers published in the first seven weeks of 2026 referenced a paper that didn’t exist. That was a jump from 2025’s rate of one in 458 and 2023’s one in 2,828. The researchers, led by Maxim Topaz of Columbia University’s Data Science Institute, used AI to “distinguish genuine fabrications from formatting discrepancies such as informally abbreviated titles.”

Topaz’s group located the sharpest increase in hallucinated references in mid-2024, which they note coincided with the rise of AI writing tools. The findings come as Nature reported last month that tens of thousands of publications from 2025 “might include invalid references generated by AI.” Retraction Watch has seen its fair share of reports of hallucinated citations generated by LLMs like ChatGPT.

In their sample, Topaz and his colleagues were able to verify 97.1 million references, from which they identified 4,406 “fabricated” references that appeared in a total of 2,810 papers. Based on data from the Retraction Watch Database and elsewhere, nearly all the articles found to have fake references — over 98% — had seen “no publisher action” at the time of the audit in February. 

A Taylor & Francis spokesperson told us the publisher is investing in “technology, specialist staff and processes to catch problematic” citations. Articles with concerning references are returned to the author, the spokesperson said, adding, “If these account for more than a small proportion of citations and/or substantially impact the overall integrity of the manuscript, the submission will usually be rejected.”

Renee Hoch, head of publication ethics for PLOS, told us they are “exploring options for system-wide reference integrity screening.” Hoch also said PLOS doesn’t automatically classify fabricated references as misconduct: “Research misconduct has a specific definition that includes an element of intent, and whether an issue qualifies as research misconduct is addressed at the institutional level, not at the journal or publisher level,” she said.

We also contacted Elsevier, Wiley, Springer Nature, IEEE and Sage, but they did not respond in the short timeframe provided by The Lancet’s embargo.

Publishers need to take fabricated references seriously, Howard Bauchner and Frederick Rivara write in a commentary accompanying the analysis. Bauchner is the former editor of JAMA and Rivara the former editor of JAMA Pediatrics. The two argue that any paper in which a hallucinated reference appears should be retracted.

Researchers “incur responsibility for the entire content of that paper” when they agree to be authors, Bauchner and Rivara write. “Retraction of these manuscripts might lead to greater scrutiny of references by authors of manuscripts.”

David Resnik, an integrity researcher at the National Institutes of Health, disagrees, telling us the choice of whether a paper with a fabricated citation should be retracted “depends on the role the citation plays in supporting the results of the study.” His view aligns with that of Topaz, who told us an article should be retracted when the hallucinated references are central to the conclusions of the paper. Topaz gave an example from the analysis where 18 of 30 references appear to have been fabricated. 

“For papers with one or two fabricated references that are incidental to the main findings, I think correction and transparency may be more proportionate than retraction,” Topaz told us. He noted 91% of the articles in his group’s dataset of problematic papers had only one or two fabricated references, many of which “are likely honest mistakes by authors who used AI tools without verifying the output.” 

Ella Flemyng, the head of editorial policy and research integrity at Cochrane, called the new study’s findings “serious” but had concerns about it. “Though the approach [using AI] was validated on 500 records and the main limitations are discussed, we are lacking considerable details about the methods,” she said.

She also noted that because the conclusions rely on an AI-assisted audit, “confidence in the findings depends less on the headline number and more on: how the AI system was designed and validated; how errors were assessed and corrected; and how reproducible and transparent the overall process is.”

Mohammad Hosseini, a researcher in biostatistics and informatics at Northwestern University’s Feinberg School of Medicine, called The Lancet analysis “simplistic.” In a March paper, Hosseini and Resnik made a point of distinguishing between hallucinated citations that matter to a paper’s scientific conclusions and those that do not. Topaz’s group didn’t differentiate scientifically critical references – which effectively function as data – from those that were relatively less important, Hosseini said. 

Hosseini told us the study represents “low-hanging fruit” and the “tip of the iceberg.” He said the “bigger and more important problem” remains citations generated by AI that aren’t wholly hallucinated but are inaccurate, biased or incomplete. “We are far from being able to even detect them or do anything about them,” he said. 

Flemyng had a similar perspective, telling us that, along with addressing individual instances with fabricated citations, “we also need to highlight the pressures in academia that create a perverse incentive for fast science; a researcher needs more publications, more citations, and it doesn’t surprise me that corners are being cut and outputs are not fully verified.”

Hosseini and Resnik wrote in their paper that AI-fabricated citations are likely to persist because hallucinating is “inextricably linked to how LLMs operate.”

Whether or not LLMs are expected to stop hallucinating, Topaz told us, the “damage is already done.” The “contamination” of over 4,000 fabricated references his team found “does not go away when the AI gets better,” he said. 

Hallucinated references have lately been drawing much attention from sleuths, research misconduct investigators and journalists. Late last year, we reported that a World Bank paper on obesity trends contained at least 14 fake references. In March, we wrote about a librarian who discovered that 12 of 14 references in a Springer Nature article on bowel surgery management did not exist. Our cofounder Ivan Oransky was the named author of a hallucinated reference on a paper in a Springer Nature journal last year.

On an interactive website about their research, Topaz and his colleagues report on one publisher that “produced fabrications at more than fourteen times the rate of the most selective journals in the dataset.” The publishers with the highest rates of fake references remain unnamed at the site because a “raw comparison of publisher-level rates would be misleading without adjusting for the volume and type of papers each publisher indexes in PubMed,” Topaz told us. He declined to identify the publishers’ individual rates.

“What I can say is that the concentration is disproportionately among large open access journals and publishers, which is consistent with what others have observed about where papermill activity and less rigorous peer review tend to cluster,” he said.

Topaz and his coauthors recommend a series of actions to deal with what they see as the growing problem. First, they suggest fighting AI with AI: “publishers should integrate automated reference verification into submission workflows before peer review begins.” They also want to see article-indexing services add integrity metadata so flags travel with references, and they want to see fake references tracked in research integrity databases. Finally, they say, “publishers should retroactively screen existing publications and issue corrections or retractions when fabricated references compromise a paper’s conclusions.” 
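The first recommendation — screening references automatically before peer review begins — can be sketched in a few lines. This is our illustration, not the authors’ tool: the titles and the `KNOWN_WORKS` index below are hypothetical stand-ins for a live lookup against an index such as PubMed or Crossref.

```python
# Minimal sketch of automated reference screening before peer review.
# KNOWN_WORKS stands in for a live lookup against PubMed or Crossref;
# all titles here are hypothetical examples.

def normalize(title: str) -> str:
    """Lowercase and strip punctuation so formatting differences
    (e.g. informally abbreviated titles) don't trigger false flags."""
    return "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace()).strip()

KNOWN_WORKS = {
    normalize("A real study of reference integrity"),
    normalize("Another genuine article on citation practices"),
}

def screen_references(refs: list[str]) -> list[str]:
    """Return the cited titles that could not be matched to any indexed work."""
    return [r for r in refs if normalize(r) not in KNOWN_WORKS]

flagged = screen_references([
    "A Real Study of Reference Integrity",          # matches despite casing
    "Plausible-sounding paper that never existed",  # flagged for review
])
print(flagged)  # ['Plausible-sounding paper that never existed']
```

A production screener would also need to separate genuinely unmatched references from lookup misses, which is why the study’s authors emphasize distinguishing fabrications from mere formatting discrepancies.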

The research team is particularly concerned about review articles, which they note “had a fabrication rate that was 57% higher than other paper types.”

Flemyng shares that worry. “In this new age of AI, the need for full systematic reviews that meet the expected standards is paramount. To risk introducing biased, unsystematic AI slop into the literature would be a serious step backwards,” she said.
