
A few months ago, when Elle O’Brien, a data scientist at the University of Michigan, was checking who had recently cited her work on Google Scholar, she came across something that would take her and her colleagues down “a rabbit hole.”
When O’Brien opened a publication that had recently cited her, it appeared to be a rewritten version of an arXiv preprint she had co-authored with two colleagues, Grischa Liebel and Sebastian Baltes. Yet this did not seem to be a simple case of theft by other academics.
For starters, the six authors listed on the fake article didn’t exist, although three had been given the same institutional affiliations as O’Brien, Liebel, and Baltes: the University of Michigan, Reykjavik University and Heidelberg University, respectively. The similarities in the texts read as if someone had typed, “ChatGPT, please rephrase this paper without changing anything else,” Liebel wrote in a post on LinkedIn. But why would fake authors need publications?
“For me it was, first of all, what the f?” Baltes told Retraction Watch in a phone interview. “We didn’t understand the purpose of it.” Then, when they looked closer at the text’s references, “we saw that it’s about pushing citations up,” Baltes said. “I got really annoyed.”
What O’Brien, Liebel, Baltes and others ended up finding was a series of fake articles across multiple preprint servers, plagiarized from real articles and attributed to authors who don’t exist. These papers appear designed to inflate the citation counts of particular researchers. But who uploaded them to the servers is unclear, and the researchers who benefit most from the citations have denied any involvement, saying they themselves have flagged the articles to the platforms.
The fake article that first tipped O’Brien off, published on Elsevier’s SSRN, included many references that had nothing to do with the content of the work. The trio identified Yuze Hao of the Inner Mongolia University in China as the researcher benefiting most from the reference list. After checking his publications online, Liebel ran a quick analysis of one of Hao’s most cited works, a 2025 conference submission that had been cited 91 times at the time and has since passed 100 citations. Aside from four citations in Association for Computing Machinery and IEEE papers, Liebel found that the remaining citations of Hao’s work came from preprint servers, primarily arXiv, SSRN, Authorea and Cambridge.
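The kind of tally Liebel ran can be sketched in a few lines: given the URLs of works citing a target paper, count how many resolve to preprint servers rather than publisher venues. The host list and sample URLs below are illustrative assumptions, not data from the actual analysis.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed set of preprint-server hosts, based on the servers named in the
# investigation; a real analysis would use a vetted list.
PREPRINT_HOSTS = {"arxiv.org", "ssrn.com", "authorea.com", "cambridge.org"}

def tally_citing_hosts(urls):
    """Count citing works per host and how many sit on preprint servers."""
    hosts = Counter(urlparse(u).netloc.removeprefix("www.") for u in urls)
    preprint_count = sum(n for host, n in hosts.items()
                         if host in PREPRINT_HOSTS)
    return hosts, preprint_count
```

A lopsided ratio of preprint-hosted citations to publisher-hosted ones is what flagged the pattern here; the function only surfaces the ratio, and each citing work still has to be checked by hand.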
Liebel opened each arXiv article that supposedly cited Hao but was unable to find any reference to Hao’s work in them. Of the 15 fake papers hosted on SSRN that cited Hao’s conference contribution, Liebel said he quickly found the earlier works the fakes had plagiarized: he looked for terminology or distinctive words that an LLM would be unlikely to change and typed them into Google Scholar. The fake papers listed authors who don’t exist, though their affiliations named real universities, and the first authors mostly gave Outlook email addresses or fake institutional ones.
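Liebel’s trick of searching for terms a paraphrasing LLM is unlikely to alter can be sketched as a small heuristic. The selection rules below (hyphenated compounds, CamelCase names, unusually long words) are assumptions about what survives a rewrite, not a description of his actual tooling.

```python
import re
from collections import Counter

# Small, illustrative set of generic words to skip; not exhaustive.
COMMON = {"approach", "paper", "results", "method", "based"}

def fingerprint_terms(text, k=5):
    """Pick distinctive terms a paraphrasing LLM is unlikely to change:
    hyphenated compounds, CamelCase names, or unusually long words."""
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]{3,}", text)
    counts = Counter(t.lower() for t in tokens)

    def distinctive(t):
        return "-" in t or any(c.isupper() for c in t[1:]) or len(t) > 12

    candidates = [t for t in tokens
                  if distinctive(t) and t.lower() not in COMMON]
    # Rarer (and longer) terms first: more likely to be unique jargon.
    candidates.sort(key=lambda t: (counts[t.lower()], -len(t)))

    seen, picked = set(), []
    for t in candidates:
        if t.lower() not in seen:
            seen.add(t.lower())
            picked.append(t)
        if len(picked) == k:
            break
    return picked

def scholar_query(terms):
    """Quote each term so the search engine matches it exactly."""
    return " ".join(f'"{t}"' for t in terms)
```

Pasting the quoted terms into Google Scholar then surfaces the original paper, since an LLM rewrite tends to preserve coined names and domain jargon even while rephrasing everything around them.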
In a comment on Liebel’s LinkedIn post describing the situation, Hao denied engaging in citation fraud. He wrote that he had noticed “abnormal citation activity” on his Google Scholar profile in November 2025, and had since closed his profile and submitted retraction requests for the SSRN papers. “I suspect that someone with the same [name] spelling as mine has impersonated me and committed fraudulent acts,” he wrote. His ResearchGate profile now lists only 58 citations to his work.
In response to our questions, Hao wrote that he took “immediate action” as soon as he became aware that his “legitimate research was being cited by these fraudulent preprints.”
“I have zero involvement in the creation or publication of these preprints,” he told us by email, including boldfacing for emphasis. “I do not benefit from ‘inflated’ citations derived from fraudulent or unrelated work, as such activity only serves to damage my professional reputation and the integrity of my field.”
Hao said he has requested the retraction of the articles from SSRN and Authorea. “We share the same goal of cleansing the scholarly record of such fraudulent activities,” he wrote. “I would appreciate it if your coverage reflects that I am the whistleblower who reported these specific papers to the platforms involved.”
As of last week, 11 of the 15 fake articles had been removed from SSRN’s website for “confirmed plagiarism,” according to an Elsevier spokesperson. After we sent our questions to the publisher, the remaining four articles, which had been under investigation, were also removed.
“SSRN has our own internal system and processes for identifying and guarding against plagiarism,” the spokesperson wrote, including advanced detection technologies. “We consult outside reports and respond promptly when potential plagiarism charges are brought to our attention.”
Hao was not the only researcher who gained citations from the first fake article the group identified. Jiaming Pei, a postgraduate student at the University of Sydney, who describes himself as one of the “world’s top 2% scientists” on his Google Scholar profile, was also cited frequently.
When Baltes contacted him in January, Pei also said he did not know what was going on.
“It’s not my paper. I don’t know what happened,” he wrote in an email Baltes shared with us. “U can ask SSRN to check who did that. I never use preprint websites. In the necessary way, please let them provide the account that submitted the ur paper. I never did that, Why do you say this is mine?”
In a subsequent email, Pei apologized for his tone because he was “upset and defensive” from feeling accused of misconduct.
“I have previously experienced a very similar incident, where my work was plagiarized and posted as a preprint without authorization, which later caused serious complications for a legitimate journal submission. Because of this, I fully understand how distressing and disruptive such misconduct can be for the original authors,” Pei wrote in the email to Baltes and the other authors. He has not responded to requests for comment from Retraction Watch.
Baltes and his colleagues were also copied on an email Pei sent to SSRN the same day, requesting removal of the preprint that “includes a large number of citations to my publications that are unrelated to the content of the work.” He called the practice “highly problematic” and said it could cause “direct harm to the academic reputation of the cited researcher.”
Mapping the full extent of the issue is proving to be difficult for the group. “It’s a rabbit hole,” Baltes said.
Others, including Matt Hodgkinson, a research integrity specialist, are working on gathering evidence of a possible citation mill. Liebel and Baltes acknowledge such schemes could be used to discredit somebody. “You could automate this in a way that a colleague gets fake citations,” Liebel said.
Part of the issue lies with the field itself. Liebel and Baltes explained that computer science values outputs at conferences, not only journals, meaning platforms that don’t count conference contributions don’t give a complete picture of a researcher’s output. “There’s no baseline that you could say, here’s my Scopus H-index, here’s my Google Scholar one,” Liebel said. “The truth is somewhere in between.”
“For me, one of the big conclusions is, it just gets more and more important to go away from this whole citation counting,” Baltes said. “It’s almost meaningless.”
Like Retraction Watch? You can make a tax-deductible contribution to support our work, follow us on X or Bluesky, like us on Facebook, follow us on LinkedIn, add us to your RSS reader, or subscribe to our daily digest. If you find a retraction that’s not in our database, you can let us know here. For comments or feedback, email us at [email protected].