In 2022, Guillaume Cabanac noticed something unusual: a study had attracted more than 100 citations within two months of being published.
Cabanac, a computer scientist at the University of Toulouse in France, initially flagged the study on PubPeer after it was highlighted by the Problematic Paper Screener, which automatically identifies research papers with certain issues.
The screener flagged this particular paper — which has since been retracted — for containing so-called tortured phrases, strange twists on established terms that were probably introduced by translation software or humans looking to circumvent plagiarism checkers.
But Cabanac noticed something weird: The study had been cited 107 times according to the ‘Altmetrics donut,’ an indicator of an article’s potential impact, yet it had been downloaded just 62 times.
What’s more, according to Google Scholar, this paper had been cited only once. “There was a clear discrepancy between the counts on Google Scholar and the counts on Altmetrics/Dimensions,” Cabanac says. That gap is especially significant since “we know that usually Google Scholar overestimates the number of citations,” he adds.
After a little probing, Cabanac and his sleuthing colleagues figured out where the extra citations were coming from: the metadata files submitted to Crossref, a repository of scholarly metadata and unique identifiers, as the group reports in a preprint posted to the arXiv server October 4. Google Scholar doesn’t use metadata files submitted to Crossref; instead it text-mines PDF versions of studies, Cabanac says.
“We believe we found an undocumented way of cheating with citation counts,” Cabanac tells Retraction Watch. “It’s original because it doesn’t require fraudsters to alter the version of record, meaning the PDF or HTML version of the paper.”
The metadata files of the papers in question seem to contain more references than appear in the HTML or PDF versions, Cabanac says. According to him, the extra references are sneaked into the metadata files at some point before they are submitted to Crossref, where they are automatically ingested. Because metadata files can be resubmitted as many times as one likes, updated files can also be submitted at any point after an article is published.
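Crossref deposits carry each article’s reference list in a `citation_list` element of the deposit XML. A simplified, hypothetical fragment (the element names follow the Crossref deposit schema, but the DOIs are invented placeholders) shows where extra citations could be slipped in on resubmission without touching the published PDF or HTML:

```xml
<!-- Simplified Crossref deposit fragment; DOIs are invented placeholders -->
<citation_list>
  <citation key="ref1">
    <doi>10.1000/genuine.1</doi>  <!-- reference that also appears in the PDF -->
  </citation>
  <citation key="ref2">
    <doi>10.1000/extra.99</doi>   <!-- sneaked reference, absent from the PDF -->
  </citation>
</citation_list>
```

Because Crossref accepts updated deposits at any time, a later resubmission containing additional `citation` entries would be ingested just like the original.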
These extra, undue citations ultimately inflate the Altmetrics score represented by the donut, which depicts how often an article is cited and mentioned on social media. That’s problematic because these inflated citation counts are ultimately reflected on bibliographic platforms like Dimensions. Citation counts are frequently used to judge researchers and apportion funding, so boosting such indicators could falsely amplify a researcher’s perceived impact.
According to the study, the introduced references seem to be coming largely from journals published by Technoscience Academy, an open access publisher run out of Gujarat, India, and a Crossref member. Technoscience Academy did not reply to a request for comment.
It isn’t clear who is manipulating the metadata files, or whether the issue is due to a technical glitch. But Cabanac says the phenomenon is the result of a lack of gatekeeping. One way of addressing the issue would be to build tools and systems that regularly compare the references in the PDF, HTML and metadata files across the whole scholarly literature, he adds.
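The comparison Cabanac describes can be sketched in a few lines. The snippet below uses hypothetical placeholder DOIs; in practice the metadata references would come from Crossref’s public REST API and the PDF references from text mining, neither of which is shown here:

```python
# Minimal sketch of the comparison Cabanac suggests: given the reference list
# text-mined from a paper's PDF/HTML and the reference list in its Crossref
# metadata, flag references that appear in only one of the two sources.
# All DOIs below are hypothetical placeholders, not real records.

def compare_references(pdf_refs, metadata_refs):
    """Return (sneaked, lost): refs only in the metadata vs. only in the PDF."""
    pdf_set, meta_set = set(pdf_refs), set(metadata_refs)
    sneaked = sorted(meta_set - pdf_set)  # in Crossref metadata, absent from the PDF
    lost = sorted(pdf_set - meta_set)     # in the PDF, absent from the metadata
    return sneaked, lost

pdf_refs = ["10.1000/genuine.1", "10.1000/genuine.2"]
metadata_refs = ["10.1000/genuine.1", "10.1000/extra.99"]  # one sneaked-in citation

sneaked, lost = compare_references(pdf_refs, metadata_refs)
print(sneaked)  # ['10.1000/extra.99']
print(lost)     # ['10.1000/genuine.2']
```

Run at scale over a publisher’s output, a discrepancy report like this would surface both inflated and missing citations automatically.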
Cabanac says if it becomes clear a publisher’s output includes cooked references, its Crossref membership should be scrutinized. Being the signatory on the agreement with Crossref, “the publisher is responsible for their actions,” Cabanac says. “They can run an audit in their own premises to see who the malevolent person is.”
“It looks really dodgy,” says Ginny Hendricks, director of member and community outreach at Crossref, who notes the case is the first time her organization has heard of sneaked references. “It definitely seems like a side effect of the community’s obsession with citation as a metric [and] a measure of impact or importance, which is unfortunate.”
She adds that Crossref will look into the issue, noting that the organization rarely revokes membership for cause. The only member it has excluded for cause in the past is Omics International, Hendricks says: “They were causing harm to the whole community.”
Hendricks says Crossref has so far not considered introducing extensive gatekeeping, but she encourages third parties to use Crossref’s open data to develop systems to do just that. “We’re not the people that decide scientific legitimacy,” she says.
The study analyzed the content of three journals published by Technoscience Academy, each of which has minted more than 1,000 digital object identifiers at Crossref. It found that around 9% of the references included in the metadata files of papers published by these three journals — 5,978 out of a total of 65,836 — benefitted just two researchers who had co-authored the studies being cited.
One of the researchers in question is J. Nageswara Rao of the Vignan’s Institute of Information Technology in Visakhapatnam, India, who benefitted from 3,103 extra citations, the study found.
Retraction Watch contacted Rao for a comment but has yet to hear back. The retraction notice for the paper Cabanac found reads:
This article has been retracted by Hindawi following an investigation undertaken by the publisher [1]. This investigation has uncovered evidence of one or more of the following indicators of systematic manipulation of the publication process:
(1) Discrepancies in scope
(2) Discrepancies in the description of the research reported
(3) Discrepancies between the availability of data and the research described
(4) Inappropriate citations
(5) Incoherent, meaningless and/or irrelevant content included in the article
(6) Peer-review manipulation
The second author that the study namechecks as benefiting from the sneaked references is Bhavesh Kataria of the LDRP Institute of Technology and Research in Gandhinagar, India, who has benefitted from 1,564 extra citations, according to the study. Retraction Watch could not find contact details for Kataria.
Three journals also profited from the sneaked citations, the study found. The International Journal of Scientific Research in Science, Engineering and Technology gained an extra 826 citations, followed by the International Journal of Advanced Science and Technology and the Turkish Journal of Physiotherapy and Rehabilitation, with 537 and 428 extra citations, respectively.
In addition to sneaked references, the study also reports instances of ‘lost references,’ which are references that appear in the HTML or PDF version but not in the Crossref metadata files. “Users of the Crossref metadata (e.g., Dimensions) disregard some references because they are not in their database or because they failed to properly textmine the text of the references provided in the metadata,” Cabanac says. The study found that to be the case for 56% of the references in HTML versions of papers (36,939 out of 65,836).
Editor’s note: Last month, Crossref acquired the Retraction Watch database. The deal does not involve the Retraction Watch blog, which remains independent.
Did something get cut off in the last sentence? “The study found that 56% (36,939 out of 65,836) references in HTML versions of papers.” seems like it’s missing something.
Fixed, thanks.
Possible contact information for Bhavesh Kataria.
“Corresponding Author: [email protected]”
Source: https://www.researchgate.net/publication/369327824_Design_Engineering/link/6414be18a1b72772e406849c/download
“These extra undue citations ultimately inflate the Altmetrics score represented by the donut, which depicts how often an article is being cited and mentioned on social media.”
This is false. Altmetric (note it’s without an “s”) does not add citations to its Attention Score. It does have a tab that shows citations but like Mendeley saves they don’t add to the Attention Score.
The authors of the preprint received an email from the editor-in-chief, whose name was not disclosed, of one of the journals analysed in their preprint:
https://pubpeer.com/publications/41ACBE326B82E559AE6FE1D735527F#2