The “phantom reference:” How a made-up article got almost 400 citations

Here’s a mystery: How did a nonexistent paper rack up hundreds of citations?

Pieter Kroonenberg, an emeritus professor of statistics at Leiden University in The Netherlands, was puzzled when he tried to locate a paper about academic writing and discovered the article didn’t exist. In fact, the journal—Journal of Science Communications—also didn’t exist.

Perhaps Kroonenberg’s most bizarre discovery was that this made-up paper, “The art of writing a scientific article,” had somehow been cited almost 400 times, according to Clarivate Analytics’ Web of Science.

Anne-Wil Harzing, a professor of International Management at Middlesex University in London, who recounted Kroonenberg’s discovery in her blog, wrote:

To cut a long story short, the article appeared to be completely made up and did not in fact exist. It was a “phantom reference” that had been created merely to illustrate Elsevier’s desired reference format.

Here’s the reference from Elsevier’s reference style section, part of its author guidelines (we’ve seen examples that cite the paper as from 2000 as well):

Van der Geer, J., Hanraads, J.A.J., Lupton, R.A., 2010. The art of writing a scientific article. J Sci. Commun. 163 (2) 51-59.

Puzzled, Harzing set out to understand how so many authors could cite this paper.

Harzing found that nearly 90% of the citations were for conference proceedings papers, and nearly two-thirds of these appeared in Procedia conference volumes, which are published by Elsevier.

When examining some of the papers more closely, Harzing found “most citations to the phantom reference occurred in fairly low-quality conference papers,” and were written by authors with poor English. She said she suspects that some authors may not have understood that they were supposed to replace the template text with their own or may have mistakenly left in the Van der Geer reference while using the template to write their paper. There may be minimal quality control for these conference papers, says Harzing; still, she found that the phantom reference did appear in about 40 papers from established journals.

Harzing concluded that the mystery of the phantom reference “ultimately had a very simple explanation: sloppy writing and sloppy quality control.”

We contacted several researchers who cited the phantom reference; all attributed it to some kind of mistake. One said he believes two similar references were somehow confused, and the “Van der Geer” replaced the correct one; another author said he has contacted the publisher to fix the error.

Although 400 citations sounds significant, Harzing put the number in context: Out of nearly 85,000 Procedia conference papers, the phantom reference appeared in less than 0.5% of articles:

Whilst unfortunate, one might consider this to be an acceptable ‘margin of error’.

She added:

In a way we can be glad that our phantom reference IS a phantom reference. If this had been an existing publication, the mistakes might have had far more serious consequences.

The take-home, Harzing says, is “If something looks fishy, it probably IS fishy!”

Update, 1500 UTC, 11/15/17: The researcher who requested the correction told us the publisher has agreed.

Hat tip: Research Whisperer


21 thoughts on “The “phantom reference:” How a made-up article got almost 400 citations”

  1. Thanks for the information. One way to avoid, or at least limit, phantom references is to require authors to provide a link to the full copy of each reference. This would limit the number of references and make it easier to check reference titles and abstracts, and to consult the full paper when needed. Many journals omit reference titles, and tracking down this information for all the references is very tedious work for a reviewer. It matters for judging whether the references are well balanced or not.

    1. Many journals have now changed their referencing style to include the title of the paper, which I really like. It makes tracing a paper back via a Google search so much easier (a free platform; I find SciFinder not user-friendly at all).

  2. I think “sloppy writing” and “sloppy quality control” are too euphemistic. This really represents a dereliction of duty. It’s frustrating to see how cavalierly many scientists treat putting their name to a permanent document.

    1. I don’t think that this is fair. Given the complexity of the co-authorship decision (nb. many journals require an authors’ contribution section, which makes this point explicit), you can’t expect every author, ditto peer reviewers, to check every detail of the paper. Usually there is a “lead” author (usually, although not always, the corresponding author) who is really the only person who can be held responsible for every word of the paper, and even then, they usually can’t give you every detail of, say, how the analytical programs work. Science is a team sport, and people are fallible. As long as there’s no intent to deceive, you would expect, statistically, a bit of slop, and science is built to self-correct, and does!

  3. “Sloppy” reminds me of how it was described when Sandy Berger removed documents from the National Archives – he was described as “just being sloppy”. Most of us call it dishonesty.

  4. I see how this can happen. On my course syllabi I include a sample homework at the end with the format that I expect. And I did have a student write “sample homework” at the top of every assignment for a whole semester. He failed the class for unrelated reasons.

    1. The difference though is that every reference listed at the end, should have a matching citation somewhere in the body text. It really shouldn’t be that hard to go through your reference list and see where it appears. Admittedly, this is somewhat easier with some styles than others (IEEE for example), but if you’re spending weeks/months writing your publication, an extra 5-10 minutes shouldn’t be that much more work to ensure that
      1) Your citations are correct
      2) Your citations exist
      3) You’re not giving free citations to articles that you are no longer referencing

      1. 3) You’re not giving free citations to articles that you are no longer referencing

        This is supposed to be a manuscript editor’s job, so I’m not surprised about the conference proceedings, but the 40 real papers are squarely the fault of the publisher.

        1. ^ For that matter, the task can be trivially automated if the publisher is putting in link markup from cites to the reference list before sending papers off to production.
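The cross-check the commenters describe is straightforward to sketch. As an illustration only (assuming LaTeX-style `\cite{...}` and `\bibitem{...}` markup — real publisher pipelines would work from BibTeX or structured XML instead), a few lines of Python can flag both citations with no bibliography entry and “free” references that are never cited:

```python
import re

def check_references(manuscript: str) -> dict:
    """Cross-check \\cite keys against \\bibitem entries in a LaTeX source.

    Returns keys that are cited but absent from the bibliography, and
    bibliography entries that are never cited in the body text.
    """
    cited = set()
    # \cite may carry several comma-separated keys, e.g. \cite{a, b}
    for group in re.findall(r"\\cite\{([^}]*)\}", manuscript):
        cited.update(key.strip() for key in group.split(","))
    listed = set(re.findall(r"\\bibitem\{([^}]*)\}", manuscript))
    return {
        "missing_from_bibliography": sorted(cited - listed),
        "never_cited": sorted(listed - cited),
    }

doc = (
    r"As shown earlier \cite{smith1999, jones2005}. "
    r"\bibitem{smith1999} ... \bibitem{vandergeer2010} ..."
)
print(check_references(doc))
```

A leftover template entry like the Van der Geer reference would surface in `never_cited`, since nothing in the body text points to it.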

  5. I think that what you call “sloppy” is actually just smart management of one’s time. The number of mistakes caught per unit of editing time looks like a decaying exponential, and you may choose whether you want to use taxpayers’ money to polish your paper or to do something useful. In the end, if the message of the paper is clear, there is very little added value to the reader in getting every tiny detail right.

    1. I disagree.
      Each and every journal I’m familiar with explicitly expects authors to make sure they properly credit prior research and write the bibliography accurately. By your logic, the reference list could be removed altogether, since it is not really essential for understanding the published research.

  6. This is what happens when fact-checking and editing are seen as extraneous. As an editor for a major nursing journal, I edited dozens of research articles and reviews and checked every source. I found countless instances of erroneous or blatantly wrong citations. Researchers often get their own math wrong. Since most medical and nursing journals don’t edit, I can only imagine the extent of error – whether purposeful deceit or more innocent mistakes – in the literature.

  7. There was a similar story in my research field, Music Information Retrieval, with what started as a 1999 unpublished poster communication being wrongly referenced as a paper in one influential PhD thesis and then, on the basis of that, re-cited hundreds of times. Like Chinese whispers, there were rampant typos in the authors’ names (Prof. David Perrott ended up a Perot, Perrot, and even a Parrot – an ex-Perrott, if I may) and, perhaps more concerning, in the experimental results (a key duration measure was quoted as anything from 50ms to 450ms). The (real!) authors of this (real!) work, who were totally unsuspecting that their unpublished work was generating such a frenzy in a field outside their own, were kind enough to “officialize” it with a proper article, which the Journal of New Music Research let us invite and edit in 2008. More in that issue’s introduction: http://www.tandfonline.com/doi/abs/10.1080/09298210802479318?journalCode=nnmr20

  8. Of course, the problem is that ChatGPT doesn’t just offer citations unchecked, it literally creates them anew.
    So how much do you want to use a tool that creates fictional references if you ask for a real reference?
    And why would that be of any use whatsoever?
