Springer Nature book on machine learning is full of made-up citations

Would you pay $169 for an introductory ebook on machine learning with citations that appear to be made up?

If not, you might want to pass on purchasing Mastering Machine Learning: From Basics to Advanced, published by Springer Nature in April. 

Based on a tip from a reader, we checked 18 of the 46 citations in the book. Two-thirds of them either did not exist or contained substantial errors. And three researchers cited in the book confirmed that the works they supposedly authored were fake or that the citations contained substantial errors.

“We wrote this paper and it was not formally published,” said Yehuda Dar, a computer scientist at Ben-Gurion University of the Negev, whose work was cited in the book. “It is an arXiv preprint.” The citation incorrectly states the paper appeared in IEEE Signal Processing Magazine.

Aaron Courville, a professor of computer science at Université de Montréal and coauthor on the book Deep Learning, was correctly cited for the text itself, but for a section that “doesn’t seem to exist,” he said. “Certainly not at pages 194-201.” And Dimitris Kalles of Hellenic Open University in Greece also confirmed he did not write a cited work listing him as the author.

The researcher who emailed us, and asked to remain anonymous, had received an alert from Google Scholar about the book, which cited him. While his name appeared on multiple citations, the cited works do not exist.

Nonexistent and error-ridden citations are a hallmark of text generated by large language models like ChatGPT. These models don’t search literature databases for published papers like a human author would. Instead, they generate content based on training data and prompts. So LLM-generated citations might look legitimate, but the content of the citations might be fabricated. 

The book’s author, Govindakumar Madhavan, asked for an additional “week or two” to fully respond to our request for comment. He did not answer our questions asking if he used an LLM to generate text for the book. However, he told us, “reliably determining whether content (or an issue) is AI generated remains a challenge, as even human-written text can appear ‘AI-like.’ This challenge is only expected to grow, as LLMs … continue to advance in fluency and sophistication.”

According to his bio in the book, Madhavan is the founder and CEO of SeaportAi and author of about 40 video courses and 10 books. The 257-page book includes a section on ChatGPT that states: “the technology raises important ethical questions about the use and misuse of AI-generated text.” 

Springer Nature provides policies and guidance about the use of AI to its authors, Felicitas Behrendt, senior communications manager for books at the publisher, told us by email. “Whilst we recognise that authors may use LLMs, we emphasise that any submission must be undertaken with full human oversight, and any AI use beyond basic copy editing must be declared.” 

Mastering Machine Learning contains no such declaration. When asked about the potential use of AI in the work, Behrendt told us: “We are aware of the text and are currently looking into it.” She did not comment on efforts taken during Springer Nature’s editorial process to ensure its AI policies are followed.

LLM-generated citations were at the center of controversies around Robert F. Kennedy Jr.’s “Make America Healthy Again” report and a CDC presentation on the vaccine preservative thimerosal. At Retraction Watch, our cofounders were once cited in a made-up reference in an Australian government report on research integrity. We’ve seen fake citations fell research articles, and our list of papers with evidence of undisclosed ChatGPT use has grown long — and almost certainly represents only a fraction of the papers that used it. 

The same day Behrendt replied to our query, Springer Nature published a post on its blog titled, “Research integrity in books: Prevention by balancing human oversight and AI tools.” 

“All book manuscripts are initially assessed by an in-house editor who decides whether to forward the submission to further review,” Deidre Hudson Reuss, senior content marketing manager at the company, wrote. “The reviewers – subject matter experts – evaluate the manuscript’s quality and originality, to ensure its validity and that it meets the highest integrity and ethics standards.”





20 thoughts on “Springer Nature book on machine learning is full of made-up citations”

  1. I wouldn’t expect an answer from Springer any time soon. I reported a similar case of a book chapter which contained hallucinated references, including one which it attributed to me which doesn’t match anything I’ve actually written. It’s been 4 months now and I’m still waiting for their investigation to reach a conclusion.

    1. The publisher charges huge fees to supposedly ensure quality. If no one looked at this book during the whole process (not even the author lol) then it might as well have been self published.

  2. What scientific book has only 46 references? I’ve never seen one.
    The author is at fault, yes, but where were the editor and the reviewers?

  3. It takes two to tango. The editor and publisher are as culpable as the author, since without their “approval” the fabricated manuscript would not have been published.

  4. Just as funding and potential conflicts of interest must be reported in scientific publications, I believe editors and publishers should require that ANY use whatsoever of LLMs or so-called AI be reported, including the software version and the specific nature of its application in the course of the research and manuscript preparation. Personally, I find myself increasingly favoring sources such as 404Media that are entirely the product of real human beings. Over my career, every one of my several million published words was generated by me, and I intend to keep it that way. I am no Luddite, but as a technologist I think we must always weigh costs and benefits. Frankly, AI is causing far more problems than it purports to solve.

  5. It’s a shame that the bar is lower for publishing textbooks than for the students that use them. Making any of these “mistakes” in a classroom would be probation-worthy.

    1. I just graduated from a masters program. Rules about AI are vague and sometimes left up to the instructor, who declines to comment. Those who hope to stop the abuse of AI deploy unproven (disproven?) AI recognition software, resulting in false accusations. The only instructor who flagged AI correctly — he was able to more or less recreate the text with his own prompt — just let the student know and left it to her to decide to do better. It was our final paper.

      1. I meant more the faking of references. I am unfortunately aware of the AI epidemic (at least nobody in my master’s program was using Grok, as far as I know), but I think this falls under the umbrella of plagiarism.

  6. Checking references is such an obvious & easy thing to do. When students submitted papers for a class I taught as an Adjunct, I often spot-checked references, particularly when I suspected plagiarism.

    While clearly the author is at fault regarding this book, it’s amazing that the editors at Springer apparently did little if any review before it was published. As noted by DS, the fact that there are only 46 references should have been a red flag for the editor to read the book more carefully. Considering the made-up references, it’s likely the text contains equally made-up ideas.

  7. Just downloaded a copy in case it disappears. The text is very imprecise in what I just read. The term “artificial intelligence” was not coined at a conference by John McCarthy; rather, McCarthy and three others applied for funding for that conference using the term “AI”. Turing did not invent the “Turing test”; he called it the “imitation game”. *We* have taken to calling it the “Turing test”. And indeed, very sparsely referenced. It is being sold for 230 €! The Wikipedia pages are better and cheaper.

  8. Interesting Google Scholar profile from the publisher’s link; the PubMed link gives nothing. Just this book and each chapter are listed, same as Springer’s internal list of publications. Is this person even real?
    https://scholar.google.co.uk/scholar?as_q=&num=10&btnG=Search+Scholar&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=%22Govindakumar+Madhavan%22&as_publication=&as_ylo=&as_yhi=&as_allsubj=all&hl=en
    The book does list his affiliation as SeaportAI in Chennai, giving his name as AM Govind Kumar or simply Govind Kumar (his email address at SeaportAI is also Govind Kumar).

  9. If you don’t know how to use AI, don’t use it.
    But remember, you will end up wearing glasses and becoming bald if you don’t use it.

  10. Springer Nature is the worst offender. It allows journal or book editors to publish in their own edited books or journals to manipulate their h-index and stuff their CVs. For example, S.K. Saxena (India), editor of the journal VirusDisease (Springer Nature), publishes in his own journal and frequently edits books containing book chapters written by his students.
    17 chapters in one book written and edited by him:
    https://link.springer.com/book/10.1007/978-981-15-4814-7#toc

  11. Anyone who has ever done any editing since the rise of Gen AI is aware of this problem, but the issue goes far beyond the final product, such as the book itself.
    From the very start of the process, like author acquisition, you are often greeted with AI-written emails (from all sides), and sometimes the entire correspondence with such authors or editors is done with the help of AI. I myself am not a native English speaker, and I sometimes miss the “natural” communication that included random typos in (in)formal messages. Of course, this doesn’t mean that native English speakers never make mistakes. Nowadays, you will notice the following:
    - You can never “judge” someone’s English, because they most likely use AI. To a point I understand this approach, as you:
    a) may fear that your manuscript will be rejected because of your English level (it should not be, but the fear is there)
    b) may be ashamed of it
    - Publishers are keen on saving money, and with the prevalent use of AI, some heavy editing is now gone (some publishers have stopped offering heavy editing, using the excuse that authors can now use AI to smooth the text).
    Publishers are also saving money on tools: now, along with plagiarism tools, you need to use “AI detector tools,” which are nonsense—they focus on the percentage of AI-generated content in the text rather than checking the validity of references, which is still manual work and something editors do not have time for.
    So, in the end, nobody (authors, editors, or publishers) is stepping in to address these hallucinated references. I do not see them suddenly disappearing; rather, I view all publishing from 2022 to today as a risk, since in my experience at least one in six works I edited had at least one hallucinated reference.
