Springer Nature book on machine learning is full of made-up citations

Would you pay $169 for an introductory ebook on machine learning with citations that appear to be made up?

If not, you might want to pass on purchasing Mastering Machine Learning: From Basics to Advanced, published by Springer Nature in April.

Based on a tip from a reader, we checked 18 of the 46 citations in the book. Two-thirds of the citations we checked either did not exist or had substantial errors. And three researchers cited in the book confirmed that the works they supposedly authored were fake or that the citations contained substantial errors.

“We wrote this paper and it was not formally published,” said Yehuda Dar, a computer scientist at Ben-Gurion University of the Negev, whose work was cited in the book. “It is an arXiv preprint.” The citation incorrectly states the paper appeared in IEEE Signal Processing Magazine.

Aaron Courville, a professor of computer science at Université de Montréal and coauthor on the book Deep Learning, was correctly cited for the text itself, but for a section that “doesn’t seem to exist,” he said. “Certainly not at pages 194-201.” And Dimitris Kalles of Hellenic Open University in Greece also confirmed he did not write a cited work listing him as the author.

The researcher who emailed us, and who asked to remain anonymous, had received an alert from Google Scholar about the book, which cited him. While his name appeared on multiple citations, the cited works do not exist.

Nonexistent and error-prone citations are a hallmark of text generated by large language models like ChatGPT. These models don’t search literature databases for published papers like a human author would. Instead, they generate content based on training data and prompts. So LLM-generated citations might look legitimate, but the content of the citations might be fabricated.

The book’s author, Govindakumar Madhavan, asked for an additional “week or two” to fully respond to our request for comment. He did not answer our questions asking if he used an LLM to generate text for the book. However, he told us, “reliably determining whether content (or an issue) is AI generated remains a challenge, as even human-written text can appear ‘AI-like.’ This challenge is only expected to grow, as LLMs … continue to advance in fluency and sophistication.”

According to his bio in the book, Madhavan is the founder and CEO of SeaportAi and author of about 40 video courses and 10 books. The 257-page book includes a section on ChatGPT that states: “the technology raises important ethical questions about the use and misuse of AI-generated text.”

Springer Nature provides policies and guidance about the use of AI to its authors, Felicitas Behrendt, senior communications manager for books at the publisher, told us by email. “Whilst we recognise that authors may use LLMs, we emphasise that any submission must be undertaken with full human oversight, and any AI use beyond basic copy editing must be declared.”

Mastering Machine Learning contains no such declaration. When asked about the potential use of AI in the work, Behrendt told us: “We are aware of the text and are currently looking into it.” She did not comment on efforts taken during Springer Nature’s editorial process to ensure its AI policies are followed.

LLM-generated citations were at the center of controversies around Robert F. Kennedy Jr.’s “Make America Healthy Again” report and a CDC presentation on the vaccine preservative thimerosal. At Retraction Watch, our cofounders were once cited in a made-up reference in an Australian government report on research integrity. We’ve seen fake citations fell research articles, and our list of papers with evidence of undisclosed ChatGPT use has grown long and almost certainly represents only a fraction of the papers that actually involve such use.

The same day Behrendt replied to our query, Springer Nature published a post on its blog titled, “Research integrity in books: Prevention by balancing human oversight and AI tools.”

“All book manuscripts are initially assessed by an in-house editor who decides whether to forward the submission to further review,” Deidre Hudson Reuss, senior content marketing manager at the company, wrote. “The reviewers – subject matter experts – evaluate the manuscript’s quality and originality, to ensure its validity and that it meets the highest integrity and ethics standards.”

13 thoughts on “Springer Nature book on machine learning is full of made-up citations”

Peter Vamplew says:

June 30, 2025 at 9:13 pm

I wouldn’t expect an answer from Springer any time soon. I reported a similar case of a book chapter which contained hallucinated references, including one which it attributed to me which doesn’t match anything I’ve actually written. It’s been 4 months now and I’m still waiting for their investigation to reach a conclusion.

Thaddeus McIlroy says:

July 1, 2025 at 3:56 am

Of course the author is at fault. But let’s focus on the publisher.

1. ds says:
  
  July 1, 2025 at 8:17 am
  
  The publisher charges huge fees to supposedly ensure quality. If no one looked at this book during the whole process (not even the author lol) then it might as well have been self-published.
  
  1. Mark Crowley says:
    
    July 1, 2025 at 9:40 am
    
    Exactly! What is the actual point of the publisher at all if they don’t check for this kind of thing?
    
DS says:

July 1, 2025 at 8:14 am

What scientific book only has 46 references? I’ve never seen one.
The author is at fault yes but where’s the editor, the reviewers?

Andy Patterson says:

July 1, 2025 at 8:19 am

It takes 2 to tango. The editor/publisher are equally culpable as is the author, since without their “approval” the fabricated manuscript would not have been published.

ProfLarry says:

July 1, 2025 at 8:42 am

Just as funding and potential conflicts of interest must be reported in scientific publication, I believe editors/publishers should require that ANY use whatsoever of LLM or so-called AI be reported, including the software version and the specific nature of its application in the course of the research and manuscript preparation. Personally, I find myself increasingly favoring sources such as 404Media that are entirely the product of real human beings. Over my career, every one of my several million published words was generated by me, and I intend to keep it that way. I am no Luddite, but as a technologist I think we must always weigh costs and benefits. Frankly, AI is causing far more problems than it purports to solve.

1. Guititio says:
  
  July 1, 2025 at 12:48 pm
  
  Sadly, and as noted in Friday’s RW Daily, some prominent authors are now suggesting that disclosure of Gen-AI use should be voluntary rather than mandatory: https://journals.sagepub.com/doi/full/10.1177/17470161251345499
  
Scientist says:

July 1, 2025 at 9:06 am

It’s a shame that the bar is lower for publishing textbooks than for the students that use them. Making any of these “mistakes” in a classroom would be probation-worthy.

DTX says:

July 1, 2025 at 9:52 am

Checking references is such an obvious & easy thing to do. When students submitted papers for a class I taught as an Adjunct, I often spot-checked references, particularly when I suspected plagiarism.

While clearly the author is at fault regarding this book, it’s amazing that the editors of Springer apparently did little if any review before it was published. As noted by DS, that there are only 46 references should have been a red flag for the editor to read the book more carefully. Considering the made-up references, it’s likely that the text has many equally made-up ideas.

Debora Weber-Wulff says:

July 1, 2025 at 11:26 am

Just downloaded a copy in case it disappears. The text is very imprecise in what I just read. The term “artificial intelligence” was not coined at a conference by John McCarthy, but McCarthy and 3 others applied for funding for that conference using the term “AI”. Turing did not invent the “Turing test”, he called it the “imitation game”. *We* have taken to calling it the “Turing test”. And indeed, very sparsely referenced. It is being sold for 230 €! The Wikipedia pages are better and cheaper.

Mary says:

July 1, 2025 at 1:26 pm

There is no way that response from the author wasn’t copied and pasted from an LLM.

Warrick says:

July 1, 2025 at 4:33 pm

Interesting Google Scholar profile from the publisher’s link; the PubMed link gives nothing. Just this book and each chapter are listed, same as the Springer internal list of publications. Is this person even real?
https://scholar.google.co.uk/scholar?as_q=&num=10&btnG=Search+Scholar&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=%22Govindakumar+Madhavan%22&as_publication=&as_ylo=&as_yhi=&as_allsubj=all&hl=en
The book does list his affiliations as with SeaportAI in Chennai, indicating his name as AM Govind Kuma or simply as Govind Kumar (his email address at SeaportAI is also Govind Kumar).
