A 2020 paper that claimed to find a link between microbial genomes in tissue and cancer has been retracted following an analysis that called the results into question.
The paper, “Microbiome analyses of blood and tissues suggest cancer diagnostic approach,” was published in March 2020 and has been cited 610 times, according to Clarivate’s Web of Science. It was retracted June 26. The study was also key to the formation of biotech start-up Micronoma, which did not immediately respond to our request for comment.
Rob Knight, corresponding author and researcher at the University of California San Diego, also did not immediately respond to our request for comment.
In October 2023, mBio, a journal from the American Society for Microbiology, published “Major data analysis errors invalidate cancer microbiome findings.” The paper pointed out several major flaws in the earlier article by Knight’s group.
After downloading and analyzing the original data, “we found almost right away that the authors of the Nature paper had made some huge mistakes – that most of the bacteria they found simply weren’t there, or else were present in quantities that were 100s of times smaller than they reported. Oops,” Steven Salzberg, a researcher at Johns Hopkins in Baltimore, and corresponding author of the 2023 paper, told Retraction Watch in an email.
Salzberg and his colleagues found “some of these species were ‘nonsensical,’” he told us. For example, the Knight paper found that Hepandensovirus was the most important species to identify adrenocortical carcinoma. “Well, that’s a shrimp virus! Makes no sense as it doesn’t exist in humans,” he told us.
Knight’s group responded to the criticism in a follow-up paper, “Robustness of cancer microbiome signals over a broad range of methodological variation,” published in February 2024 in Oncogene. In it, they defended their original findings: “These extensive re-analyses and updated methods validate our original conclusion that cancer type-specific microbial signatures exist in TCGA, and show they are robust to methodology.”
The retraction notice cites Salzberg’s paper and the response from the authors. It reads:
The Editors have retracted this article. After publication, concerns about the robustness of specific microbial signatures reported as associated with cancer were brought to the attention of the Editors. The authors have provided responses to the issues in a separate publication.
Expert post-publication peer review of the issues raised and the authors’ responses has confirmed that some of the findings of the article are affected and the corresponding conclusions are no longer supported. All authors agree with this retraction.
Noticing:
https://www.nytimes.com/2023/08/25/health/cancer-microbes-debate.html?unlocked_article_code=1.2k0.rotR.wjzhApCf1DtD&smid=url-share
Sad how these papers with “major errors” are just retracted and everyone pretends like the peer-review process of the journal that published them hasn’t been utterly discredited.
Garbage in garbage out… perhaps accountability by way of job loss and/or court of law?
Wow. It was cited over 600 times and a start-up was founded based on it. Flawed papers can do a lot of damage if they are not retracted quickly. Keep up the good work.
Yet the same authors doubled down on this study with the recently published paper mentioned above. Will that paper be retracted too? Seems like there is a lot more to this story.
Good to see that the scientific record is corrected in this case. Cancer microbiome studies seem to be particularly prone to reporting questionable results. This study also found shrimp parasite DNA in breast cancer specimens: https://pubpeer.com/publications/2D94E67C82602FEC7EDF6723C59DA5 . Maybe it is real? /s
A reanalysis of Poore’s published normalization code (https://github.com/biocore/tcga/blob/master/jupyter_notebooks/TCGA%20Batch%20Correction%20–%20Final%20Analysis.ipynb) reveals 58 genomes that should have been discarded “prior” to voom normalization. Inclusion of these low-read, no-information genomes will cause voom to estimate the variances incorrectly. Hepandensovirus is among the 58 genomes that were marked for removal by Poore’s own normalization code. I have provided a reproducible R script that anyone can run to identify the 191 genomes that were marked for removal and the 58 that were not removed (https://github.com/aaronneilthomason/poore-58-missed-genomes). The Oncogene article, which analyzes the post-voom-snm prepared file, is irrelevant to this issue because the problem occurs prior to voom normalization.
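For readers who want to see the shape of that check, here is a minimal Python sketch of the bookkeeping: flag features as low-information, then ask which flagged features still appear in the retained set. It is not the R script linked above. The total-count threshold, the function names, and the toy counts are all assumptions made for illustration; the criterion used in the published normalization code may differ.

```python
from typing import Dict, List, Set


def flag_low_information(counts: Dict[str, List[int]], min_total: int = 10) -> Set[str]:
    """Return names of features whose total read count is below min_total.

    The threshold rule is an illustrative assumption, not the rule used
    in the published normalization code.
    """
    return {name for name, row in counts.items() if sum(row) < min_total}


def missed_removals(flagged: Set[str], retained: Set[str]) -> Set[str]:
    """Features that were flagged for removal but are still present."""
    return flagged & retained


# Toy read counts per genome across three samples (made-up numbers).
counts = {
    "genome_A": [120, 98, 143],
    "genome_B": [0, 1, 0],
    "genome_C": [2, 0, 3],
    "genome_D": [55, 61, 47],
}

flagged = flag_low_information(counts, min_total=10)

# Hypothetical column names found in a normalized output file.
retained = {"genome_A", "genome_C", "genome_D"}

missed = missed_removals(flagged, retained)
print("flagged for removal:", sorted(flagged))
print("flagged but still present:", sorted(missed))
```

In the real analysis, the flagged set would come from the normalization code's own filter and the retained set from the column names of the published data file.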
A principal component analysis of Kraken-TCGA-Voom-SNM-Most-Stringent-Filtering-Data.csv will show that it still has batch effects present, and that its PCA plot does not match that of the 2020 paper. I provide that script here (https://github.com/aaronneilthomason/poore-58-missed-genomes/blob/main/Poore-batch-effects-still-present-in-stringent-file.txt). Why does it not match? This is interesting. I provide a different script to build a new voom-snm normalized data set from scratch that removes all 191 no-information genomes that voom itself says should be removed. That script is here (https://github.com/aaronneilthomason/poore-58-missed-genomes/blob/main/Poore-reproduce-published-PCA-graphs.txt). A principal component analysis of this normalized data produces a PCA plot identical to the one in the paper. Why the discrepancy? The Kraken-TCGA-Voom-SNM-Most-Stringent-Filtering-Data.csv file was produced incorrectly by including 58 no-information genomes that voom said to remove. This likely caused voom to estimate variances incorrectly. It may even have failed to converge, and honestly voom may not have reported that it had a problem. It would be interesting to replicate what voom reports with the 58 no-information genomes present. You can find the list of which genomes were removed / not removed with this script (https://github.com/aaronneilthomason/poore-58-missed-genomes/blob/main/Poore-normalization-misses-58-genomes.txt). I have provided all the code so that all parties can run these themselves and do further analysis.
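As a rough illustration of what "batch effects still present" means in a PCA, the Python sketch below projects samples onto the first principal component and compares how far apart the batch means sit relative to the spread within each batch. This is not the linked R script and says nothing about the real file: the data are made up, the function names are hypothetical, and power iteration is swapped in for a full PCA to keep the example dependency-free.

```python
import math
import statistics
from typing import Dict, List, Sequence


def center_columns(X: List[List[float]]) -> List[List[float]]:
    """Subtract each column's mean so PCA describes variation around the centroid."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def first_pc_scores(X: List[List[float]], iters: int = 200) -> List[float]:
    """Project samples onto the first principal component via power iteration."""
    Xc = center_columns(X)
    p = len(Xc[0])
    v = [1.0 / math.sqrt(p)] * p  # starting direction (assumed not orthogonal to PC1)
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
        # w = Xc^T (Xc v), i.e. the unnormalized covariance matrix applied to v
        w = [sum(s * row[j] for s, row in zip(scores, Xc)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(row, v)) for row in Xc]


def batch_separation(scores: Sequence[float], batches: Sequence[str]) -> float:
    """Gap between the extreme batch means divided by the largest within-batch spread."""
    groups: Dict[str, List[float]] = {}
    for s, b in zip(scores, batches):
        groups.setdefault(b, []).append(s)
    means = [statistics.mean(g) for g in groups.values()]
    spread = max(statistics.pstdev(g) for g in groups.values())
    gap = max(means) - min(means)
    return math.inf if spread == 0 else gap / spread


# Made-up data: rows are samples, columns are features; the second batch is shifted.
X = [
    [1.0, 2.0, 0.5], [1.2, 1.9, 0.4], [0.9, 2.1, 0.6],
    [3.0, 4.1, 0.5], [3.1, 3.9, 0.6], [2.9, 4.0, 0.4],
]
batches = ["batch_1"] * 3 + ["batch_2"] * 3

scores = first_pc_scores(X)
ratio = batch_separation(scores, batches)
print("PC1 batch separation ratio:", round(ratio, 1))
```

A large ratio means samples cluster by batch along the first component, which is the pattern a PCA plot would show visually. On properly batch-corrected data, one would expect this ratio to be small.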
I’ve added a mandatory citation notice to the project including for journal retractions.
Any reference to the discovery of 58 genomes that were not removed from the Poore et al. 2020 study must cite the author Dr. Aaron Thomason and this repository.
https://github.com/aaronneilthomason/poore-58-missed-genomes