
While large language models take the blame these days for hallucinations, odd punctuation, and all manner of questionable language choices, turns of phrase were being tortured well before LLMs arrived.
Overlooking that fact seems to have led to a recent correction to a retraction – yes, you read that right – in Sage’s Journal of X-Ray Science and Technology. The original article, published in February 2022, was on detecting coronary artery plaques. It contained several known tortured phrases: synonyms and rephrasings — often awkward and nonsensical — substituted into text to evade plagiarism detectors.
For instance, the paper used the term “cardiovascular breakdown” for “heart failure”; “outward appearance acknowledgement” instead of “face recognition”; and “attractive resonance” for “magnetic resonance.”
The authors also described a pixel-wise analysis as “pixel astute investigation” and suggested that “treatment of weakness” — anemia, perhaps? — “would really bring down medical care costs while diminishing dreariness and mortality.” While some may describe the effects of anemia as dreary, “morbidity” is the term more typically used in the literature.
Computer scientist Guillaume Cabanac flagged the paper on PubPeer last March for containing tortured phrases, and in October the publisher retracted it. The notice originally read:
Sage was made aware of concerns raised on PubPeer regarding the potential use of tortured phrases in this article. Tortured phrases can indicate that a large language model was used to deter plagiarism checks from detecting unattributed text.
Cabanac, of the University of Toulouse in France, was quick with a follow-up comment on PubPeer: “This statement is inaccurate,” he wrote. “Tortured phrases are produced by ‘text spinners’: online websites running a basic algorithm that replaces some words with synonyms, using a thesaurus.”
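The mechanism Cabanac describes can be sketched in a few lines. This is an illustrative toy, not the algorithm of Spinbot or any real spinner; the thesaurus entries are invented here to show how blind synonym substitution yields the phrases quoted above.

```python
# Toy "text spinner": swap words for thesaurus synonyms with no regard
# for technical meaning. The entries below are illustrative, chosen to
# reproduce tortured phrases cited in this article.
THESAURUS = {
    "heart": "cardiovascular",
    "failure": "breakdown",
    "magnetic": "attractive",
    "face": "outward appearance",
    "recognition": "acknowledgement",
}

def spin(text: str) -> str:
    """Replace each known word with its 'synonym', word by word."""
    return " ".join(THESAURUS.get(word, word) for word in text.lower().split())

print(spin("heart failure"))       # -> cardiovascular breakdown
print(spin("magnetic resonance"))  # -> attractive resonance
```

Because the substitution is context-blind, fixed technical terms come out mangled — exactly the fingerprint the sleuths look for.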
Cabanac, Cyril Labbé of University Grenoble Alpes, and sleuth Alexander Magazinov identified and described tortured phrases in 2021, and the Problematic Paper Screener now contains more than 8,000 terms as potential fingerprints for papers that may be plagiarized.
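Screening for such fingerprints amounts to matching a curated phrase list against paper text. The sketch below assumes that simple approach; the fingerprint list is a tiny illustrative sample, not the Problematic Paper Screener's actual list of more than 8,000 terms or its real code.

```python
# Hypothetical fingerprint-based screening: flag any known tortured
# phrases that appear in a paper's text (case-insensitive).
FINGERPRINTS = [
    "cardiovascular breakdown",
    "attractive resonance",
    "outward appearance acknowledgement",
]

def flag_tortured_phrases(text: str) -> list:
    """Return the known fingerprints found in the text."""
    lowered = text.lower()
    return [fp for fp in FINGERPRINTS if fp in lowered]

hits = flag_tortured_phrases(
    "Patients with cardiovascular breakdown underwent attractive resonance imaging."
)
print(hits)  # -> ['cardiovascular breakdown', 'attractive resonance']
```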
“On the one hand, we were able to reproduce tortured phrases using the Spinbot spinner” in the 2022 paper, Cabanac continued on PubPeer. “On the other hand, we were unable to reproduce tortured phrases with LLMs.” And ChatGPT was released in November 2022 — months after the paper was published.
To his knowledge, LLMs don’t produce tortured phrases, Cabanac said in a follow-up email to us.
Sage issued a correction to its retraction in January, stating the original notice “incorrectly cited the origin of tortured phrases to the use of a large language model.”
By our count, more than 30 other retraction notices have had corrections issued to them, most commonly to fix errors in the text.
Cabanac pointed out that authors often defend the use of tortured phrases, saying the phrases “already exist in the scientific literature,” they represent “stylistic and linguistic preferences” across regions and researchers, and such replacements are effective for “minimizing direct textual matches with existing literature and avoiding unintentional plagiarism.”
With this “normalization of deviance,” Cabanac said, and with more papers getting published that contain tortured phrases, LLMs may one day generate tortured phrases themselves. But for now, tortured phrases are the domain of older technology.
