Caught by a reviewer: A plagiarizing deep learning paper lingers

Last May, Devrim Çavuşoğlu, an engineer at Turkish software company OBSS, was looking at feedback from a conference reviewer of a paper he and his colleagues had submitted. One comment stood out to him: The reviewer had noticed a resemblance between Çavuşoğlu’s work and another paper accepted to a different conference on computational linguistics. 

When Çavuşoğlu first skimmed through the other paper, he came across some sections containing an uncanny resemblance to his own ideas. “I thought, it’s like I wrote that,” he recalled. “How could it be so similar, did we think about the same thing?” 

He checked the accompanying source code and found the authors of the other paper seemed to have directly copied and built upon his own publicly released code without any attribution – a violation of the license connected to the work. “I was shocked, to be honest,” Çavuşoğlu told Retraction Watch.

In July 2021, Çavuşoğlu and his team had publicly released their software, jury, on the code-hosting platform GitHub. The software, designed to assess the quality of natural language generation models, was licensed to allow others to use it with attribution. They decided to submit a paper describing their work for a conference in early 2023 and received comments from the conference reviewer a few months later in May. 

The plagiarizing paper appeared in October 2022, in the period between when Çavuşoğlu and his team released the code and submitted the paper about the work. They were not aware of the other paper, according to Çavuşoğlu. “NLG-Metricverse: An End-to-End Library for Evaluating Natural Language Generation” was published by five authors from the University of Bologna in Italy, and describes software the authors claim to have created. They make reference to Çavuşoğlu’s project in the paper, but at the time of publishing, failed to disclose they had used his existing code in their work.

Çavuşoğlu flagged the similarities with the conference committee that had accepted the paper. It took eight months and several follow-ups from Çavuşoğlu for the committee to decide to retract. The notice on the source code appeared just last week on May 21, stating that the retraction was “per publication chair request.”

Last August, Çavuşoğlu emailed the committee for COLING 2022, the conference that had accepted the other paper, as well as the Association for Computational Linguistics (ACL), which runs the conference and publishes its proceedings. His letter described the unreferenced use of jury and identified nine instances of suspected plagiarism, including identical segments of code and segments with only slight changes. He received a response in October, two months later, from one of the COLING 2022 committee members, Leo Wanner. Wanner wrote that the group was looking into the issue.

The most irritating part, Çavuşoğlu told Retraction Watch, is that the authors had written about jury in the paper, referring to it as one of the other available resources, and had even compared attributes of their own project with jury in a table. “They copied our source code and our system architecture,” he said, but there was no mention that they had built their software on top of jury.

He argues in his letter that the references to jury expose the plagiarism as intentional, since the authors were aware of the work. 

While improvements to code are typical in open-source software, failing to credit the original work still amounts to plagiarism, Çavuşoğlu said. If there had been appropriate attribution – which usually involves adding a license file which credits the original authors to the repository of files – “there would be no problems in terms of software use,” Çavuşoğlu said. “But they didn’t include it.” 

The authors of the plagiarizing paper have not replied to requests for comment.

In December 2023 – several months after Çavuşoğlu had first flagged the similarities – the files on GitHub were updated to include an acknowledgement to jury.

Last month, Wanner wrote to Çavuşoğlu that “indeed, in the paper you drew our attention to, required references to your work are missing and that this constitutes a case of plagiarism.” He added that the publication chairs for the conference had asked the ACL Anthology, a library of the work published at ACL conferences, to retract the paper “quite a long while ago.”

“Contacting the ACL authorities was not easy,” Çavuşoğlu said, noting how long it took to obtain the plagiarism verdict. “I clawed my way to what we have now.” Wanner did not respond to requests for comment.

The PDF of the paper associated with the code is watermarked as retracted, and a note was included on the GitHub page for the work last week – about eight months after Çavuşoğlu’s first email to the committee.

There is no notice explaining the reason for the retraction. Matt Post, who manages the library of conference papers for ACL, said updates are processed in monthly batches, and this one would be live after the end of May.
