Harold “Skip” Garner has worn many hats over the course of his career, including plasma physicist, biologist, and administrator. One of his interests is plagiarism and duplication in the scientific literature, and he and his colleagues developed a tool called eTBLAST that compares text passages to what has already been published in order to flag potential overlap.
A new paper in Research Integrity and Peer Review by Garner and colleagues estimates “the prevalence of text overlap in biomedical conference abstracts.” We asked Garner some questions about the paper.
Retraction Watch (RW): You used a “text similarity engine” called eTBLAST. What is eTBLAST, and what does it do?
Harold Garner (HG): eTBLAST is a search engine that quantifies the amount of similarity between a text query and a given collection of text being searched; in this case, Medline abstracts or collections of abstracts. It works by taking a submission, for example a paragraph, and comparing it to other paragraphs, for example abstracts. eTBLAST, created by Heliotext, LLC, is available for free at Etblast.org for searching Medline/PubMed. It has thousands of users a day: scientists use it to find references and collaborators, patent attorneys use it to find intellectual property information, and journals that cannot afford paid services use it to check submitted abstracts for possible ethical violations.
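To make the underlying idea concrete, here is a minimal sketch in Python of how a text passage can be scored against a collection of abstracts. This is illustrative only: a simple bag-of-words cosine similarity, not eTBLAST's actual (and more sophisticated) algorithm, and all function names here are hypothetical.

```python
# Illustrative sketch only: eTBLAST's real scoring is more sophisticated,
# but the core idea -- scoring a query paragraph against a corpus of
# abstracts -- can be approximated with bag-of-words cosine similarity.
import math
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    """Lowercase, split into words, and count term frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_matches(query: str, corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Score a query paragraph against every abstract, best match first."""
    q = tokenize(query)
    scores = [(doc_id, cosine_similarity(q, tokenize(text)))
              for doc_id, text in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In a real system the top-ranked hits would then be inspected more closely (for instance, at the sentence level) before anyone concluded that the overlap is problematic.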
RW: Tell us about the dataset of abstracts that you used.
HG: We first collected as many abstracts as we could from professional/scientific meetings. We did this by “scraping” (downloading) collections of abstracts from those meetings that posted them on the web. We then used eTBLAST to compare the similarity between each of those downloaded abstracts and all other abstracts from a given conference and previous years of that conference. We also searched each abstract we downloaded against Medline/PubMed.
RW: How often did you find plagiarism in conference abstracts? How did that rate compare to what has been found in the published literature?
HG: We found “highly similar abstracts with no overlapping authors” (putative plagiarism) at a rate of about 0.5% in this study of scientific meetings. We previously reported the rate in Medline/PubMed in 2008 as 0.05%. Since our original publication, many journals have instituted a check for inappropriate submissions, so the current rate of plagiarism and duplicate publication is much lower. This is a very serious issue at scientific meetings; the rate is 10 times worse than in peer-reviewed publications. Please note that what we find is “highly similar abstracts with no overlapping authors”, but ultimately conference organizers, editors, or ethics committees must review this information and make the final determination of “plagiarism”.
RW: What about duplication, aka “self-plagiarism?”
HG: We found duplication rates of 2% within the same meeting and 3% between meetings of the same conference. In comparison, the rate of duplicate publication in Medline/PubMed in 2008 was 1.35%. Inappropriate self-duplication is most often seen at a given conference, year to year, when someone submits the same abstract every year. Another case is when people submit an abstract to a meeting that is the same or nearly the same (no new information) as a manuscript published in previous years, i.e., one that is very similar to abstracts in Medline/PubMed. Another variant is “salami-slicing,” where several highly similar abstracts with slight differences in focus are submitted to a conference by the same collection of authors, but with different presenting authors.
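The distinction Garner draws (highly similar text with no overlapping authors versus highly similar text from the same authors) reduces to a simple decision rule. Here is a hedged sketch in Python; the similarity threshold and function name are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of the flagging logic described above: a highly
# similar pair with no shared authors is putative plagiarism, while the
# same pair with shared authors is putative duplication ("self-plagiarism").
SIMILARITY_THRESHOLD = 0.8  # illustrative cutoff, not the study's value

def classify_pair(similarity: float,
                  authors_a: set[str],
                  authors_b: set[str]) -> str:
    """Label an abstract pair based on similarity and author overlap."""
    if similarity < SIMILARITY_THRESHOLD:
        return "no flag"
    if authors_a & authors_b:
        return "putative duplication (self-plagiarism)"
    return "putative plagiarism"
```

As Garner stresses, such a rule only flags candidates; humans must make the final determination.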
RW: Should conference organizers be concerned about these figures?
HG: Yes. Conference organizers have ignored this, much as journals did until about 13 years ago, when editors realized they had a fiduciary responsibility to those who read their journals to ensure that they contained high-quality, ethically sound publications. I know it is possible for conference organizers to improve their meetings, for I have worked with a European conference organizer in the past. In addition to helping improve overall conference integrity, these tools can also be used to help automate the conference agenda and to help organizers differentiate between submissions that should be posters and those that should be oral or invited presentations.
RW: What could conference organizers do to prevent cases of plagiarism and duplication?
HG: I would recommend that conference organizers: 1) inform submitters (and have them affirm) that their submissions adhere to the highest ethical norms; and 2) process all submissions through a “checking” system before they are accepted, and then take appropriate action. I would further suggest that the Office of Research Integrity (which supported this project) consider developing guidance for conference organizers and reviewing these types of questionable research integrity activities/violations.
Thanks for this story. I’m wondering if Dr. Garner (or others) could expand on how “these tools can also be used to help automate the conference agenda” – I would be very interested to learn more about that.
Not all conference organizers have ignored this issue. A scientific organization I work with uses a commercial plagiarism checking software to check abstracts submitted for an annual meeting. We have seen very little if any plagiarism with “non-overlapping authors.” In my experience, all clear cases have been self-plagiarism.
If the abstract has been presented before at the same meeting, or if it was already published in a journal well before the submission, we would usually reject the abstract. However, there are some gray areas. What should be done in the following cases?
1) Abstract was submitted to a conference while manuscript was under review at a journal; paper was accepted/published before the meeting is held.
2) Abstract is 80% identical in language to the same group’s abstract at last year’s meeting, but now with key new results.
3) Abstract has been presented recently at another conference (i.e. another organization’s meeting)…keep in mind that some conferences do not make abstracts public, so a negative ruling effectively “penalizes” those who are willing to have their abstracts made public.
4) What about now? The COVID pandemic halted or slowed research for many labs for most of 2020 and for some into this year. Should we relax standards somewhat in light of these delays and the relative lack of new data? Or cancel meetings since there might be more repetition than usual?
Maybe it’s appropriate to be more “lenient” with abstracts than with publications. Abstracts are similar to pre-prints. You show them publicly or to different groups of colleagues to get feedback on ongoing work before it is “crystallized” into a publication. Is it wrong, does it hurt anyone, does it hurt science to share the same abstract with two or three different meetings in a year as long as it has not been previously published or shown at the same meeting?
I’m thinking basically the same thing.
I have ‘gone on tour’ with a set of results and presented the same abstract at a couple of different conferences, e.g. one national, one international. I have also published a paper and then gone and presented the data at a conference to increase the paper’s impact. Usually I present at the conference first, but sometimes things don’t work out that way.
If someone is just copying and pasting someone else’s research and presenting it as their own, that’s clearly wrong. But if selling your prior published work in a conference hall is wrong, well, we’d better purge every senior scientist who’s ever stood at a lectern, because that’s basically all they do.
Kenneth,
A similarity score matrix can be computed for all abstracts submitted to a meeting; the abstracts can then be clustered and arranged into sessions. In practice it is a bit more complicated than that, which makes the groupings work better, but in essence that is how it works.
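For readers curious how that might look in practice, here is a minimal sketch assuming TF-IDF features and average-linkage hierarchical clustering. The library and parameter choices are assumptions for illustration, not the actual eTBLAST-based pipeline.

```python
# Hedged sketch of the session-building idea described above: compute
# pairwise similarities over all submitted abstracts, convert them to
# distances, cluster, and treat each cluster as a candidate session.
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

def group_into_sessions(abstracts: list[str], n_sessions: int) -> list[int]:
    """Assign each abstract a session label via hierarchical clustering."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    distances = pdist(tfidf.toarray(), metric="cosine")  # condensed matrix
    tree = linkage(distances, method="average")
    return list(fcluster(tree, t=n_sessions, criterion="maxclust"))
```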
We discuss methods that address some of your questions at length in the paper.
Thanks, Skip!
This appears to be very much a particular view of conferences current within one particular branch of scholarship. I’m a mathematician: I’ve been to many conferences, spoken at some, and been on the organising or scientific committee for several. The primary mode of publication in mathematics remains traditional peer-reviewed journals, supported by the use of arXiv or other preprint servers. Some conferences issue peer-reviewed proceedings, either in journal special issues, in book series, or in stand-alone publications, published either as books with academic publishers or one-off. Some conferences do not issue proceedings at all, or, increasingly commonly, record their talks and post them on the internet.
I have tried to encourage my co-organisers to be clear, and to make it clear to speakers, what sort of talk they are being asked to give. I’m perfectly happy to see talks reviewing the field, summarising their own or other people’s work, speculating on open problems or likely avenues for exploration, announcing new results to be detailed elsewhere, explaining applications of known results to new practical problems, and many other things that are of value to their colleagues in the academic and non-academic worlds — provided that they are well-planned, well-delivered, and that they are doing what fits in with the conference plan. I have to say that I personally am not always a fan of people standing up and reading out their latest paper or preprint: that’s not always the best use of the time of the majority of the people in the audience. I would much rather they referred to the preprint or publication and explained to the audience what was important and novel about it, why they did it, and where it might be taking the field — the minority who really want to know the details can read it later, or indeed at the same time. If a speaker’s talk overlaps with, or even repeats completely, a talk they themselves gave at another conference: well, that may not be optimal, but it may still be suitable, depending on the nature of the meeting, the audience and the purpose of the talk.
Of course a conference talk, just like anything else put out in public by an academic under their own name, is subject to the normal rules of academic conduct as regards plagiarism, the giving of due credit, integrity, honesty, accuracy and so forth. And if it is intended to be formally published, then it is subject to the further rules of formal publication, such as novelty and originality. But to treat every conference talk of every kind as if it were a formal publication would prohibit many of the sorts of conference talks that we find valuable: indeed, it would call into question the whole purpose of holding a conference at all.
I don’t understand what this refers to:
“conference organizers have ignored this, much like journals did until about 13 years ago, when editors realized they had a fiduciary responsibility to those that read their journals to ensure that they contained high quality, ethically sound publications.”
It may not have been the case in some subjects, but all the journals (not to mention conferences) I’ve been involved with have recognised such a duty going back considerably further than 2008.
In 2008, we published Errami M and Garner HR, A tale of two citations, Nature 2008 Jan 24;451(7177):397-9, PMID: 18216832; and in 2009 we published Tara C. Long, Mounir Errami, Angela C. George, Zhaohui Sun & Harold R. Garner, Scientific Integrity: Responding to Possible Plagiarism, Science, Vol. 323, 1293-1294, March 6, 2009, PMID: 19265004. We received a tremendous amount of backlash from many, many editors and publishers of journals, from the top down, who said computer-based checking was not needed, as their journals were free of any plagiarism or duplicate publication issues. However, when we showed them examples, they started to employ checking upon submission, which is commonplace now. Our study focused only on PubMed/Medline; at the time, as I recall, it included only about 15,000 journals, so it is limited to only those journals indexed therein.
I’m sure that was a very useful tool, and I’m equally sure that some journal editors needed it more than they thought they did. But that’s not really the same as saying that it was in 2008 that “editors realized they had a fiduciary responsibility”. It’s saying that some editors discovered they weren’t doing as well as they thought they were at delivering on the responsibility that they had accepted long since. The first case I recall of a mathematical journal retracting a paper for plagiarism was in the 1980s. I don’t know how it was discovered, but the editors of the journal at the time were perfectly well aware of their responsibility. As a reviewer for ZBmath wrote at the time: “Plagiarism in mathematics, once relatively uncommon, is nowadays more and more spreading.”
Conference abstracts mean very different things in different fields, something the author of this study seems shamefully ignorant of.
Yes plagiarizing others is beyond the pale. But the self-plagiarism charge is silly.
In my field, the abstract is a rough promise of what you’ll be talking about, to be fleshed out with whatever results are new and exciting by the time the meeting actually occurs. Abstracts do NOT count towards tenure or grant renewal in my field. NIGMS does not even want the students’ conference abstracts listed on training grant applications. The short papers published in some conferences’ proceedings are not given much weight – most people would be loath to bury exciting research there that could be published in a higher-profile journal. In short, the point of our conferences is not accruing publications – it is to share and discuss recent results.
Truth is, I’ve never understood the publication of abstracts at all – it seems rather pointless.
I thought I would quote the proceedings policy of a conference due to take place this year. It seems to me a sensible way of reconciling the “two cultures” issue that’s surfacing here. It will be noted that even on the “proceedings track”, articles are re-reviewed before formal publication.
Two types of submissions are accepted, both of which will be reviewed using the same standards:
Proceedings Track. Original contributions of high-quality work consisting of an extended abstract, up to 12 pages, that provides evidence of results of genuine interest, and with enough detail to allow the program committee to assess the merits of the work. Submission of work-in-progress is encouraged, but it must be more substantial than a research proposal. Accepted submissions in this track will be invited for publication in a proceedings volume.
Non-Proceedings Track. Submissions presenting high-quality work submitted or published elsewhere, or for which publication in the proceedings is not desired by the authors, may be submitted to this track, provided the work is recent and relevant to the conference. The work may be of any length, but the program committee members may only look at the first 3 pages of the submission, so you should ensure that these pages contain sufficient evidence of the quality and rigour of your work.
Papers in the two tracks will be reviewed against the same standards of quality. Since ACT is an interdisciplinary conference, we use two tracks to accommodate the publishing conventions of different disciplines. For example, those from a Computer Science background may prefer the Proceedings Track, while those from a Mathematics, Physics or other background may prefer the Non-Proceedings Track. However, authors from any background are free to choose the track that they prefer, and submissions may be moved from the Proceedings Track to the Non-Proceedings Track at any time at the request of the authors.
What about the same abstract at two conferences in different languages? E.g. in my field & country, conferences are held in French, and attract very different attendees than the bigger, international and/or English-language conferences. Also, the abstracts are very short, and do not count as papers in the field.
Such abstracts will count as self-plagiarism, but I would argue, as some previous comments have, that it is relevant to present the same results at both conferences.