Retraction Watch readers are likely familiar with the varied — and often unsatisfying — responses of journals to scientific sleuthing that uncovers potential problems with published images. Some editors take the issues seriously, even hiring staff to respond to allegations and vet manuscripts before publication. Some, however, take years to handle the allegations, or ignore them altogether.
Recently, STM’s Standards and Technology Committee (STEC) appointed a working group to look at these issues. At a webinar last week, the group — including members from the American Chemical Society, Elsevier, Springer Nature, Taylor & Francis, Wiley, and other publishers — released a draft of their recommendations, which:
outline a structured approach to support editors and others applying image integrity screening as part of pre-publication quality control checks or post publication investigation of image and data integrity issues at scholarly journals, books, preprint servers, or data repositories. It provides principles and a three-tier classification for different types of image and data aberrations commonly detected in image integrity screens of figures in research papers and for a consideration of impact on the scholarly study; it also recommends actions journal editors may take to protect the scholarly record. The guidance covers data as rendered in figures in research papers or preprints including source data underlying these figures, where available. It does not include the reanalysis or forensic screening of raw data and large datasets (for example for statistical reporting).
Tier 1 is:
Image aberrations include unequivocal or possible aberrations restricted to a subset of image panels or the source data provided. The main conclusions of a study typically do not decisively depend on data in this category. Image irregularities can in principle be due to inadvertent mistakes in data processing or cosmetic image processing (‘beautification’) that subverts the proper interpretation of the data by the reader. There is no evidence for intent to mislead. Source data is readily available and explains the aberrations or possible aberrations. Aberrations have no impact on the reliability or interpretation of the data or the conclusions made and can be rectified by supplying properly processed versions of the same data or alternate data.
Tier 2:
Significant data “beautification” or undeclared image/data manipulations, which undermine objective data presentation, which are at odds with accepted scholarly practice, and which change the scientific conclusions for key data in a research paper. Intent to mislead cannot be excluded without formal further investigation.
Tier 3:
Severe image manipulation, with unequivocal evidence of obfuscation or fabrication and an intent to mislead, typically in more than one image/data panel, with a lack of compelling, authenticated source data. A level III paper will typically have multiple individual figure panels with level II or III aberrations.
We asked four longtime image sleuths — Elisabeth Bik, the pseudonymous “Claire Francis,” Mike Rossner of Image Data Integrity, and David Vaux — for their takes on the guidelines.
Vaux, a member of the board of directors at the Center For Scientific Integrity, our parent nonprofit organization, called the draft a “fabulous document. I can see a lot of consideration of needs and roles of sleuthers has already gone into it.”
“If only everyone (researchers, editors, reviewers, institutions) would sign up to such a declaration! (Obviously, getting them to implement and adhere to it would be another issue…),” added Vaux, a cell biologist at the Walter and Eliza Hall Institute of Medical Research in Melbourne.
“I am very happy to see these specific recommendations for how to handle image integrity concerns in manuscripts or post-publication,” Bik said. “These will provide very helpful guidance for journal editors, whistleblowers, and authors. These recommendations will work very well to supplement COPE guidelines, which are often vague and not always useful in situations where, e.g., authors do not reply.”
Francis was a bit more guarded, noting that the guidelines refer several times to the fact that researchers are responsible for the integrity of their images and that editors also have responsibilities. “Many will not like that,” Francis said.
Vaux had some specific comments about the language in the document, for example, a sentence that begins “Journal editors should analyze submitted figures and source data…”
“I agree but it is amazing that this has to be said,” Vaux said. “I remember having many arguments with Nature editors saying they should actually read the papers and look at the figures themselves, and not just read the title, authors’ names and institutions, and then leave all else to the peer reviewers.”
Ideally, Vaux said, “guidelines to authors will state that adherence is implicit in submitting the paper. Authors agree to provide all raw data, cooperate with investigations, and respond to correspondence in a timely manner, or their paper can and will be retracted.”
Among other suggestions, Vaux said “a section should be added on interactions with national ombudsmen/offices for research integrity. (Assuming the country is civilised enough to have one).”
Bik said the recommendations are a useful update for COPE’s guidelines:
This STM document addresses particular scenarios that the older COPE guidelines have not covered well, and this will greatly improve how such situations can be handled. In particular, I am happy to see that the STM recommendations state that anonymously raised concerns should be taken into consideration, as long as they appear to be legit. The recommendations also take into account the authors’ willingness to show source data and provide explanations, which is a good indication of whether the authors made inadvertent mistakes or engaged in bad research practices, or whether they acted with an intention to mislead.
Bik continued:
It is also great that the recommendations take into account that if a manuscript or paper has multiple low- or middle-level concerns, those should be treated cumulatively, i.e. multiple level II image issues can be counted in a cumulative way and could raise the whole paper to a level III. Another great recommendation is that a journal should post editorial notes in cases where severe problems are found and where the resolution is expected to take a long time. Finally, it is very helpful that the STM recommendations specifically mention that authors cannot replace images of concern with a new set of images generated later.
She, too, had some suggestions for improvement:
The recommendations could be even better if the table describing the three different levels also included a couple of specific examples, which currently appear to be missing. For example, how should a figure with two overlapping immunohistochemistry panels be classified? Or two panels that overlap, but with a rotation or mirroring? These situations occur quite often, but the table does not mention how to classify them. If there is one such overlap without rotation, that could still be an honest error, but rotations, changes of aspect ratio, zoom, or mirroring could hint at an intention to mislead. I hope the STM Working Group can incorporate some specific scenarios in these guidelines, and I would be happy to help provide them with some often-encountered examples.
Rossner had some specific suggestions, too:
The document does not address how to accomplish the important step of image screening. This has been a contentious issue for nearly 20 years, and the working group may have been avoiding it. In my opinion, journals have an obligation to screen all images in all figures in all manuscripts accepted for publication. There are various ways to accomplish this systematic, universal screening, such as visual screening in-house, visual screening outsourced, and algorithmic methods that are now coming online and need to be vetted for effectiveness by comparing them to visual screening. STM might consider creating recommendations for the screening process: what should be screened and how it can be screened.
Rossner also said the draft’s distinction between “images” and “data” is misplaced, and recommended using the term “image data.” He also disagreed with the guidelines’ suggestion that “Authors should be informed in advance if the editors plan to approach the corresponding authors’ institution.”
…I think many Research Integrity Officers may also disagree with it. Institutions often need to have the opportunity to sequester data before an inquiry. Informing the author before informing the institution might give the authors a chance to conceal data before they are sequestered.
Rossner had a number of other comments, including a critique of the three-tier classification system and whether it is a good idea to differentiate issues based on intent. We’ve made his responses available here.
Francis added:
I think that many editors don’t look beyond the trendy title and the institutional address/authors names.
Referring to the three-tiered classification suggested by the guidelines, Francis said:
I think that what the guidelines call Level I vertical splicing of gel/blot lanes is really Level III deletion of parts of an image (signal or background). I do not think that splicing (where the splicing does not occur at the same place in all the panels being compared) is taken seriously enough. You can’t tell what is missing.
Not sure exactly which problem this is all solving.
The editors who ignored readers’ concerns will continue to do so.
The editors who allowed corrections for badly falsified stuff will continue to do so because “conclusions not affected”. They’ll just adopt the term “tier I aberration”.
Seminars and discussion panels on research integrity will continue to produce guidelines. These guidelines will continue to be evaded through loopholes or ignored altogether.
So what?
Concur. Editors could save everyone a lot of grief simply by: 1) requiring a ‘prenuptial’ agreement for publication with the corresponding author; and 2) enforcing it based upon the perceived strength of any comments appearing on PubPeer. Any journal publicly doing so would instantly gain a market advantage in credibility, simply because the community would have access to the same evidence. And you can be confident that its submitting authors would have bothered to look at the primary data. Much simpler, more direct, and requiring less expertise in interpreting all these rules.
Note to anyone who is concerned about the utility of such a tiered framework: we have applied it successfully for over 6 years at EMBO Press and find that it leads to a more consistent assessment process and journal response (EMBO J (2015) 34:2483-2485).
Thanks for the excellent comments and suggestions. Briefly, Elisabeth Bik’s suggestion to add examples is a very good one, and STM is releasing a series of 5 videos that analyze such examples. Each case is different and needs a bit of framing discussion – too much a priori shoeboxing can mislead. (Re. the example of a (partial) duplication plus rotation/mirroring/aspect-ratio change as an indication of intent: I’d a priori agree, but I recall at least one example where exactly this issue turned out to be unequivocally a mistake due to the authors getting lost in reams of poorly annotated data.)
Mike Rossner on how to do the screening: a crucial point, and in fact one of the other main areas of focus for the working group (we developed a set of basic criteria to evaluate computational processes and hope to jointly develop tools) – more on this separately.
Re. David Vaux’s “Journal editors should analyze submitted figures and source data…”: I agree, but in practice it needs a secondary assessment unless the issues truly jump out; e.g. the systematic (manual) screen at EMBO Press takes 40-60 minutes per manuscript.
Mike Rossner on contacting RIOs first: good point – this was in fact a core discussion at the CLUE workshop: https://www.biorxiv.org/content/10.1101/139170v1. We advocate parallel approaches in cases where there is clear evidence of intent but typically we follow the presumption of innocence principle (NB: same applies to PubPeer posts).
Regarding Mike Rossner’s outline of a decision tree for a more finely tuned response: absolutely, and the flowcharts COPE posts can be useful. The intent of the three-tier classification is a simple, workable framework for systematic screening – cases can change in severity depending on the author and institutional responses, etc. A decision tree is without doubt a good supplementary tool we can work on.
Source data = unmodified raw data: that depends on the data. For simple autoradiography to film, sure (even straight printouts on paper are not raw data), but for other data (e.g. microscopy) the output is invariably modified, hence the definition. Thanks for the suggestion to add a section on national offices – Australia is more advanced here than many others.
Thanks, Bernd, and congratulations on the STM document. Note, however, that Australia does not have a national ombudsman or ORI, but relies on the “self-regulation” model, in which conflicts of interest invariably arise. I hope that Australia follows the lead of European nations – of which Sweden was one of the most recent – in recognizing that institutions cannot be trusted to govern research integrity on their own, but need an independent body to provide oversight and advice, ensure compliance, and collect data. Also, to clarify, when I commented on the statement “Journal editors should analyze submitted figures and source data…” I was just advocating that editors actually read manuscripts and try to interpret the data themselves – using their own eyes and brains – before they make decisions or use software or pass the manuscripts to the peer reviewers.
Fair comments, but I struggle to understand how “the authors getting lost in reams of poorly annotated data” can do anything other than call into question an entire study. Honest mistake or otherwise, it cannot be relied upon if that’s how disorganised the process was shown to have been.