What’s in a picture? Two decades of image manipulation awareness and action

This year marks the 20th anniversary of the publication of “What’s in a picture?  The temptation of image manipulation,” in which I described the problem of image manipulation in biomedical research. 

Two decades later, much has changed.  I am reassured by the heightened awareness of this issue and the numerous efforts to address it by various stakeholders in the publication process, but I am disappointed that image manipulation remains such an extensive problem in the biomedical literature.  (Note: I use “image manipulation” throughout this piece as a generic term covering both direct manipulation of images (e.g., copy/paste, erasure, splicing) and image duplication.)

In 2002, I was the managing editor of The Journal of Cell Biology (JCB), and STM journals were transitioning away from paper submissions.  We had just implemented online manuscript submission, and authors often sent figure files in the wrong file format.  One day, I assisted an author by reformatting some figure files.  In one of the Western blot image panels, I noticed sharp lines around some of the bands, indicating they had either been copied and pasted into the image or the intensity of those bands had been selectively altered.  

I vividly recall my reaction, which was, “Oh shit, this is going to be a problem.  We’re going to have to do something about this.”  With the blessing of then-editor-in-chief Ira Mellman, I immediately instituted a policy for the journal to examine all figure files of all accepted manuscripts for evidence of manipulation before they could be published.  We began using simple techniques, which I developed along with three of my colleagues at the time, Rob O’Donnell, Erinn Grady, and Laura Smith.  The approach involved visual inspection of each image panel, using adjustments of brightness and contrast in Photoshop to enhance visualization of background elements.  

Such adjustments can reveal inconsistencies in an image (such as sharp lines in what should be a uniform background) that are clues to manipulation such as splicing, or consistencies in an image (such as a matching pattern of background spots) that are clues to duplication.  By this point, it was clear to me that image manipulation was an issue all journals could, and thus should, address within their production workflows.  There were personnel costs involved, but I believed the costs were worth it to protect the scientific record.
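
For readers who want to experiment with this kind of check themselves, here is a minimal sketch of a comparable contrast stretch, written in Python with Pillow and NumPy.  It is not the exact Photoshop procedure we used at JCB, and the file name and percentile thresholds are hypothetical; the idea is simply to stretch the dim end of the intensity range so faint background features become visible.

```python
# Minimal illustration (not JCB's actual workflow) of a brightness/contrast
# stretch that makes faint background features easier to see.
import numpy as np
from PIL import Image

def enhance_background(path, low_pct=1, high_pct=60):
    """Stretch the dim end of the intensity range so faint background
    features (splice lines, repeated noise patterns) stand out."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    stretched = np.clip((img - lo) / max(hi - lo, 1e-6), 0.0, 1.0) * 255.0
    return Image.fromarray(stretched.astype(np.uint8))

# Example (hypothetical file name):
# enhance_background("western_blot_panel.tif").show()
```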

As far as I know, JCB was the first scholarly journal to implement systematic image screening.  Other people had raised concerns about the potential problem of digital image manipulation in biomedical research before then.  But, in 2002, no one had yet done anything to detect it systematically and prevent manipulated images from entering the published literature.  

The start of JCB’s image screening process was announced in an editorial published in September 2002.  That editorial also contained my first call for principal investigators to closely examine the image data in prepared manuscripts from their labs before submission to a journal, as another means of preventing image falsification.

After about six months of observing what authors were doing with their images, we came up with the first journal guidelines for handling digital images.  These were published in July 2003.  At a meeting of the JCB editors in December of that year, I discussed with Ken Yamada ― a PI at the NIH, member of the NIH ethics committee, and a JCB editor ― ways we could share the image screening policies and practices we had implemented at the journal.  Joan Schwartz, then assistant director for ethics and education at the NIH, invited us to write an article for the NIH Catalyst, an intramural NIH publication.  The resulting “What’s in a picture?” article was published in May 2004 and reprinted verbatim in JCB two months later.  

Shortly after “What’s in a picture?” was published, I spoke at a conference sponsored by the Office of Research Integrity (ORI) about research on research integrity.  Attendees debated the prevalence of research misconduct, because no one had investigated it in a systematic way.  Survey data and some back-of-the-napkin calculations at the time estimated between one in 100 and one in 100,000 researchers had committed some form of research misconduct.  Based on the number of cases resulting from the systematic image screening at JCB, I believed the incidence of image manipulation alone was at the high end of that range: one in 100 accepted manuscripts at the journal contained evidence of image manipulation that affected the interpretation of the data.

That talk led to an invitation to speak to the Division of Investigative Oversight at ORI in January 2005.  The meeting concluded with a consensus that image manipulation was a serious problem that all biomedical research journals were eventually going to have to address to prevent fraudulent data from entering the published literature.  It also concluded with a question from Chris Pascal, then the director of ORI, asking how we were going to get other journals to start addressing it.  The collective answer was that it was going to take a serious case of image manipulation at a high-profile journal to get other journals to take note.  Then the Hwang case broke.

In 2004 and 2005, Woo Suk Hwang and his colleagues published two papers in the journal Science describing the production of human embryonic stem cells.  The work came under scrutiny after a whistleblower in the lab alerted the Korean media to potential ethical violations and falsification of data, including image manipulation, in the articles.  After an investigation by the authors’ institution, both papers were retracted in January 2006.

The Hwang case jolted the biomedical research and publishing communities, and led to  publicity for the image screening program at JCB, which, I believe, would have detected at least some of the manipulation in the Science articles before publication.  Over the next several years, my colleagues and I at Rockefeller University Press trained roughly 25 other publishers in our visual screening techniques, and many of them – including the publishers of Science – began systematic image screening before publication.

But other stakeholders in the publication process – researchers, funders, institutions, most publishers – remained naïve about the extent of the problem and their susceptibility to this form of misconduct.  It seemed they were willing to let what they thought was just a small problem continue, because no one was calling them out publicly.

In 2010, journal editors began receiving emails from someone with the pseudonym “Clare Francis,” who appeared to be scouring the published literature for examples of image manipulation.  In the early days, many of the allegations were bizarre and lacked merit (although some had merit), which led journal editors to disregard allegations from anonymous whistleblowers in general.  This unfortunate attitude caused a setback in the effort to get journals to address the issue of image manipulation.

In July 2012, Paul Brookes of the University of Rochester started a personal blog called Science-Fraud.com.  He anonymously posted allegations of image manipulation that he and some helpers detected.  Within six months, the site was shut down due to threats of litigation.

However, during that same six-month period, a neuroscientist named Brandon Stell founded the website PubPeer.com as a public forum for post-publication peer review.  This site was different, in that members of the public could post comments about published articles, and they could do so anonymously.  The right of commenters to maintain that anonymity has since been upheld in court.  PubPeer has flourished in the intervening dozen years, and a substantial proportion of the comments on the site involve allegations of image manipulation in the biomedical literature.

One of the most prominent and prolific posters on the site is not anonymous.  Elisabeth Bik has made allegations of image manipulation in thousands of articles in her posts on PubPeer.  She rose to prominence in 2016 with a landmark paper on the frequency of image duplication in published biomedical research articles, which represented a tour-de-force analysis of 20,000 articles.  The frequency that she detected for image duplications that were unlikely to be due to clerical error (2% of articles screened) was in keeping with the rate of manipulation that affected the interpretation of the data at JCB (1% of articles screened).

Bik has inspired the establishment of a community of so-called image sleuths who scour the literature for examples of image manipulation in published articles.  The sheer number of their postings has highlighted the extent of the problem.  The fact that these postings are public means stakeholders are now more concerned about damage to their reputations, making them more likely to take action.   How the sleuths choose which papers to scrutinize is unclear, but that mystery keeps all stakeholders on their toes about whether they will be called out next.

As sleuths work through older publications, they have demonstrated that manipulation of digital images was occurring years before I realized it at JCB in 2002.  As an example, a misconduct investigation into Marc Tessier-Lavigne at Stanford University in 2023 led to the retraction of a Cell paper from 1999 and two Science papers from 2001 due to image manipulation.

As far as I am aware, the first commercially available software for detecting image manipulation was released in 2007.  That software did not prove effective.

Within the last few years, however, several other companies have released commercially available algorithms for detecting image manipulation.  One that distinguished itself was ImageTwin, which was the first to develop functionality to screen for duplications against a database of millions of images from previously published articles.  

None of these software developers has provided information on how effective their algorithms are relative to visual screening by someone with a trained eye for detecting image manipulation.  I have written about the need for transparency about the effectiveness of these algorithms, so that users are informed about their capabilities and limitations.  Those I have tested are reasonably good at detecting duplications in micrograph images, but their effectiveness at detecting duplications in blot images is more limited.  Their ability to detect manipulations such as copy/paste, erasure, or splicing is disappointing in their current state of development.  Any algorithmic output must be evaluated by a trained and knowledgeable human to confirm its findings.  
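
To give a concrete sense of what the simplest form of algorithmic duplication screening looks like, here is a rough sketch that matches local image features between two panels and counts the well-matched pairs.  This is my own illustration using OpenCV’s ORB keypoints, not the method used by ImageTwin or any other commercial tool; the file names and thresholds are hypothetical, and any flagged pair still requires visual inspection by a trained person.

```python
# Illustrative duplication check between two image panels (not a vendor's method).
import cv2

def duplication_score(path_a, path_b, ratio=0.75):
    """Count well-matched local features between two panels.  A high count
    suggests shared image content that a human should inspect visually."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=2000)
    _, des_a = orb.detectAndCompute(img_a, None)
    _, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    matches = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des_a, des_b, k=2)
    # Lowe's ratio test keeps only distinctive matches
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    return len(good)

# Example (hypothetical file names):
# duplication_score("figure2_panelA.png", "figure5_panelC.png")
```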

I strongly encourage the use of these algorithms to screen images at scale.  Indeed, the screening of the images in an article against millions of previously published articles is only possible algorithmically.  As a by-product of this functionality, collaborative efforts are in progress to enable publishers to screen their submissions at scale against manuscripts submitted to other publishers. 

In contrast to their use for image screening at scale, algorithms are inappropriate for evaluating specific allegations of image manipulation.  Visual inspection is the only standard that should be used (unless the allegation is that an image has been created using generative AI, see below).  The algorithms I have tested have too many false negatives and false positives to be used in this capacity.  Thus, a negative result from an algorithm is not proof of innocence, and a positive result is not proof of guilt.

In some cases that have played out publicly on X, authors attempted to refute allegations from Bik of image duplication using algorithmic methods, but the results produced by those methods were flat-out wrong when compared to visual inspection.  I have experienced similar situations, and Leonid Schneider has also reported on cases  in which publishers used an algorithm to incorrectly dismiss allegations of image manipulation. 

The perception that algorithms are superior to human ability is dangerous at the current stage of algorithm development—false negatives will lead to manipulated image data remaining in the published literature, and false positives will lead to incorrect accusations of misconduct.  Visual inspection remains the gold standard for evaluating specific allegations of image manipulation.

In 2015, I established a company called Image Data Integrity to consult about image manipulation in biomedical research.  I work mostly with institutions conducting internal investigations in response to allegations made on PubPeer.  I continue to see image manipulations in the same types of images — mostly in blots, but also in other images, such as micrographs, photographs, and scatter plots.  I continue to see the same types of manipulations and duplications.  I also continue to see the same types of inconsistencies between loading control and experimental blot image panels (e.g., one is spliced, and the other is not spliced), indicating that they were detected on different blots, making the loading control invalid.

I also continue to hear the same types of responses from authors, the most common of which is, “Oh, it’s just a loading control.”  It is time for the biomedical research community to firmly reject this perspective.  Without a valid loading control, a blot experiment cannot be properly interpreted  – no valid comparison of protein levels can be made  – and thus any conclusion the authors have drawn is unreliable.

The predominant change in the last 20 years has been the dramatic increase in the number of published articles questioned for image manipulation. The increase, which has several drivers, has occurred despite the fact that people’s skills in Photoshop have undoubtedly improved over this timespan.

First, we have seen  a dramatic increase in the number of articles published per year, especially low-quality papers.  This surge is due in part to the advent of predatory publishers, which were an unintended, but not unforeseen, consequence of the pay-to-publish open access business model pioneered by BioMed Central over 20 years ago.  Predatory publishers carry out only cursory (if any) peer review to maximize the number of articles they publish and thus their profits.  

The increase in the number of articles published also reflects the emergence of China as a global player in biomedical research within the last 25 years.  This emergence has added hundreds of thousands of articles to the published biomedical literature annually, due, in part, to problematic incentives designed to boost the country’s rankings.

Second, anonymous public commenting has become common, which has led to increased detection: there are now many more image sleuths scouring the published literature for image manipulation.  The accuracy of those sleuths’ allegations has also improved; they are consistently more reliable than in the early days of Clare Francis and are therefore taken more seriously.  Finally, algorithms that detect image manipulation have come online, lowering the bar for entry to becoming a sleuth and to implementing systematic image screening.

Whether the percentage of published articles with problematic images has changed is unclear.  Those numbers were remarkably consistent over the 12 years that I was involved with image screening at JCB, and similar studies more recently have provided similar numbers.  

However, this rate matters less than the consequence of the increase in the absolute number of articles questioned for image manipulation.  That increase, along with coverage by sites like Retraction Watch, and more recently by the national media, has led to greater awareness of the problem of image manipulation among all stakeholders in the publication process and even the general public.  

The increased awareness has led to some reassuring actions by various stakeholders, but more is needed. 

No one disputes the value of the goal of preventing manipulated images from getting into the published literature, but there is considerable debate over who should be responsible for screening and at what stage of the research process.  In my opinion, all stakeholders should be screening data and evaluating data anomalies: in grant applications (funders), before manuscript submission (researchers and institutions), and during manuscript review (publishers).  This is like the Swiss cheese model of prevention that we became familiar with during the COVID pandemic.  There will be holes in the checks at every stage, but, with more stages, more holes will be filled.

The goal of preventing falsified image data from reaching the published literature should be an end in and of itself, but additional benefits to all stakeholders include the savings in time, reputation, and money from dealing with fewer allegations of image manipulation.

Anecdotally, the availability of algorithms has led many more publishers to systematically screen images before publication.  Unfortunately, as far as I know, no one is tracking that number.  For those who had previously done no screening at all, this is an improvement, and, overall, this reflects a sea change toward achieving the goal of detection before publication.  However, some publishers have downgraded from visual screening to algorithmic screening.  Ideally, they would continue their visual screening and use the algorithm as an additional check.

Nearly five years ago, Alison Abbott wrote a feature article in Nature describing three European institutions that were using algorithms to screen all manuscripts produced by their staff for evidence of image manipulation before submission to a journal.  I believe that the thousands of other research institutions around the world should do the same.  If institutions are concerned about imposing screening on their staff, they could at least offer this as a service to those who want it.  Institutions should also consider screening the publications of job candidates before hiring them.

Anecdotally, many institutions have established mechanisms to be notified about allegations made on PubPeer and social media (such as subscribing to the PubPeer “Dashboard”), but this effort is reactive.  Institutions must switch their footing from a reactive stance to a proactive stance to try to prevent image manipulation from happening in the first place, and, if it does happen, to prevent manipulated images from getting submitted to a journal.  Again, these goals should be ends in and of themselves, but they will also help to preserve the reputation of the institution and keep the grant money flowing in.

We cannot overlook a root cause of the image manipulation problem: the institutionalized culture of publish or perish caused, in part, by the criteria used by institutions for hiring and promotion.  A lot of ink has been spilled over the past dozen years or so discussing the need to change the incentive structure in biomedical research to reduce the pressure on individual researchers to publish positive results.  I will not reiterate the arguments here, except to say that, as one of the founders of the Declaration on Research Assessment (DORA) in 2012, I wholeheartedly support these efforts.  But progress in this direction is disappointingly slow.

Researchers know that they can get just about anything published these days.  Under the current pay-to-publish model, there will always be publishers that game the system and publish any submission without concern for the integrity of the data.

But researchers also know that they cannot get just about anything funded.  Funders could help to dis-incentivize image manipulation if they screened the applicants’ publications that are cited in a grant application.  I am not aware of any funder that does this, and this is truly a missed opportunity to avoid funding research based on fraudulent data and feeding a vicious cycle.

In a promising development, the recent funding for the NIH came with an admonishment from the US Congress to “proactively look for signs of academic fraud.”  As far as I know, the NIH has not yet responded with any indication of how they will do this, but a simple approach would be to screen the PI’s cited publications in a grant application or progress report before approving it.

Some institutions have offered access to algorithms to their staff, relying on the PIs to use them to screen individual manuscripts before submission to a journal.  This, in my opinion, is an inappropriate work-around being offered to PIs, who should directly compare all of the data in figures prepared for a manuscript to the source data from the lab (or at least delegate this responsibility to a trusted and trained lab member or even to a core facility provided by the institution).  This expectation needs to be codified in institutional policy.

Given the scale of the problem, PIs must come to terms with the possibility that image manipulation might happen in their labs.  They also need to accept the possibility that what they’re seeing in PowerPoint presentations in lab meetings may have already been manipulated.

Crucial to the recommendation to compare prepared figures to source data is an understanding of what constitutes source data.  They are not the composed, assembled display of multiple image panels in a PowerPoint file.  They are the actual files obtained through the research process, unaltered post-acquisition.  In the case of image data, they are the proprietary files produced by the imaging system (or the actual pieces of X-ray film, if you’re an old school autoradiography person).

I acknowledge that the software that drives gel documentation systems and microscopes has image-processing capabilities.  Thus, images may have been altered before they were ever saved in the proprietary file type, and comparison to the published image would not reveal the alteration.  But if there are any suspicions that an image in a proprietary file type has been altered post-acquisition, those changes should be present in the file metadata.
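
As a concrete example of the kind of metadata check I have in mind, the sketch below reads the standard tag fields from a TIFF file; fields such as Software and DateTime can hint at whether a “source” image was re-saved or edited after acquisition.  This is a minimal illustration using Pillow and assumes a TIFF source file; real imaging systems record acquisition details in many proprietary ways, so treat it only as a starting point.

```python
# Minimal sketch: list the TIFF tags stored in an image file (assumes TIFF input).
from PIL import Image
from PIL.TiffTags import TAGS

def acquisition_metadata(path):
    """Return human-readable TIFF tags.  Fields such as 'Software' and
    'DateTime' can hint at post-acquisition editing of a source file."""
    with Image.open(path) as img:
        return {TAGS.get(tag_id, tag_id): value
                for tag_id, value in img.tag_v2.items()}

# Example (hypothetical file name):
# meta = acquisition_metadata("raw_blot_scan.tif")
# print(meta.get("Software"), meta.get("DateTime"))
```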

The only sure way to dispel an accusation of image manipulation in a published article is for the accused researcher to provide the source data for comparison.  Too often in institutional investigations, researchers claim that they don’t have the source data because their institution did not mandate the retention of those data for more than a few years.  That needs to change.

Data storage is now much cheaper than it was 20 years ago: disk storage costs roughly 100-fold less than it did then.  Institutions should mandate that the data underlying a published article be retained in perpetuity (at least for the actual data presented in the article) and provide the data storage and structure necessary to do so.

I am not talking about the multiple terabytes required to store the thousands of images in an imaging screen or the hundreds of whole slide images (WSI) that were reviewed for a study.  I’m talking about the few megabytes (or gigabytes for WSI) needed to store the few images from that screen or study that were shown in the article.

The only publisher that mentions the concept of retaining data in perpetuity is Springer Nature for the Nature Portfolio of journals.  They state that “We recommend retaining unprocessed data and metadata files after publication, ideally archiving data in perpetuity.”  This needs to be changed to a requirement by all journals, to begin to force institutions to do the right thing with the data underlying a publication and retain it in perpetuity.

There are numerous precedents (many of them cited on Retraction Watch) for retractions predicated on the inability of authors to provide source data.  Institutions will be protecting their PIs from this eventuality by requiring them to retain these data in perpetuity.

Considerable progress has been made in addressing the issue of image manipulation over the past 20 years, but challenges remain. Paper mills are a growing and dire problem for publishers and the integrity of the published literature.  These are companies that produce wholly fabricated research papers (often duplicating images from previously published papers or fabricating them using generative AI, as discussed below).  The companies get the papers accepted for publication in a journal and then sell authorships on those papers.  The higher the impact factor of the journal, the higher the cost of authorship. 

One study by a major publisher used software to screen manuscripts for hallmarks of paper mill production and found that 10 to 13% of its submissions were from a paper mill.  I refer the reader to some recent commentaries and news articles on paper mills for more information.

The use of generative artificial intelligence exploded with the release of ChatGPT in November 2022.  Even before then, however, computer scientists were working on machine learning frameworks for generating images, and some have applied those technologies to image types relevant to biomedical research (specifically Western blots).

In early 2021, Qi, Zhang, and Luo posted a preprint describing the use of a generative adversarial network (GAN) to produce synthetic Western blot images that strongly resembled authentic blots.  In 2022, Wang and colleagues published an opinion piece in the journal Patterns in which they demonstrated the synthesis of both Western blot images and medical photographs using a GAN.  Also in 2022, Ed Delp from Purdue University and his colleagues used a variety of generative models to produce a library of 24,000 synthetic Western blots.  They applied various types of detectors to this data set, the best of which reportedly was able to detect synthetic blots with 90% accuracy.

More recently, one of the authors of that paper took the detection of synthetic Western blots a step further and reported that he and his colleagues could detect the particular AI model that was used to produce a synthetic Western blot.

We are in the dark about how bad the problem of generative AI already is in biomedical publications.  The only thing we can say for certain is that it will get worse, with the now-widespread access to AI models for generating images and the difficulty of detecting them visually.  The old hallmarks of visual inspection do not apply.

The development of algorithms for detecting images produced by generative AI is promising, and the near-term opportunities include independently verifying their effectiveness, using them to document the current extent of the problem, commercializing them, and implementing their use among all stakeholders in the publication process.

While journals and institutions are still working through the massive backlog of cases of image manipulation perpetrated using old image-processing techniques, generative AI is currently creating the new front in the battle against image manipulation.  I had hoped I would be forced into retirement by the end of image manipulation in the published literature, but it looks like I might be forced into retirement by generative AI that is impossible to detect visually.

Twenty years ago, people were either unaware of the problem of image manipulation in the biomedical literature, or they thought the problem was so minor that they didn’t have to deal with it.  Today, it would be difficult to claim to be unaware of the extent of the problem.  Twenty years ago it was mostly a one-sided arms race – images were being manipulated, but very few people were doing anything about it. 

We may still be losing the arms race, but now at least it’s two-sided.  

Mike Rossner is the founder of  Image Data Integrity, Inc.

21 thoughts on “What’s in a picture? Two decades of image manipulation awareness and action”

  1. Image manipulation is the most obvious means of fraud and your estimates are vast underestimates, at best. Only the laziest are caught with these now easy to detect misrepresentations and often it is simply blamed on a “mixup,” whereupon a new image is substituted. In egregious cases the image manipulation represents a pattern that spans the arc of a research career, but this, again, is the purview of the careless. The frontier ahead lies in the data itself–the raw data points used to generate the graphs and figures, and hidden behind the iron curtain of a data availability statement that is honored, at best, in 10% of instances [1][2]. Even when data are furnished for publications the norm is to supply the post-processing, post-normalization datapoints as they appear in graphs, rather than the raw untampered output from the apparatus used to generate it (with verifiable timestamp and metadata intact). Anonymous requests for raw data via Pubpeer and even directly via email are conveniently ignored. Certainly “we have come a long way” but I can tell you in no uncertain terms that the path ahead is longer than we admit.

    contro-vento

    [1] Gabelica M, Bojčić R, Puljak L. Many researchers were not compliant with their published data sharing statement: a mixed-methods study. J Clin Epidemiol. 2022 Oct;150:33-41. doi: 10.1016/j.jclinepi.2022.05.019. Epub 2022 May 30. PMID: 35654271.
    [2] Hussey I. Data is not available upon request. Preprint. 2023 May 8. doi: 10.31234/osf.io/jbu9r.

  2. Very nice article, Mike, and thank you for your long-standing efforts in this area of research integrity. Although, as I think you know, I have no scientific background, I have been active in this space (until recently, anonymously), and I’ve personally benefited from your early work and from the work and advice of Clare (sometimes Claire, for whatever reason), Elisabeth, and Paul.

    Comments about a couple of your observations:

    * Algorithms vs Human: I agree with your view that at this point algorithms cannot function as well as a skilled human, and I, too, worry that authors and editors are too quick to pretend that these tools are a panacea or a way to excuse obviously problematic images. However, one function at which the algorithms may have an edge over humans is “reproducibility.” If you or I run the same image through ImageTwin, we are generally going to get the same result. For larger organizations the ability to get repeatable results seems to be an important element of a rigorous review process, and for legal reasons it could even become critical. I think the goal of making a concern about an image “objective” rather than “subjective” is a good one, if a difficult one to reach.

    I think it is a problem if the human-based image analysis step in this process is dependent on the *skill level* of the humans involved. Systems that depend upon highly skilled humans seem to need large financial incentives to perform satisfactorily (you need to pay for top talent), and I don’t see that this segment of scientific publishing is getting anywhere near this level of funding.

    * Generative AI: Generative AI tools can *already* create unique images that cannot be detected as fake by the naked eye. This means the practice of asking for raw data to address concerns in a paper is *already* insufficient to demonstrate authenticity for motivated fraudsters; with some effort, fraudsters can provide something that looks like an original image, but isn’t (I just went through this with a paper from Oncotarget). I agree with your statement that retaining raw data files generated directly by the lab equipment will be helpful (making this mandatory will be hard), but I think we are only a few AI generations away from where that on its own becomes insufficient too. As today, we will probably catch the sloppy cheats, but AI will make it harder to catch the clever ones.

    On the latter issue, I do expect better AI *detection* tools to be developed, but I believe the arms race has already started and human eyes alone will play a smaller and smaller role going forward. I do feel confident that the fake images that get published today will be able to be detected by better tools in the future, so fraudsters should not sleep comfortably after a fake paper gets published.

    Looking forward, I think mass-produced science needs to adopt a “supply-chain” mentality to ensure that research integrity is maintained from beginning to end. The system is now too large, and the incentives are too great, for “trust” to play a role anymore (I personally think people who continue to talk about “trust” in science should be derided as hopelessly naïve). Well-designed supply chain management tools could provide an audit trail and transparency for interested parties to investigate whenever they wish – and this might include your idea about starting with the raw data files generated by the lab equipment. Unfortunately, I think it will be hard to get a sea-change like this implemented unless the funding agencies require it (for publicly supported research).

    Again, thank you for all you’ve done.

  3. What are the low-hanging fruit here? We need multiple checks from start to finish, but what are the stages where less-intensive visual detection efforts would be appropriate? I’m thinking about copyeditors and proofreaders employed by publishers. What kind of training would be appropriate for them?

    1. Soon AI evaluators will become smart enough to detect such manipulations in milliseconds (if not already). Making such AI modules is pretty easy at this time. They just need a large training set of genuine images versus manipulated images. Then they will learn to detect manipulations at high accuracies or at least flag suspicious images for human evaluation.

      And soon, hostile AIs and hostile humans will become smart enough to leave almost no trace of image manipulation. And this cycle of war will continue to turn.

    2. Hi Leslie. There are no work-arounds here. Images in a manuscript should be systematically screened before submission by institutions and during manuscript review by publishers. Staff who do not have a scientific background can be trained in both visual screening techniques and the use of algorithms. Those checks are on top of the responsibility of PIs to compare prepared figures to the source data before submission. There is no special training required for that step; just a side by side comparison to answer the question “do the data in the prepared figure match what was actually observed?”

  4. I worry that in the AI-generated image vs. AI detection, it’s an arms race that will always be murky as to who is winning. If we do a careful validation that a specific AI-detector works well 99% of the time, might not the AI-generation software update the very next day and now we’re down to a 10% success detection rate?

    And yes, I’m simplifying the numbers: there’s false-positives and false-negatives. My main point is that people keep tweaking the AI algorithms, not always openly, so I’m not sure what it would take to really know AI-detection software is adequate.

    The one possible counter-argument would be that maybe it will work adequately retroactively: we’d be able to identify images in 2022 papers that were AI-generated due to tell-tale AI signatures that hadn’t disappeared in subsequent years.

    1. Yes, I think the retroactive aspect will always be the way this works. There’s no incentive to build tools to detect fraud that isn’t yet known about. The tools will be developed to detect, and then minimize the harm, after fraud is discovered by alternative means.

      The deterrent is still there. If someone’s fraud isn’t caught until after their career has advanced, all that they gained by cheating can be taken away. A precarious position to be in.

      1. Hi Cheshire. Thanks for your kind words above.

        The detector will always be behind the perpetrator in the arms race. What seems crucial to me is to get new detection tools applied systematically (before publication) soon after they’re developed, so at least that particular type of manipulation (or generation) is then prevented from getting into the literature.

  5. “How the sleuths choose which papers to scrutinize is unclear”

    Many would be happy to talk about their strategies if asked.
    Clearly this bears on the problem of screening papers proactively. Publishers are interested in which strategies for revealing concerns could be applied to manuscripts as they arrive, rather than discovering the problems later. Ambulances, fences, cliffs.

  6. One of the common refrains you hear from senior authors of papers that are found to contain manipulated images is that the fault lies with another author who prepared the figures.
    If a PI insists that during the writing of the paper the person contributing a figure presents a) the raw original data, b) the final figure and c) how b) arose from a), then a lot of these “Oops, an accident” problems would vanish.
    Have lab meetings with people you are working with and deal with the nuts and bolts of presentation early in the process of writing the paper.
    It won’t stop paper mills and unscrupulous PIs, but if you are neither it would save you a world of hurt down the road.

  7. Would it be reasonable and practical for journals to append to each published article a warning about deficiencies in the review and editorial process, such as “Raw data not independently reviewed”, “Not verified by an independent statistician”, “Lab notebooks unavailable”, etc? The information accompanying prescription drugs describes adverse reactions and may include a “black box warning”. Maybe authors (and journals) would thus be motivated to enhance the credibility of their articles by minimizing such warnings; and readers will be reminded to evaluate the article with appropriate caution.

  8. Great article!
    I think it’s rather telling that, in a long discussion covering decades of image manipulation in the scientific literature, the Committee on Publication Ethics (COPE) is not mentioned once. This fact alone serves as a strong indicator for how completely useless they are at addressing the problem. They collect fees, provide a logo for publishers to shove on their website saying “we’ve got this whole ethics thing licked”, and then do absolutely nothing.
    Thankfully, as the good Dr. Rossner illustrates here, who needs COPE when there are plenty of other ways to address the problem?

    1. Thanks, Paul. You raise an interesting point. COPE does have flowcharts and recommendations for how publishers should handle allegations of image manipulation. But they have never engaged me in their discussions while developing these resources, so they are not really on my radar. I never joined COPE during my time as a journal editor or publisher because I never felt the need to have some third party’s seal of approval.

      1. Mike, from my long experience in working with journals about questioned images, you do understand that the negative perception you just expressed about the “need to seek approval” is widespread and lies at the root of the problem that now confronts correction of the literature!

  9. Hi John. I’m not sure what you are referring to. I did not feel the need to have the approval of a peer group for what I was doing to address the issue of image data integrity at JCB. Are you referring to the perception that journals’ concerns about their reputations make them hesitant to address allegations of image manipulation in published articles? My response to that would be to encourage journals to address the problem of image manipulation before publication with systematic screening of accepted manuscripts.

    1. None of the above, really. Here I consider only journals, mostly others but also the events at JCB to which you refer, that for a variety of reasons seem still to fail to use commonsense solutions (fully within their own prerogatives) that would alleviate much of the problem.

      Indulge me a little history here. As you referred to in your narrative, I had invited you with your screening assistant, and also Drs. Hany Farid and Ken Yamada, to meet with us at ORI in January 2005. The result of that meeting was that we all agreed that ORI should try to organize a conference to address what ORI had long known would be a growing problem. (For the record, it was Dr. Chris Anderson who first expressed concerns about digital images at a 1993 ORI conference; see 2 reports in Science 263(5145): 317-318, 1994.) ORI’s publication on “False Images” was in 2002, and ORI’s first forensic tools appeared in 2002.

      As a result of our January 2005 get-together, ORI was then able to organize a sponsored meeting hosted by AAAS with multiple speakers, to which all agreed. Unfortunately, the key speaker (a journal editor) then got upset with simply the order of the talks, claiming the government was being too controlling. Our rationale was simply that we did not want to deprive all the big names of a timely discussion session (or, as I saw it, to force them to sit on their prostates all morning without speaking). Perversely, then, despite our attempts at revision, nothing, including AAAS, could put “Humpty” back together again.

      Getting the AAAS editor-in-chief Dr. Kennedy’s buy-in for that meeting was no small ORI accomplishment! Then, not long afterward, the Hwang case you referred to hit.

      Second, I was told journal editors knew this was an important issue and so collectively encouraged the National Academies to examine it. I could be wrong, but I thought the editors’ concern created the initial motivation for the National Academy’s book “Research in the Digital Age.” Predictably, NA experts instead followed their own areas of interest and so never really addressed the initial motivation. I confess unawareness about any specific conference devoted to these problems and what can be done about them.

      It takes more than awareness to detect image falsification. As a point at hand, the coauthor at NIH on your paper later found image falsification had occurred in his own lab. One of the ORI tools I chose as a reason for using false coloration in my PubPeer Affidavit for Sarkar was that NIH case. (That image is at point 37, pp. 16-17 in my PubPeer Affidavit.) My underlying motivation was to show university scientists that most of these detection schemes were straightforward and as such require neither particular nor specialized expertise to evaluate the merits of an allegation. PubPeer proves that point. The rationale behind the initial tools is still on the ORI website.

      Finally, your comments apply chiefly to concerns about detecting falsification in blots, and intentionally AI-modified images (in addition to false positives) are a valid concern. But in fact many other scientific images that have been falsified without manipulation (simply by altering the conditions in which the data were acquired) can easily be detected. But neither of these possibilities is a problem if journals would only insist on requirements, fully within their own purview, to ensure institutions keep original data.

      But foremost, I agree fully with the overall excellent points in your preceding contribution to Retraction Watch. Good article, but incomplete. IMHO the history here is that many journals have worked themselves into this dilemma either by mistrust or appearing “to seek the approval” based upon the experience of other entities. The real story begins in what happened 30 years ago, not 20.

      1. Sorry, about my 80 yr old fingers, which left out ” . . by mistrust or not wanting to appear to seek the approval” based on the experience of other entities.”

  10. Hi John. I apologize for not citing your work and contributions in the list of other people who had raised concerns about image manipulation in biomedical research before JCB started its image screening program. To address that oversight, here are links to your questioned images article (https://www.tandfonline.com/doi/epdf/10.1080/08989620212970) and your Forensic Tools page (https://ori.hhs.gov/forensic-tools).

    I will not get into specifics, but I have a different perspective on the demise of the proposed ORI-sponsored conference in 2005. Despite the setback caused by this journal editor, we have come a long way.

    Here’s my recollection of the National Academy activities in 2006/2007. In February of 2006, the editors of JCB wrote a letter to Ralph Cicerone (then President of the NAS) asking the NAS to address the issue of image manipulation: “We feel that establishing clear and logical standards for ethics in scientific publishing is one that is appropriate to be addressed by a high level panel convened by the National Academy of Sciences.”

    In November, 2006, Cicerone sent a request for information to all editorial board members of PNAS asking for their input because the NAS Committee on Science, Engineering, and Public Policy (COSEPUP) was about to begin a study on the integrity of research data. In April 2007, Randy Schekman, Monica Bradford, Linda Miller, and I were on a panel that presented to a COSEPUP subcommittee called the “Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age”. The product of that committee’s work was entitled “Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age” (2009, https://doi.org/10.17226/12615). Some of that report was incorporated into the third edition of “On Being A Scientist” (2009, https://doi.org/10.17226/12192).

  11. Thanks for the great article, Mike! Image manipulation detection is only part of the fraud detection story – plagiarism has been a concern for many years prior, and has met with much the same results. The journals go “Meh”, the corresponding authors speak of “mix-ups”, people think that algorithms exist to detect plagiarism (they don’t, they only find text-matching bits that could constitute plagiarism, but they also miss a lot), and plagiarized publications don’t get retracted.
    The hope that there will be algorithms to detect image manipulations consistently (that is, without false positives or false negatives) is a false one. As you have stated Mike, a human must always decide, but it is getting harder and harder to detect manipulations with the naked eye alone. And it won’t get better – the AI systems are needing enormous amounts of energy and water to run, and they will soon crash into a wall that will have them collapse. We do not know which old (pre-2022) images are manipulated and which aren’t – the ones we think aren’t could be manipulated as well. That makes training an AI impossible.
    What has to get better is academic publication, a system which is utterly broken at the moment. It might help a bit to publish the peer review reports online with the publication, and to include the names of the reviewers. But that also has problems: I just reviewed a manuscript, and I do not want the author to know it was me, for fear of retribution.
    We need to get away from “# of publications” = “how good you are”, but that will need a seismic shift in how organizations determine who gets money and promotions.
    It is good to know that there are colleagues who are fighting this system, too: we are not alone.
