Editor and authors refuse to share data of paper containing alleged statistical errors

Olivia Robertson

Last July, David Allison and his students identified what they considered to be fatal errors in a paper that had appeared in Elsevier’s Diabetes Research and Clinical Practice.

The authors of the article, led by Sergio Di Molfetta of the University of Bari Aldo Moro in Bari, Italy, conducted a cluster randomized controlled trial but performed an improper statistical analysis, according to Allison’s group.
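The article does not spell out the specific objection, but a common pitfall in cluster randomized trials – offered here purely as background, not as the authors’ actual analysis – is treating individual observations as independent, which understates standard errors. Here is a minimal sketch with simulated data and hypothetical column names, assuming pandas and statsmodels:

```python
# Sketch only: simulated data and hypothetical column names, not the study's.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_clusters, per_cluster = 8, 3  # small, pilot-sized trial

# Treatment is assigned at the cluster level: half treated, half control
treat_flags = rng.permutation(np.repeat([0, 1], n_clusters // 2))
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(n_clusters), per_cluster),
    "treated": np.repeat(treat_flags, per_cluster),
})

# Observations within a cluster share a random effect, so they are not independent
cluster_effect = rng.normal(0, 1, n_clusters)[df["cluster"]]
df["outcome"] = 0.5 * df["treated"] + cluster_effect + rng.normal(0, 1, len(df))

naive = smf.ols("outcome ~ treated", data=df).fit()
robust = smf.ols("outcome ~ treated", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cluster"]}
)
# The naive standard error is typically too small, overstating significance
print(naive.bse["treated"], robust.bse["treated"])
```

With so few clusters, even cluster-robust errors are unstable, which is one reason drawing inferences from a 23-patient pilot is fraught.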

In August, Allison, dean of Indiana University’s School of Public Health in Bloomington, and his colleagues requested the authors’ data.

Then they hit a wall.

Over the next three months, Allison – who said he works to contribute to the “rigor, reproducibility, transparency, and trustworthiness of science” – and his colleagues emailed with the authors, explaining their reasoning for requesting the data. They copied Antonio Ceriello, the journal’s editor in chief, on an email requesting he uphold the journal’s policy on data sharing.

In November, in an email seen by Retraction Watch, Francesco Giorgino, the paper’s corresponding author, said sharing data with a third party would breach the study participants’ consent and European rules on data protection.

The statistical analysis Allison’s group suggested would “likely be biased” because there were only 23 patients in the study, Giorgino said. However, “some descriptive steps of the statistical analysis were perhaps carelessly omitted,” he said, because the authors thought they were too detailed for a pilot study. Giorgino did not respond to our request for comment.

In April, Allison’s group submitted a letter to the journal critiquing the paper. Ceriello rejected the letter in May, writing that the “paper cannot be assigned sufficient priority to allow publication.” Olivia Robertson, a postdoctoral fellow in Allison’s group, emailed Ceriello urging him to reconsider publishing the letter so the paper’s errors could be addressed.

In an email seen by Retraction Watch, Ceriello responded:

I think that it its [sic] time to stop this never-ending story. You have been in contact with the authors, who were raising several legal issues. From the journal’s view, considering that the paper received a full review process and that it is not our intention to be engaged in a legal issue, the question stops now. If you continue to have concerns, you can prosecute the authors.

Ceriello did not respond to our request for comment.

Allison and his group have not heard from the editor since this response and said they are unsure what “legal issues” Ceriello may have been referencing.

“We are disappointed that the editor and original study authors have not been more responsive and committed to upholding the accuracy and integrity of science by correcting the errors we identified, but we are forging on nonetheless,” Robertson said. “Unfortunately, this isn’t the first time our group has received an unsupportive response.”

18 thoughts on “Editor and authors refuse to share data of paper containing alleged statistical errors”

  1. How is one supposed to “prosecute the authors”? Under what legal theory could a civil plaintiff assert claims against the authors?!

    1. You can’t, in my opinion it’s basically the equivalent of “doing what you want would likely incriminate us, so get bent. We know this looks bad, so we’ll throw some stuff at the wall so those sympathetic (and also worried about their own iffy publications) can push a narrative on Twitter.” It’s the state we’ve allowed much of science to reach.

      1. Unfortunately this happens far too often. In the US, researchers will claim HIPAA or some other privacy reason for not releasing their data, which is ridiculous considering the data are blinded. Often you have editors just not willing to rock the boat. But just once I would love to see a researcher blatantly state: we don’t want to give you the data because you just want to find something wrong with it. Now that is the state of science today. The most egregious discipline by far is climate science. Not only do editors refuse to publish anything that does not go along with the current narrative, but they also blatantly require that any data not adequately supporting the narrative be removed from a manuscript. No s*** folks, this happens often, I know! So the fields that, quite frankly, see the most government money flowing to them are the ones most blatant at usurping scientific rigor or blowing it off altogether. This happens in pharmacology too, but there it is typically caught because there are so many ways to interpret and/or use said data.

  2. My field of research is plagued by data manipulation. I’ve begun a project to attempt to analyse the raw data for several figures from approximately 100 papers claiming that a particular compound has therapeutic efficacy – a compound which my group has tested extensively and found to have no effect. Despite data sharing statements to the contrary, we have been unsuccessful at obtaining any raw data. If the National Institutes of Health were serious in its quest to promote transparency and reproducibility, it would make publication of *all* raw data, not just large datasets, a precondition to funding. This “dirty little secret” is costing taxpayers billions of dollars.

    1. Fwiw, with the 2023 NIH data sharing policy, the NIH specified that a dataset’s small size is not an acceptable reason to withhold data. But it is going to take a few years for studies funded under the new policy to reach the data sharing phase.

  3. I find your choice of using Dr. Robertson’s photo as illustration surprising. She is mentioned only in passing and doesn’t appear to be the main character in this saga.

  4. The GDPR concern may be legitimate. I understand that if the consent forms did not mention that data could be shared with other researchers to reproduce the analysis (which they should have), it would be illegal to do so. Does anyone have experience with this?

  5. Shared clinical data is de-identified as routine practice, and all analytical procedures should be clearly specified. Shame on the authors and the journal’s editor. Most importantly, shame on Elsevier, which should simply replace the editor with someone who understands scientific ethics. “Prosecute” indeed.

  6. This looks like a tricky subject that I feel RW didn’t address adequately here. Any entity under GDPR would have legitimate legal concerns about sharing non-aggregate data with U.S. researchers, simply by virtue of the looser restrictions there.

    Hence the editor’s quip about prosecuting the authors, which I understand would likely turn out in the authors’ favor. Even de-identified data still counts as personal, and thus protected, if it’s not presented in aggregate. (bit.ly/3XnwlFu, page 3)

    RW should be the adult in the room here and state, before picking sides on anecdotes like this, their position and constructive thoughts on research reproducibility in the age of GDPR, a topic others have covered to a much greater extent.

    1. Then per that journal’s policies, it shouldn’t be published there. Data can’t be “trust me bro.”

      1. I agree “trust me” doesn’t work. Sloppy researchers definitely shouldn’t hide behind GDPR.

        But neither does disregarding the law: all it takes is a single participant to file a successful complaint for the uni’s lawyers to shut research down faster than a faked data scandal.

        The problem here, as I understand it, is that the authors could be held liable for handing over data collected under GDPR to U.S. researchers not bound by it. I’m thinking the solution is to instead hand over the data to an EU-based team bound by GDPR.

        Would this work? Should publishers facilitate it? Rejecting all papers that use GDPR data sounds extreme. I’m not an expert, and I would like RW to go beyond the anecdote here and perhaps interview an actual GDPR expert to provide some context and insights for their readers, that’s all.

        1. You have to either:

          (1) De-identify the data before sharing or publishing on it
          (2) Not write papers on data you can’t show, since work that can’t be checked isn’t actually peer-reviewed, just peer-stamped.

          The alternative is just “trust me bro” and welp, no thank you. There’s more than one reason for the replication crisis, but “trust me bro” is definitely one of them.

    2. GDPR relates to personal (directly identifying) information. If the data are de-identified, GDPR wouldn’t apply anymore?!

      1. Sorry, I missed the link about de-identified data in your post. I was told that de-identified data, or more precisely anonymized data (where no key to re-identify individuals exists), don’t fall under the GDPR.

        1. To elaborate: so-called de-identified data refers to data that do not allow the individual to be identified in the absence of additional data (to put it in the broadest terms). Even aggregate data can be used in certain cases to determine the values for some of the individuals involved. Advertising in the U.S. is based on a surveillance model that combines all publicly available data with theoretically anonymized data that nonetheless identifies each user uniquely. On the other hand, there is some theoretical work on how to properly anonymize data by introducing a small amount of noise into the aggregate figures (see the sketch below). https://en.wikipedia.org/wiki/Differential_privacy
          None of this is helpful in policing academic research misconduct, and my impression is that we have conflicting requirements in place, some with the force of law and others with the force of policy, when human subjects are involved.
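
          A minimal sketch of that noise-adding idea, the Laplace mechanism from differential privacy, with entirely made-up numbers and assuming NumPy:

          ```python
          # Sketch only: made-up values, not from any real study.
          import numpy as np

          rng = np.random.default_rng(0)

          def dp_count(values, threshold, epsilon):
              # Adding or removing one participant changes a count by at most 1,
              # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
              true_count = int(np.sum(values > threshold))
              return true_count + rng.laplace(scale=1.0 / epsilon)

          chol = rng.normal(200, 25, size=23)  # hypothetical cholesterol readings
          print(dp_count(chol, threshold=240, epsilon=1.0))  # noisy count of high values
          ```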

        2. I understand that under GDPR, only aggregate data (e.g. averages by age group) is considered anonymous and not regulated.

          So a table of physiological measurements where each row is a participant’s data (height, weight, hemoglobin, cholesterol, etc.), even without a patient ID, is not anonymous, because you could potentially re-identify the participant from this unique sequence of values *if* you had the original ID table to cross-reference. (Even if you don’t *actually* have it! See the toy sketch after this comment.) This is called “Article 11 de-identified” in the source I linked and qualifies as personal data under the law.

          I’m not an expert here and would love to learn and dig deeper. I fell down this rabbit hole from my hunch that the authors had real legal concerns with sharing the data. We can’t resolve this here but for now I’ll just not blame the authors for the “wall” Allison hit. I have no reason to doubt their motives for not sharing, nor the adequacy of their peer-reviewed work (without more details on Allison’s findings).
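
          A toy illustration of that cross-referencing risk, with invented names and measurements, assuming pandas:

          ```python
          # Sketch only: invented names and values.
          import pandas as pd

          # 'De-identified' rows released without a patient ID column
          released = pd.DataFrame({
              "height_cm": [171, 158, 183],
              "weight_kg": [70.2, 55.9, 91.4],
              "hba1c":     [7.1, 6.4, 8.0],
          })

          # A side table an attacker might hold, linking the same measurements to names
          side_table = pd.DataFrame({
              "patient":   ["A. Rossi", "B. Bianchi", "C. Verdi"],
              "height_cm": [171, 158, 183],
              "weight_kg": [70.2, 55.9, 91.4],
          })

          # Joining on the quasi-identifiers re-attaches names to the 'anonymous' rows
          print(released.merge(side_table, on=["height_cm", "weight_kg"]))
          ```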

    3. A really useful narrative here would be for RW to discuss how issues of data protection can be overcome to ensure transparency and openness. Some DPOs consider anonymised data to be merely pseudonymised and don’t permit sharing for fear of prosecution.
