A new study in Clinical Chemistry paints an alarming picture of how seldom scientists deposit the data they’re supposed to. Perhaps not surprisingly, papers whose authors did submit such data scored higher on a quality scale than those whose authors didn’t.
Ken Witwer, a pathobiologist at Johns Hopkins, was concerned that many studies involving microarray-based microRNA (miRNA) profiling weren’t complying with the Minimum Information About a Microarray Experiment (MIAME) standards supposedly required by journals. So he looked at 127 such papers published between July 2011 and April 2012 in journals including PLOS ONE, the Journal of Biological Chemistry, Blood, and Clinical Chemistry, assigning each one a quality score and checking whether the authors had followed the guidelines.
What he uncovered wasn’t pretty — and has already led to a retraction. From the abstract:
Overall, data submission was reported at publication for 40% of all articles, and almost 75% of articles were MIAME noncompliant. On average, articles that included full data submission scored significantly higher on a quality metric than articles with limited or no data submission, and studies with adequate description of methods disproportionately included larger numbers of experimental repeats. Finally, for several articles that were not MIAME compliant, data reanalysis revealed less than complete support for the published conclusions, in 1 case leading to retraction.
Here’s that retraction, for “Host cells respond to exogenous infectious agents such as viruses, including HIV-1,” published in PLOS ONE in 2011:
The authors wish to retract this article for the following reason:
Upon re-evaluation of the analyses performed, we discovered an error in the data fed into the software, which resulted in incorrect results in Table 2 and Figure 2. During the initial analysis, we eliminated miRNAs if they showed an expression CT value of 35 in over 75% of the samples. This decision was based on the instructions from the software during the initial data feed process for the selection of particular miRNAs (row) for exclusion. Unfortunately, the software included the excluded miRNAs as controls along with the endogenous controls and analyzed the data. As a result, the analyses identified miRNAs that are not statistically significant.
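To make the error concrete: the filtering step the notice describes amounts to dropping any miRNA whose CT value sits at the detection ceiling in most samples, before any normalization or statistics are run. Here is a minimal sketch in Python of that step. The table, column names, and the reading of “CT value of 35 in over 75% of samples” as a ≥35-in-more-than-75%-of-samples rule are our assumptions for illustration, not the authors’ actual pipeline.

```python
import pandas as pd

# Hypothetical miRNA-by-sample table of CT values (rows: miRNAs, columns: samples).
ct_table = pd.DataFrame(
    {
        "sample_1": [22.1, 35.0, 24.3, 35.0],
        "sample_2": [21.8, 35.0, 25.0, 34.2],
        "sample_3": [22.5, 35.0, 24.7, 35.0],
        "sample_4": [22.0, 35.0, 24.1, 35.0],
    },
    index=["miR-A", "miR-B", "miR-C", "miR-D"],
)

# Exclusion rule as described in the notice: drop a miRNA if its CT sits at
# the detection ceiling (taken here as >= 35, an assumption) in more than
# 75% of samples.
undetected_fraction = (ct_table >= 35.0).mean(axis=1)
excluded = undetected_fraction > 0.75

# Correct handling: remove the excluded rows BEFORE normalization and testing.
filtered = ct_table.loc[~excluded]

# The error the notice describes amounts to leaving the excluded rows in the
# analysis, where the software treated them as controls alongside the true
# endogenous controls, skewing normalization and yielding spurious "hits."
print("Excluded:", list(ct_table.index[excluded]))
print("Analyzed:", list(filtered.index))
```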
The multiple corrections on the paper show correspondence between Witwer and the paper’s corresponding author, Velpandi Ayyavoo, dating back to October 2011. The original study has been cited six times, according to Thomson Scientific’s Web of Knowledge, including once in a study by Witwer and colleagues.
As Witwer writes:
Reporting and quality issues were found for articles in journals with impact factors ranging from approximately 1 to 30, with no obvious association between impact factor and quality score, indicating the endemic nature of the problem. However, other associations were clear. MIAME noncompliant studies were twice as likely to arise from array experiments with n of 1. Articles with vague descriptions of experimental design were disproportionately those with few experimental replicates. Studies with fully submitted data received significantly higher mean quality scores than articles with partial data submission or no data deposition.
Witwer has a number of suggestions, many of which come down to researchers adopting a different ethos. We smiled at this passage:
Unless I have personally and fully funded my laboratory and research out-of-pocket, my data do not belong to me. They belong to my institution and to the taxpayer, and I have no right to withhold them to prevent another laboratory from analyzing my data in a way I did not consider.
Not surprisingly, the study is accompanied by an editorial titled “More Data, Please!” by Keith Baggerly of MD Anderson. Baggerly, Retraction Watch readers may recall, was the bioinformatics specialist who, along with a colleague, uncovered a litany of problems in Anil Potti’s work. Baggerly writes of Witwer’s analysis:
I echo his concerns and agree the problems can and should be addressed. Data reporting problems are affecting a number of areas beyond miRNA studies.
Baggerly, who co-authored an editorial describing these problems in ‘omics research in 2011, explains why lack of access to data is grabbing everyone’s attention, citing a few recent studies:
Ideally, reproduction (which should be faster and cheaper) should precede replication as a sanity check. Poor data access hinders both. Even when data are supplied, reproducibility should not be presumed; in their survey of 18 microarray studies, Ioannidis et al. (5) were able to access data for 10 studies but could reproduce quantitative results for just 2.
Given this poor rate of reproduction, poor replication rates, such as the rate of 6 of 53 reported by Begley and Ellis (6), for even “landmark” studies are not a huge surprise.
He says the “implications can be severe,” but that “the problem is fixable.”