After PLOS ONE allowed authors to remove a dataset from a paper on chronic fatigue syndrome, the editors are now “discussing the matter” with the researchers, given the journal’s requirements about data availability.
As Leonid Schneider reported earlier today, the 2015 paper was corrected May 18 to remove an entire dataset; the authors note that they were not allowed to publish anonymized patient data, but can release it to researchers upon request. The journal, however, requires that authors make their data fully available.
Here’s the correction notice:
S1 Dataset was published in error. The error was corrected in the XML and PDF versions of this article on May 9, 2016. Please download this article again to view the correct version.
The paper “Therapist Effects and the Impact of Early Therapeutic Alliance on Symptomatic Outcome in Chronic Fatigue Syndrome” also contains the following message:
Data Availability: Our ethical permission did not expressly permit us to share patient data, even anonymised patient data, in a public forum. Data will be made available to bona fide researchers on application to the principal investigator, Alison Wearden or the trial statistician, Graham Dunn.
The author’s institution, Manchester University, sent Tate Mitchell the study’s consent form about a month ago, in response to a request. A records officer at the university sent Mitchell this comment along with the form:
Please note that we have not released our entire dataset. The data which are available in association with the PLOS-One article entitled “Therapist Effects and the Impact of Early Therapeutic Alliance on Symptomatic Outcome in Chronic Fatigue Syndrome” comprise a small anonymised subset of our dataset containing only variables relating to the analysis presented in the paper.
For more on this trial, which is sometimes referred to as a sister trial of the controversial PACE trial of chronic fatigue syndrome, see this post by David Tuller. Data from both this study and PACE — whose consent form is very similar — have been subjected to a number of freedom of information requests. And PLOS ONE added an editor’s note — that looked a lot like an Expression of Concern — to a PACE trial sub-analysis when its authors refused to share the data.
We were curious about whether the correction and removal violate PLOS ONE’s clearly stated requirement about data availability:
PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.
A PLOS spokesperson told us:
The authors contacted the editorial office about possible restrictions for the public availability of the data in relation to the information outlined in the consent form for the trial. The data was removed to avoid any possible breach in patient privacy. The PLOS ONE editors are in the process of discussing the matter with the authors, in consideration of the requirements for data availability under the PLOS Data policy (http://journals.plos.org/plosone/s/data-availability).
Update 5/23/16 9:10 a.m. eastern: We’ve heard from author Wearden, who told us more about why they decided to remove the dataset:
We published the PLOS One paper on therapist effects and therapeutic alliance and provided a de-identified dataset containing the variables used in the analysis.
On 10th March, I received a freedom of information request for a copy of the patient consent form for the FINE trial. The request referred to the ongoing case relating to the PACE trial, and raised the issue of whether in fact we had been correct to make the data relating to the therapist paper open to the public. We did not request permission to do so in our trial consent form.
The dataset supplied to support the PLOS-One article was supplied in good faith and in the belief (still held) that no patient or therapist would be identifiable from it. The Freedom of Information request made me wonder if we had acted correctly, given our ethical permissions. In consultation with my co-authors and after discussing with various colleagues, I decided that it would be better to remove the dataset from public access (while still being prepared to supply it to bona-fide researchers). I wrote to PLOS-One on 18th April asking them whether it would be possible to do this.
The contents of the paper have not been retracted. The dataset has not been retracted. There is nothing wrong with either of them. The only issue is whether or not we were right in publishing this dataset given the consent that we had obtained from the trial participants.
Wearden added:
…when I contacted PLOS about this originally, they acted quickly and efficiently in changing the status of the data.
Since the change in status of the data I have had a few emails and PLOS have told me that they have also been contacted.
I can confirm that I am in current email contact with PLOS. I have just emailed PLOS to tell them how I have been handling emails sent to me.
As an aside, the dataset removed by the FINE authors was also used as a proxy to analyze what effects the numerous changes to PACE’s published trial protocol might have had*. Also, the anonymized dataset was available for roughly 17 months before being removed; this whole thing is just becoming more ridiculous by the day. It also raises a question: if explicit permission is required to share fully anonymized data, how can any data from any trial using human subjects be shared?
*Exploring changes to PACE trial outcome measures using anonymised data from the FINE trial.
Sam Carter 2016 Feb 15
http://www.ncbi.nlm.nih.gov/pubmed/23363640#cm23363640_14248
Opinion: raw data should always be submitted to editors and should always be available after publication; in the case of patients, of course, the data should be anonymized.
And if the participant hasn’t given consent for their data to be published, even in anonymized form? Or an ethical review board has not given approval for the data to be published in that way?
We should take all of that into account when we design the study protocol, well before starting the study: we should first ask ourselves whether we want to publish any eventual results from it. If the answer is yes, we should make sure we only use data from patients who gave their consent, and that we have approval from an IRB, etc.
I agree. And I think current consent forms (and indeed the underpinning ethical approvals) are more likely to reflect this.
Note the FINE consent form is dated 2004. There is no way the authors of the form could’ve anticipated the changes in data storage, privacy law, and the legal and ethical landscape that exist today. And unfortunately, the specifics and nuances of, e.g., consent forms sometimes only become apparent when there are problems.
On a different note, the 20 year timeframe for data retention actually surprised me. I’ve had studies approved as late as 2010 where data retention was limited to 7 years after completion of the study, based on university and Australian government health service rules.
Data Availability: Our ethical permission did not expressly permit us to share patient data
So the authors published the paper at PLoS in the belief that they could meet that forum’s requirements for publication. Then they belatedly realised that, no, the requirements were out of reach. Have they considered retracting their paper accordingly?
It’s not a violation of PLOS’s standards unless the researchers hold on to it and control access to it themselves. It has to be placed with a third party who ensures long-term data access. I believe that could be the ethics board of the university, etc. But the researchers themselves cannot be the ones who control access to it.
Questionable treatment protocols are being set based on the interpretation of undisclosed data. The ME/CFS community and multiple researchers find these protocols to be unscientific and discriminatory. If the authors are unable to provide the raw data for peer review and scrutiny, as they claim, it seems it would be wise to retract the whole paper. Is the raw data not able to be disclosed for ethical reasons, or is it not being disclosed because it doesn’t support the paper’s conclusions and recommendations? This one doesn’t pass the smell test.
Is the raw data not able to be disclosed for ethical reasons, or is it not being disclosed because it doesn’t support the paper’s conclusions and recommendations?
Emphatically the former, as the data have been available for over a year before being taken down.
This is a complicated issue. While I am all for transparency, there are cases where even publishing de-identified raw data may be problematic. One example is data obtained from samples stored at national biobanks. Many countries have laws that explicitly prohibit the export of personal biometric data (even de-identified data) derived from biobank samples. Although the legal community is slowly making inroads on the issue, even the most optimistic do not see a quick resolution, given the enormous implications for society. If the raw data requirement becomes absolute, many researchers who depend on biobank resources (mostly medical doctors and those in public health-related disciplines) will not be able to publish in many journals, which would be a net negative for the public and the research community.
Another issue is consent. What did the participants consent to when they agreed to participate? The FINE form has this statement (which I believe is not in the PACE consent form):
“6. I understand that data collected may be stored in coded form for up to 20 years after my completion of or withdrawal from the study, after which time the data will be destroyed.”
Posting an anonymised dataset online would make it impossible to destroy all copies of the data after 20 years.
Does that just mean that personally identifiable data will be destroyed after 20 years? Not all data collected will be destroyed, as papers will have been published, meta-analyses conducted, etc. Does this consent form mean that researchers have a responsibility to destroy anonymised data? Presumably, this is already impossible as the data has been available for download from PLOS for some time.
Those are very interesting questions!
Publishing data within a paper would not have been new or controversial in 2004. Publishing underlying datasets, however, is a very recent development and was almost unheard of 12 or so years ago!
My personal view is that the researchers did not have consent to make the anonymized dataset publicly available. The reason is clause 6: it specifies a time limit (20 years) and a type of data (coded data). To me, coded data would include anything that wasn’t raw data. Also, it’s in a separate clause, so it takes precedence over the data-sharing allowances in the DPA (clause 5).
As to the raw data….
It’s curious that clause 6 refers to “coded”, not “raw” or “all”, data. One could argue that after 20 years, researchers would still have the raw data and could then process it according to the DPA. It’s almost certainly not what the participants expected, but I could imagine an ethics committee considering it “in the public interest” and allowing the data to be available on request.
Some countries’ laws governing such matters allow de-identified data in publications for samples collected by the researcher him/herself. The key is that the data must not provide enough information to allow other people to re-identify the subject. One example I was given by our legal counsel was a particular fingerprint vs. a list that includes gender, BMI, heart rate and blood pressure. There is no way you would be allowed to submit even a single anonymised full fingerprint, whereas the list would be OK. In the same vein, the whole-genome sequence of a single subject cannot be put into a public database, whereas a “consensus” whole-genome sequence is (likely) no problem. Usually the line is drawn by the institutional review board based on its (sometimes subjective) interpretation of current research regulations, although the final say lies with the governing body (usually a Ministry of Health or similar department) if there is a public outcry.
As I understand it, subjectivity is a feature, not a bug. Committees have to balance technical and legal issues with the public good and the needs and expectations of the local community where the research is being done. They include lay people specifically so that different perspectives are represented and debated.