Following criticism, PLOS apologizes, clarifies new data policy

In response to “an extraordinary outpouring of discussions on open data and its place in scientific publishing” following a February 24 announcement about a new data policy at PLOS, the publisher has apologized and corrected the record.

The new policy — which was actually first announced on January 23, as we noted here — had led to criticism at the DrugMonkey blog, and a February 26 clarification seemed to do little to convince another critic. (Not all disagreed with the policy, however.)

In particular, there were objections to a section that began with

Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances.

From a blog post published yesterday by biology editorial director Theo Bloom:

We apologize for causing confusion

In the previous post, and also on our site for PLOS ONE Academic Editors, an attempt to simplify our policy did not represent the policy correctly and we sincerely apologize for that and for the confusion it has caused. We are today correcting that post and hoping it provides the clarity many have been seeking. If it doesn’t we’d ask you once again to let us know – here on the blog, by email at [email protected], and via all the usual channels.

Two key things to summarize about the policy are:

  1. The policy does not aim to say anything new about what data types, forms and amounts should be shared.
  2. The policy does aim to make transparent where the data can be found, and says that it shouldn’t be just on the authors’ own hard drive.

Correction

We have struck out the paragraph in the original PLOS ONE blog post headed “What do we mean by data”, as we think it led to much of the confusion. Instead we offer this guidance to authors planning to submit to a PLOS journal.

The post continues with an example of how the policy would work. Here’s the struck-through paragraph:

What do we mean by data?

“Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances.” Examples could include spreadsheets of original measurements (of cells, of fluorescent intensity, of respiratory volume), large datasets such as next-generation sequence reads, verbatim responses from qualitative studies, software code, or even image files used to create figures. Data should be in the form in which it was originally collected, before summarizing, analyzing or reporting.

The move looks like the right thing to do. The problem seemed to have stemmed from how the policy was communicated, rather than what PLOS actually wanted to accomplish, which is better data sharing. In a time when reproducibility is a growing concern, the latter is a must.

We note that this is not the first time PLOS has run afoul of scientists after overstating — or at least not stating carefully — a new policy. Something similar happened in 2012 when the publisher tried to clarify its retraction policy. That led to a similar walk-back.

25 thoughts on “Following criticism, PLOS apologizes, clarifies new data policy”

  1. The problem with data sharing is that it disproportionately favors big laboratories (big gets bigger model), though the extent of this may depend on the field. Let’s say you are the head of a small laboratory, competing against big laboratories with lots of manpower. Among the valuables you have are your datasets, which you can mine for more than a single paper. If you publish a good study, typically there is residual value in your dataset that you can publish later on. If you share your data, big labs can always parasitize your data and scoop you. Given that there are more small labs than big labs, I don’t see this idea gaining in popularity.

    1. Jerry, I thought this was quite a new and extremely important point that I think very few have actually thought about carefully. I have noticed in several journals that a lot of supplementary files are added online. Many of these files are raw data sets in Excel, Word, PowerPoint and other simple file formats. Indeed, these datasets would allow for verification and reproducibility in the hands of a RESPONSIBLE researcher, and that, I believe, is the main drive by PLOS, i.e., to increase accountability. The objective is bold, the dream is noble, and there is absolutely nothing wrong with what PLOS is trying to achieve, despite the anti-PLOS crowd out there. BUT, I personally believe that Jerry’s last point is going to be the serious thorn in the side of science, not only of PLOS, if this policy is actively pursued. There is a serious, very serious risk that unethical scientists will pick up on those raw data sets and manipulate or abuse them to create new data sets. Just the actual thought of it is scary. Think about it, what is stopping an individual from a poorly stocked lab coming to a high class journal, picking up the raw data that has been set there originally to implement transparency, and then manipulating it to create new data sets that are then sent to other medium or high class journals? I think PLOS and the scientific community had better stop dead in their tracks to re-think the risks, not only the benefits. As I say, this idea hadn’t even crossed my mind until Jerry suggested it. You may think that my ideas are radical, or folly, but please put this into a realistic context. Put aside the excellent journals for one moment, like Nature, or PNAS, or JBC, PLOS, BMC (whatever). Now, pick up any journal in Marsland Press’ selection (http://sciencepub.net/), and see how open access files of raw data may be extremely dangerous. Go ahead, use your imagination, and feel that cold chill creep up your spine…

    2. So, now it will be the responsibility of the authors to make huge datasets available online, like tomography data that can easily reach tens of TBs? Or to upload the software used, which alone can turn a mid-level laboratory into a cutting-edge one?
      If this gets pursued, it will actually hurt the big labs that do science from a big budget. Those who will benefit will be the institutes with insane amount of manpower but still low funding (second class institutes in developing countries). If human workforce is cheaper than research it will be quite tempting to re-use the available real data to squeeze more out of it without spending a dime on actual experiments. As the know-how is not uploaded, journals should be prepared for a storm of papers with misinterpreted experiments.

    3. That’s one reason why people publish ambiguous methods descriptions or even leave out a step of the synthesis (I’ve heard people joke about it, call it the special sauce).

      Personally, I think people going for papers like this (hoarding data, not revealing methods) are scamming the system. I think journals should not allow it. If you want the prize (publication), then publish the data and methods. Funding agencies should also require this.

      If it leads to some change in research structures, I don’t care. I already think there are too many researchers and diminishing returns to federal research.

      But I think for the vast majority of researchers, it doesn’t lead to much scooping. (How many people even read your papers now?) I think it just requires the papers to be more buttoned up and less like press releases, more like technical reports. So the whining is more about work.

      1. And note that some fields (e.g. crystallography) have long required archiving of data, and people are used to it. It has greatly increased the reliability of structures and has also been invaluable to other researchers for looking at multiphase samples, etc.

    4. It’s pretty strange to hear this argument. We have completely forgotten why we are all here. Discovering new things is hard enough, why are we making it harder by withholding data? We are afraid that somebody else will have a better idea? Really?? Embrace it! Put your data online! Let everyone know and, you’ll see, nobody will ever get away without acknowledging that it’s yours.

  2. Promising start to a policy, but now walked back too much.

    PLoS (and every other journal) should require that raw gel images are posted in supplemental results.

    This will take the authors very little time and will cost next to nothing.

    It would eliminate most fraudulent image manipulations, which are the majority of cases that actually see the light of day in ORI sanctions.

    Other data such as large metagenomic datasets are already usually archived. PLoS should simply have a policy requiring this.

    TB of data from a monkey behavior study – OK a bit much. However, raw gel images and metagenomics datasets are not very invasive requirements that PLoS and other journals should codify immediately.

  3. On one hand, PLOS say they’ll take new measures to ensure transparency, reproducibility, etc., and they publish this kind of article claiming that most published papers are not true, and other ethical stuff. On the other hand, PLOS ONE continues to publish a lot of crap, the editors are never helpful in providing the raw data or asking the authors for it, and retractions are extremely infrequent.
    These pro-transparency claims by PLOS always concern future publications; they do not want to look back and correct what they have done…

    1. Could you please quantify this “a lot of crap” that you feel has been published by this journal. Please list very specifically the papers you are referring to.

        1. Well, if you look for this “other” journal on Pubpeer (in the way you did to PloS One) you’ll also have tons of good examples of “crap”, so your argument is not valid: https://pubpeer.com/search?q=nature&sessionid=590A3164E4DC9C0E4E43&adv=none

          However, I do agree there are loads of crap there, if we consider “crap” something “not really accurate neither well written”. Crap can be many things. Really.

          And if you know people considering PloS One TOP in our beloved country you are very unfortunate. I don’t know these guys. PloS One is often a last, still honest, resort after failing miserably on the IFs 5-10…

      1. JATdS, I could give you many examples myself, mostly related to the (ab)use of fluorescence spectroscopy to analyze the binding of ligands to proteins (often serum albumin). However, I am tired of the fight (I commented on several papers – often with underwhelming responses if any, from authors and handling editors alike) and the search system of PLoS One is not good enough for me to find those papers without having to wade through hundreds of irrelevant papers.

        For one example, though, see my comments on
        http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0028361

        That double log equation I comment on (equation 3) is notably highly popular. The attentive reader will notice some similarity with an equation in this Wikipedia article
        http://en.wikipedia.org/wiki/Hill_equation_(biochemistry)

    2. In my response to the referees at plos one I specifically said I would be more than happy to provide *all* my raw data. Quite frankly I could use other eyes on this data; esp with the clinical implications.

  4. This walk back is sort of amazing, really, as there were only about ten people making a big fuss online, and of those only drugmonkey was really forceful. Dozens of others thought the policy as a whole made sense.

    1. As someone who hadn’t really noticed this ferment on the internet (I had just thought “Oh good they’ve learnt from their mistakes!” upon reading that Plos would want original data archived) this negative reaction to making more data available to our fellow scientists is all rather astonishing. Of course we do all come from different fields and research traditions and now that David Hauser is eliminated from monkey misbehaviour, DrugMonkey’s scientific reading is entirely safe from abusive science. Secure in his research cocoon, it is perhaps understandable that Drugmonkey doesn’t seem to be a fan of post publication peer review either. What a waste of time that would be in his perfect world.

      A twitter comment:

      “PubMED Commenting sure gives the #OA #waccaloons a put-up-or-shut-up moment. Here’s your post-pub-review opportunity big talkers”

      And blogging about “OpenSciencePostPubReview”:

      “The Open-Everything, RetractionWatch, ReplicationEleventy, PeerReviewFailz, etc acolytes of various strains would have us believe that this is the way to save all of science.”

      Jeepers – RetractionWatch not saving the world!? Does the monkey mean us? Well that might well be one way to make the #waccaloons of the world unite.

      No doubt we all think it is fine if Drugmonkey does not want to look at any data other than his own. That is his choice. However, one might argue that to apply this narrow view to e.g. stem cell research (you all know what I am talking about today) is very unfair to any honest souls who may be plying their trade in that most dirty of games.

      I’m personally 100% behind Plos when they decide they want to archive more data. More information is good.

      Well actually only 99% behind Plos – because they have a track record of not talking to scientists when they make self-important decisions, as when they unilaterally decide to retract papers in high profile fashion

      http://blogs.plos.org/speakingofmedicine/2012/09/25/the-role-of-retractions-in-correcting-the-scientific-literature/

      Well actually only 98% behind Plos – because they don’t allow critical comments on their publications and they don’t know how to deal with deeply flawed papers when presented with strong evidence and don’t know how to communicate with the messengers of bad news.

      http://www.psblab.org/?p=268

      Well actually only 97% behind Plos because, in the present context, they don’t appear to be working with other publishers about data archiving. Surely they are not trying to pull a fast one? Currying favour with rank and file research grunts? Dear Plos – if the system is systemically broken, you probably cannot fix it by attempting to out-compete the closed wall publishers. They will have to move too, you numpties! Many of us drugmonkeyed #waccaloons – I am one of them – do hugely value open access hence the tag #OA – but still…

      It is very difficult for journal editors to have proper discussions with working scientists. Most times when editors think they are having a discussion, scientists are on their “best behaviour” e.g. at meetings where publishers have stands, because they always have their own future submissions in mind. Though they could be on their worst behaviour because right now they are pissed off about the latest rejection. It is easy for journal editors to fool themselves into thinking they have understood issues like data archiving. But if you editors were to form some working groups to bring in motivated scientists to discuss data archiving, then (a) they will work towards that goal with you and (b) if there is criticism of the plan (as there will be by mini-drugmonkeys in any field), the journal can pass blame on to the scientific panel instead of editors having to look personally gormless yet again. And some of those criticisms might be valid but yet, if there is a standing panel of scientists, there can be continual reasoned review of archiving requirements.

      We are in a miserable period for active scientists whose work is hampered by massed ranks of jobsworths inhibiting from all sides. There is the potential for a new golden age where publishing good data and having it archived will benefit the serious scientist and hamper the charlatan. Journals would have no choice but to become a part of that and decide what they can usefully archive to add value to their trending status as “dumb pipes” of information exchange. For what the journals’ limited business models cannot handle, the procedure is already established for storing large data sets of genomic, proteomic, structural information in guberment(sic)-funded open access databases. Storage of large data sets of imaging data is urgently required too. Who amongst you will forbid the publishers and the computational resources to work together to bring about a new era in biomedical research?

      Monkey see, monkey do..

        1. At a quick look this seems to be a decent initiative. Scientific advisory panel in place. Let’s see if it takes off.

          While on the topic of NPG, it would be nice if Nature followed the practice of Nature Cell Biology and some other of its journals in placing “raw” gel images and microscopy fields in the supplement. Just possibly that would have been an awkward and protective hurdle to leap in the STAP cell case where images were used that were three years old and generated at a different institute.

          http://blogs.nature.com/news/2014/03/call-for-acid-bath-stem-cell-paper-to-be-retracted.html

    2. There was a lot of objection on the PLOS ONE internal editor’s forum as well. Or maybe, a lot of “concern.” Since the paid editorial staff are mainly production people who lack subject area experience, the burden of checking and validating the archives is going to fall on the volunteer academic editors. The idea that study data should be replicated somewhere other than the researcher’s personal hard drive, and should be accessible for re-analysis and replication, is good and noble. But I suspect it will be just as spotty in actual implementation as P1’s current editorial process on reviewing and accepting manuscripts.

  5. It could be that making a single data sharing policy to cover all of science including the social sciences, that is anything more than a vague “sharing is good” statement, isn’t really possible.

  6. It’s a complicated situation. One commenter pointed out the problem for smaller labs of having their information used and thus losing out on value, but at the same time the lack of open information is costing lives.

    1. “lack of open information is costing lives”. Typical mission creep. What applies to clinical trials does not apply to the majority of research. So, why should clinical trial criteria be used in other domains?

  7. I think they should’ve just scrapped the entire policy until they could sort out a coherent, workable solution that is appropriate for the wide variety of fields represented in PLOS journals. That policy, as written, was disastrously vague, with no plan for how it would be interpreted or enforced on a case by case basis or for different types of data and analysis. As a P1 AE, I certainly have no intention of adding to the volunteer work I do by spending the many hours of due diligence it would take to determine if an author has met the policy or attempting to mediate when someone claims they haven’t.

    I honestly think that because of where PLOS comes from as an organization, they only considered a very narrow set of kinds of data when crafting the policy (primarily genomics), and sort of willy-nilly thought it would be fine to apply the norms and expectations of that small subdiscipline to 1) Fields who aren’t asking for it (often for valid scientific or practical reasons, believe it or not), 2) Types of data for which there are no agreed upon, or indeed useful, ways to share in the way the policy demanded.

  8. “Something similar happened in 2012 when the publisher (Ginny Barbour) tried to clarify its retraction policy. That led to a similar walk-back.”

    I’m sorry but that is simply not true. Barbour did not walk back a nanometer, she just tried to spin it differently. She continues to insist that she as editor has the right to unilaterally retract anything she thinks is wrong, peer review be damned. It is misleading to suggest otherwise.
