Do publishers add value? Maybe little, suggests preprint study of preprints


Academic publishers argue they add value to manuscripts by coordinating the peer-review process and editing manuscripts — but a new preliminary study suggests otherwise.

The study, which has yet to be peer reviewed, found that papers published in traditional journals don’t change much from their preprint versions, suggesting publishers aren’t having as much of an influence as they claim. However, two experts who reviewed the paper for us said they have some doubts about the methods: it uses “crude” metrics to compare preprints to final manuscripts, and some preprints get updated over time to include changes from peer reviewers and the journal.

The paper, posted recently on ArXiv, compared the text in over 12,000 preprint papers published on ArXiv from February 2015 to their corresponding papers published in journals after peer review.

The authors report in their paper, “Comparing published scientific journal articles to their pre-print versions”:

…the text contents of the scientific papers generally changed very little from their pre-print to final published versions.

Specifically, the authors compared the differences in wording in the main bodies, titles and abstracts of preprints to the final versions of manuscripts using several text comparison algorithms.
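To give a sense of what text comparison of this kind can look like, here is a minimal sketch in Python. It is not the authors' code, and the article does not name the algorithms they used; the three measures below (a length ratio, a word-level sequence similarity from the standard library's `difflib`, and a Jaccard overlap of vocabularies) and the two sample sentences are illustrative assumptions only.

```python
from difflib import SequenceMatcher


def length_ratio(preprint: str, published: str) -> float:
    """Length of the shorter text divided by the longer (1.0 = equal length)."""
    longer = max(len(preprint), len(published))
    if longer == 0:
        return 1.0
    return min(len(preprint), len(published)) / longer


def sequence_similarity(preprint: str, published: str) -> float:
    """Word-level similarity in [0, 1], computed by difflib.SequenceMatcher."""
    return SequenceMatcher(None, preprint.split(), published.split()).ratio()


def jaccard_similarity(preprint: str, published: str) -> float:
    """Shared vocabulary divided by combined vocabulary, in [0, 1]."""
    a = set(preprint.lower().split())
    b = set(published.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample texts, invented for illustration (not from the study).
preprint = "We show that the bound holds for all n greater than 2."
published = "We show that the bound holds for all n greater than 3."

print(length_ratio(preprint, published))
print(sequence_similarity(preprint, published))
print(jaccard_similarity(preprint, published))
```

In this toy case the two versions differ by a single character, so all three scores come out high. Measures like these capture how much of the wording changed, not how much a given change matters to the argument.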

One key limitation of the study, however, is that most of its sample consisted of physics, math and computer science papers, which are routinely posted on ArXiv before submission to a journal in the field. In a statement to us, the authors added:

We are in the process of expanding our experiments to other disciplines such as Economics and Biology. We do indeed anticipate similar results but at this stage have no hard evidence to support [this] intuition.

Originally, the sample consisted of more than a million papers, but the number soon came down — to 12,666 — after the authors discovered that many papers on ArXiv did not contain the DOIs of their final peer-reviewed versions published in journals. In the paper, the authors — all based at the University of California, Los Angeles — report further limitations of the systems used:

The main reason why this number [12,666] is fairly low is that, at the time of writing, the above mentioned CrossRef API is still in its early stages and only few publishers have agreed to making their articles available for text and data mining via the API.

(Here’s the Crossref API the authors refer to.)

When asked if the role of scholarly publishers is diminishing, the authors — led by first author Martin Klein and last author Todd Grappone — told us:

Obviously, academic publishers play a major role in academia. However, part of the argument of their added value to scholarly communications is the coordination of peer review and the enhancement of the publication’s text. Our study tried to provide empirical indicators to inform the discussion about this value statement.

But comparing differences in overall lengths, character sets or text sizes does not necessarily indicate how much the meaning of the paper (which the authors call “semantic content”) changed from preprint versions of papers to final publication.

Timothy Gowers, a mathematician at the University of Cambridge, UK, who created an ArXiv “overlay” journal Discrete Analysis (which links to papers published on ArXiv and carries out peer review) told us:

The paper uses somewhat crude metrics to measure the similarity of papers. I can imagine a change being made to a mathematics paper that would show up as a very small change with those metrics, but be of crucial importance to the paper.

He added, however:

The only significant role publishers play for mathematics is offering a stamp of approval, a service that can be performed far more cheaply.

Sabine Hossenfelder, a theoretical physicist at the Nordic Institute for Theoretical Physics in Stockholm, Sweden, pointed out that many authors update their ArXiv preprints with final versions of papers, so comparing the two may be a flawed method. She added:

The authors try to fix that problem by pointing out that many of the latest arxiv versions are actually posted before the publication date. This however is misleading: the authors normally know the final version that will be published when it has been accepted — that is often weeks before the paper actually gets published. Why wait with updating until the journal publishes it?

She went on:

Everybody who has ever dealt with an editor knows that even small changes to a paper can make a huge difference to it being understandable. I also don’t know why the authors want the reader to believe that changes to the length of the body of the paper are somehow not so relevant. Most of the changes during revision are that referees want explanations added.

Another limitation of the study is the authors’ lack of focus on equations, tables and figures, Hossenfelder noted:

Changing as much as an index in an equation can dramatically increase the value of a paper.

She concluded:

…if I would interpret their data, I would instead read out of this that, yes, indeed…peer review does ‘significantly’ change almost all published papers. But what does that mean for the value of the published paper?

We contacted two publishers for their views on the study’s conclusions. Rebecca Lawrence, managing director of Faculty of 1000, told us that the “key element” that publishers bring to the table is “logistical support” for the peer-review process. She noted:

What would be an interesting comparison would be to look at articles on F1000Research where the whole process of peer review and the new article versions is open, to see how much these articles were changed following peer review.

An Elsevier spokesperson added:

Academic publishers are more important than ever, continually improving the quality, discoverability and utility of an increasing quantity of published scholarly content. Publishers perform a wide range of value added activities to not only the article, but for the journal and broader community as well — some seen, some unseen.

The Elsevier spokesperson referred us to this Scholarly Kitchen article by Kent Anderson entitled “96 things publishers do,” listing the functions publishers perform besides managing peer review, along with copy-editing and formatting. In the introduction, the post notes:

Often, authors are the ones asserting that journal publishers do so little, which is understandable, as authors only experience a small part of the journal publishing process, and care about the editing and formatting bits the most, making those the most memorable. In fact, publishers’ service mentalities often include deliberately limiting the number of things authors have to worry about, which further limits their view of what it actually takes to publish a work and remain viable to publish the next one.


18 thoughts on “Do publishers add value? Maybe little, suggests preprint study of preprints”

  1. Ask if the forward progress of your field in science would be impeded in any way if the communication of results was done solely through pre-prints. My field, physics, definitely would not be.

  2. I would be very surprised if these results held for social sciences, where I have routinely seen large changes between initial submissions and final versions. Methodologically, a comparison between submitted and final versions would have more validity. It is also worth considering that there is a degree of self selection in the decision to post a preprint, which could reflect author confidence in that version.

  3. How many journals were represented, and of what quality? Low-quality journals may not have done proper due-diligence as long as they got money. High-quality journals accept high-quality papers, which should not require much modification anyway. I call selection bias.

    1. Dean,
      We had a total of 381 unique journals represented in the study, most of them published by Elsevier. We did not look into journal quality though.

  4. I am in total agreement with Jonathan, Linda and Dean. I’ve published a couple of chemistry papers with little or no input in content and form from the publisher. However, I’ve noted that for authors who use English as a second language, publishing in English really benefits from the publisher’s value addition.

  5. Value added by publishers (in my experience):

    1. Changing UK-English spelling to US-English inconsistently and for no good reason.

    2. Bothering you about including the publication city when citing books, in an age where most books are published electronically or at least are not printed in any one, single location.

    3. Bothering you about updating the status of “articles in-press” that you cited that nevertheless still remain in-press (and by the very same publisher who apparently can’t check the status of its own articles).

    4. Failing to open your carefully prepared vector graphics figures and converting them to 150 dpi crummy raster-graphics versions instead.

    5. Giving you “48 HOURS TO RESPOND TO THESE AUTHOR QUERIES” after taking 3 months from acceptance to get the proofs out.

  6. Jonathan Tooker wrote:
    “Ask if the forward progress of your field in science would be impeded in any way if the communication of results was done solely through pre-prints. My field, physics, definitely would not be.”

    My own experience is that the quality of the reviewing has decreased significantly over the 30+ years I’ve been publishing (in physics). With some journals the quality of reviewing is a function of the professional editorial staff; with other journals, it is a function of the editorial board. In either event, the peer review value-added is not what it once was.

    As to the question of progress in physics (and other fields, I suspect): it may be that with peer review as Jonathan has experienced, progress would not be impeded if communication was done solely through preprints; however, my experience with preprints that I retrieve from, e.g., arXiv is that they could stand some critical review and questioning of context, methods, interpretation, and conclusions.

  7. I have not read this carefully, but it does not appear they considered papers that did not pass the review (i.e., did not appear in the journal). Weeding out papers is a service. Of course they do not pay reviewers for this work.

  8. The quality of peer review can vary widely. Those of us who have published extensively will know this, especially when you have experience with many publishers and journals. This emphasizes how fallible peer review actually is, and why we have a retraction crisis, because quality control has not been uniform. That said, the publishers have used free labour in the form of unpaid peer reviewers (in most cases), despite making record profits. There is thus a certain percentage of that peer pool that is not satisfied with these conditions, and feels exploited. Such individuals will contribute to poor and incomplete, even superficial, peer review. Sadly, this results in boosted egos, but poorly vetted literature. And publishers have taken benefit from this. So, to say that publishers contributed nothing is in fact true. Peer reviewers and editors did the contribution. If we removed the publisher from the equation, what remains remains true, i.e., that only the peer reviewers and editors did something, and contributed something of value, ranging from superficial to excellent and profound input.

    So, what is the value of a publisher? Basically, it’s platform, security and marketing prowess. The ability to display work to a wider audience. And that is why the “predatory” journals are gaining because they need not have powerful and personalized data-bases to show-case scientists’ work: they have Google to do the work for them.

    As for this study, I think the methodology is deeply flawed. A paper that goes through open peer review, by adding at first a copy on an open server like ArXiv, will not be handled in the same way if it were to be processed through regular peer review. Although it is likely that few such cases exist, the correct “control” in this analysis should have been the exact same paper submitted through a regular peer review, and the same copy posted to such a preprint server. Then, the final versions should be compared.

  9. I agree with Jaime. The methodology in this study looks extremely suspect, but they are conflating “publishers” with “peer review”.

    *Publishing*, i.e. distribution and visibility, is fully covered by the likes of arXiv and in my subject (physics) there are virtually no published papers that are not also available on arXiv (and updated after peer review). It is very rare that I will use a journal website’s version of a paper from the last 20 years rather than the free & openly available arXiv copy.

    But peer review can be extremely valuable, as everyone who’s participated in the system knows. It’s not *nice* to be told that your work is imperfect, but you swallow your pride and acknowledge that the end result will probably be better for the review. Of course everyone has frustrating peer review experiences — nit-picking on comma positioning or Fig. vs Figure while missing content issues — but they are not the norm, at least in my experience. Peer review need not be organised via a “journal” or “academic publisher”, especially since it is unpaid work — a review “wrapper” on arXiv and the like is quite feasible.

    As the Timothy Gowers quote says, there are much cheaper ways to achieve peer review and stamp-of-approval, and we should be pursuing them rather than propping up the technologically irrelevant dead-tree publishers. But since that stamp of approval is a key metric for research funding, the obvious steps toward better cost efficiency require collective action — a prisoner’s dilemma, as usual.

  10. Ehm, so they are surprised that a preprint deviates very little from the published version? There is a very simple explanation to that: if the paper does not pass peer review, there is nothing to compare. Thus the study is inherently biased towards good papers.

    1. Hi imohacsi,
      Your point that all papers in our study have a version published by a commercial publisher and hence have (likely) passed some sort of peer review is well taken. However, the fact (as shown in the paper) that 90-95% of papers in our dataset were uploaded to arXiv *before* they were published by a commercial publisher – a lot by 6 months and more – indicates that the influence of a commercial publisher on the paper (for example with peer review and copyediting) is rather small, according to our results.
      I would also argue that passing peer review does not automatically translate to a good quality paper.

  11. Imohacsi, well stated. Another possibility is that “peers” might not want to waste their precious time commenting in an open review platform, unless it is really a study that is very close to their field of study, or of high interest. For example, I have read many open access papers that allow for “peer reviewer” commenting, but I have never felt the urge to do so, except at PubPeer. By the way, when the authors state that “Originally, the sample consisted of more than a million papers, but the number soon came down — to 12,666 — after the authors discovered that many papers on ArXiv did not contain the DOIs of their final peer-reviewed versions published in journals.”, why was the DOI such an important factor for selection? Or was a million papers simply too much data to handle? I am curious about the excluded approx. 987000 papers.

    1. Hi,
      You raise a good question re DOI and our filtering. We worked under the assumption that a DOI uniquely identifies a paper even though it is well known that the DOI system is not always flawless. At the time we conducted the experiment, arXiv held 1.1 million records and about 45% of the papers had a DOI included in their metadata. We then queried the CrossRef API with each DOI to a) identify the “final published” version of the article and b) to obtain its full text. Obviously this access method is based on UCLA’s serial subscriptions. So these 3 filters contributed to the rather bad 1.1m -> 12k ratio.

      We could have used other methods to match arXiv versions to “final published” ones such as the paper title and authors but we haven’t done that yet.

  12. Although this neglects the value of publishers as gatekeepers, as it only uses papers that were ultimately accepted.

  13. Brian, more and more, scientists are becoming concerned with the quality of this “gate-keeping” by publishers. In some cases, even among reputable publishers, there are some journals (see PubMed Commons or PubPeer for examples) where gate-keeping has been deplorable, and now they are having to clean up literature that was accepted with weak or poor peer review. I am not a fan of preprints, although I can see their advantage.

  14. I’m a manuscript editor at a small medical publisher and I can tell you what value I add to the articles that get published: I turn tortured grammar into something resembling coherent English, I fix bad spelling and punctuation, I reformat tables so that they are legible and actually convey some information to the readers, I edit graphics so that everything is labeled and has a unit of measurement (because it’s helpful to know what’s being measured), I renumber those figures and tables when the authors cite them out of order in the text, I correct references using PubMed and Amazon and Google (you wouldn’t believe how many refs. I see with only author names and the article title, and I have to go searching for the journal name, year, volume, and page range), I make sure that all abbreviations have been defined at first use, and finally I get out my trusty calculator and make sure all the percentages are correct, everything adds up, and the numbers are consistent across the article. And what do I get for this? A paltry salary and anger from the authors who think I’m being too picky when I insist that the number of patients in the cohort can’t change between the introduction and the results without some sort of explanation as to what happened to them.

    1. This is interesting, because I think it highlights the differences that exist across fields. In physics, the quality of arXiv manuscripts is usually pretty good when it comes to referencing, citations, etc. because everyone writes their papers in LaTeX and you don’t really have the freedom to get it wrong. And the typographical issues are highlighted by the peer reviewers. So while I understand the value added for your field, even those improvements are more to do with peer review than with the publisher in many areas.
