The week at Retraction Watch featured an exclusive about a prominent heart researcher being dismissed, and a look at signs that a paper’s authorship was bought. Here’s what was happening elsewhere:
- Bad peer reviews: Kingsley Purdam is gathering examples. (Times Higher Education)
- Crimes against science: The Grumpy Geophysicist has some suggestions for an updated list.
- The World Health Organization concludes that researchers at Oxford committed “research ethics misconduct” during a multi-million dollar study on fetal growth. (Kai Kupferschmidt, Science)
- Different U.S. government agencies report misconduct in different ways, which can result in some debarred researchers remaining on faculty. (Jeffrey Mervis, Science)
- China detains five officials accused of tampering with air quality data amidst government attempts at real-time air pollution monitoring. (Yang Fan, Radio Free Asia)
- “It is not possible to establish a direct and unequivocal relationship between citations and scientific merit, as it would be desirable.” So how can we try to normalize citation metrics? (Lilian Nassi-Calò, SciELO blog)
- A new paper argues the lack of reproducibility in psychological science means a higher threshold is needed for what constitutes a scientific discovery. (Journal of the American Statistical Association)
- “A growing number of politicians seem to be getting caught with their diplomas down.” Politicians have a problem with plagiarized theses, say our co-founders in STAT.
- At what point does misreporting clinical trial outcomes become a cause for serious concern? asks Peter Doshi. (The BMJ)
- A citation royalty scheme? Jeffrey Beall has the details.
- “Today, leading research supporting graded exercise therapy, often called GET, is unraveling, and the scientists behind that research, along with the esteemed journals that published their findings, have come under fire for what have been called dubious study methods and a questionable peer review process.” (David Tuller on chronic fatigue syndrome, in Undark)
- Andrew Gelman channels W. H. Auden to describe the state of spin in science.
- “Various bodies, including the Australian Academy of Science, had been ‘asleep at the wheel’ on research misconduct.” A fresh call in Australia for a new national body to manage misconduct complaints following the trial of Caroline Barwood. (Darragh O’Keefe, The Australian, sub req’d)
- Bibliometrics warp science, argue Rinze Benedictus and Frank Miedema in Nature; they say societal relevance must be taken into account.
- “The lords of the bedchamber took greater pains than ever to appear holding up a train, although, in reality there was no train to hold.” Yves Gingras argues that measuring productivity is the Emperor’s New Clothes. (Colleen Flaherty, Inside Higher Ed)
- Are academic journal subscriptions becoming obsolete? asks Donald A. Barclay. (The Conversation)
- Donald Trump’s willingness to file lawsuits prevents a report on his history of litigation from being published…because its authors feared a lawsuit. (Adam Liptak, New York Times)
- Who are you? Hindawi’s head of research integrity, Matt Hodgkinson, discusses the problem of ensuring authors and peer reviewers are who they say they are. (Hindawi blog) And see our Q&A with him here.
- “It may be time to consider the predatory publishing phenomenon as an infectious disease characterized by specific and detectable clinical signs.” (Journal of the Neurological Sciences)
- “Review papers have been criticized for not generating new knowledge, which presumably would be much better than simply summarizing existing knowledge.” But Andrew Hendry argues that review papers have many important uses. (Eco-Evo Evo-Eco)
- The FDA launches a new website to allow anyone to submit a claim that a medical device manufacturer may be violating the law, after a scathing report on its current system. (Zachary Brennan, Regulatory Focus)
- Is the best way to raise an individual’s or institution’s impact on the scientific literature to publish less often? (Canadian Association of Radiologists Journal, sub req’d)
- “Chris Chambers is a space alien.” NeuroNeurotic on Registered Reports and Exploratory Reports.
- “10 Simple Rules For Data Storage.” (PLOS Computational Biology)
- “We have removed an article about the history of maths from The Conversation site. The reason for this is that the editing procedures we normally follow were not adhered to in this instance.” The article in question had called for the decolonization of math.
- According to a new study in PLOS ONE, “reviewers with higher levels of self-assessed expertise tended to be harsher in their evaluations.”
- It’s “time to revolt against impact factors,” says Philip Ball. (Chemistry World)
- A new paper provides “an overview of key existing efforts on scientific integrity by federal agencies, foundations, nonprofit organizations, professional societies, and academia from 1989 to April 2016.” (Critical Reviews in Food Science and Nutrition)
- “For me, the importance of reproducibility is not simply, or even perhaps primarily, its role in checking the integrity of a piece of work,” writes Matthew Stephens, “the first person to have a paper pre-reviewed by Biostatistics for publication.” “Rather, I see reproducing a piece of work as the first step towards building on it.”
- “[T]he growing expectation that one of ‘the fruits’ even of academic research should be patents threatens a full-on ‘tragedy of the anticommons’, as even materials scientists need are increasingly patented.” (Hilda Bastian, PLOS Blogs)
- What effect do article processing charges have on libraries? Katie Shamash takes a look. (LSE Impact Blog)
Like Retraction Watch? Consider making a tax-deductible contribution to support our growth. You can also follow us on Twitter, like us on Facebook, add us to your RSS reader, sign up on our homepage for an email every time there’s a new post, or subscribe to our daily digest. Click here to review our Comments Policy. For a sneak peek at what we’re working on, click here.
In the matter of the NY Times piece about the American Bar Association, the following might also be enlightening…
https://popehat.com/2016/10/26/no-the-aba-did-not-censor-a-story-about-donald-trump-being-a-censorious-asshat/
Matt Hodgkinson is setting the bar very high for Hindawi. Here’s a challenge for Hindawi: can it run a plagiarism check on all papers it published before it started to use commercial plagiarism detection software?
Various statements and quotations in Jeffrey Mervis’s articles about research misconduct in Science prompt my comment that the NSF Office of Inspector General website contains significant documents for cases that result in action by the agency, up to and including debarment. In particular, the central report of investigation is found there. It is redacted as required. However, the report sections that describe what was done, what the apparent intent was, and what the impact was are certainly of interest to those trying to get an overview of these issues in research misconduct. The main report also includes a summary of the grantee’s inquiry, investigation, and actions. NSF’s actions are almost always directed at the individual, and NSF does not require that specific consequences be imposed by the grantee.
Sci-Hub, Libgen and Bookfi update:
https://torrentfreak.com/court-orders-cloudflare-to-identify-pirate-site-operators-161028/
Oops, I think they did it again! The compilers of the Weekend Reads have linked to yet another dodgy paper showing some complicated-looking statistics, but with inappropriate reasoning and thus inappropriate conclusions about how to fix the statistical evidence presented in scientific writings.
I commented last weekend on the faulty lines of reasoning displayed in the David Colquhoun paper linked in the Weekend Reads.
The link this weekend, to an accepted manuscript in the Journal of the American Statistical Association by Johnson, Payne, Wang, Asher and Mandal of Texas A&M (“A new paper argues the lack of reproducibility in psychological science means a higher threshold is needed for what constitutes a scientific discovery”), reads quite similarly, save that these authors employ a Bayesian set of models.
This is informative, as it shows the type of analysis David Colquhoun may have had in mind when he alluded to Bayesian alternatives in his paper.
But whether one employs frequentist or Bayesian mathematical methods, when the philosophical statistical logic is not properly applied, the same faulty findings emerge.
This team reaches a remarkably similar conclusion to Colquhoun, suggesting that the p-value cutoff be switched from 0.05 to 0.001.
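To put rough numbers on the kind of argument both papers are making, here is a minimal sketch of a “false positive risk” calculation. The prior probability that a tested effect is real (10%) and the assumed power (80%) are purely illustrative assumptions, not figures from Colquhoun or the Texas A&M paper:

```python
# Sketch: how the "false positive risk" of a just-significant result depends on
# the significance threshold, the prior probability that a tested effect is real,
# and the power of the study. The prior (0.1) and power (0.8) below are
# illustrative assumptions, not values taken from either paper.

def false_positive_risk(alpha, power, prior_real):
    """P(effect is not real | result declared significant)."""
    true_positives = power * prior_real
    false_positives = alpha * (1.0 - prior_real)
    return false_positives / (false_positives + true_positives)

for alpha in (0.05, 0.005, 0.001):
    fpr = false_positive_risk(alpha=alpha, power=0.8, prior_real=0.1)
    print(f"alpha = {alpha:<6} -> false positive risk ~ {fpr:.1%}")
```

Note that holding power at 80% while tightening the threshold is itself optimistic: at a stricter cutoff the same experiment has less power unless the sample grows, which is exactly where effect sizes and sample sizes enter the picture.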
Statistical significance alone is insufficient evidence on which to base a sound claim of a scientifically relevant discovery. These authors and Colquhoun do not focus sufficiently on the additional essential requirements: determining an effect size of scientific relevance, and a sample size that is likely to detect an effect of that size. I see no discussion in this article of how many papers from the psychology field soundly reason about effect sizes of scientific merit, nor of how many demonstrated that they had sufficient data to reliably detect such meaningful effects.
This problem will not be fixed by merely reducing the size of the p-value deemed to be acceptable. Interpreting p-values is always context-dependent. A single p-value in a phase III clinical trial, with a priori specification of type I and type II error rates, a description of a meaningful effect size, and demonstration of sufficient sample size, is entirely meaningful. A single p-value from a gene chip yielding data on 20,000 genes should not be interpreted alone; multiple comparisons methods must be employed to properly assess findings across so many results and maintain overall type I and type II error rates.
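As a concrete illustration of the gene-chip point, here is a minimal simulated sketch. The data, the number of “real” effects, their size, and the group sizes are all arbitrary assumptions chosen for illustration, and the Benjamini-Hochberg procedure is used as one standard multiple-comparisons method, not as anything prescribed by the paper under discussion:

```python
# Sketch: why a single unadjusted p-value out of ~20,000 tests is not interpretable
# on its own. The data are simulated nulls plus a handful of injected effects;
# the counts, effect size, and seed are arbitrary illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_real, n_per_group = 20_000, 100, 10

# Simulate expression data: most genes have no group difference, 100 do.
group_a = rng.normal(0.0, 1.0, size=(n_genes, n_per_group))
group_b = rng.normal(0.0, 1.0, size=(n_genes, n_per_group))
group_b[:n_real] += 1.5  # the injected "real" effects

p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue

# Unadjusted p < 0.05: roughly 0.05 * 19,900 ~ 1,000 null genes pass by chance.
print("unadjusted p < 0.05:", int((p_values < 0.05).sum()))

# Benjamini-Hochberg step-up procedure (controls the false discovery rate at 5%).
order = np.argsort(p_values)
ranked = p_values[order]
thresholds = 0.05 * np.arange(1, n_genes + 1) / n_genes
passing = np.nonzero(ranked <= thresholds)[0]
n_discoveries = passing[-1] + 1 if passing.size else 0
print("BH discoveries at FDR 0.05:", int(n_discoveries))
```

With roughly 19,900 true nulls, unadjusted testing at 0.05 produces on the order of a thousand false positives, while the step-up procedure keeps the expected proportion of false discoveries near the chosen 5%.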
The fix to this problem is to insist on a demonstration of what size of effect has scientific meaning, and on power calculations that show how much data is needed to reliably detect a difference of that size. This is the ugly crux of the problem. We need fewer experiments, of larger sizes, to figure out what is going on scientifically and reach sound conclusions.
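To make the cost concrete, here is a minimal sketch of the standard normal-approximation sample-size calculation for a two-sample comparison. The standardized effect size (0.3) and target power (90%) are illustrative assumptions only:

```python
# Sketch: approximate per-group sample size for a two-sided, two-sample comparison
# using the usual normal approximation. Effect size and power are illustrative.
from math import ceil
from scipy.stats import norm

def n_per_group(effect_size, alpha, power):
    """Approximate per-group n to detect a standardized effect of the given size."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for a two-sided test
    z_beta = norm.ppf(power)            # quantile corresponding to desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

for alpha in (0.05, 0.005, 0.001):
    print(f"alpha = {alpha:<6} -> n per group ~ {n_per_group(0.3, alpha, 0.9)}")
```

Tightening the threshold from 0.05 to 0.001 roughly doubles the required sample per group at this effect size, which restates the point in numbers: a stricter evidence threshold only helps if studies are also sized to meet it.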
Without a clear demonstration of what a scientifically relevant difference is, and that sufficient data was available to detect such a difference in a high percentage of experiments, the presented findings are merely suggestive exploratory findings. Unfortunately, journals seeking to return double-digit profits have lowered their standards so that more studies can be published, and the result is a slew of studies, exploratory and suggestive only, presented as exciting new discoveries. This has been demonstrably successful for journal profitability, but demonstrably disastrous at yielding a corpus of useful scientific findings.
There is no excuse these days for limiting the length of articles. Any journal offering a word- and page-limited set of articles should also offer unlimited supplementary space on the internet, so that all of the elements I describe above and below can be demonstrated fully somewhere, without page-length or word-count limitations. No “Brief Communication” or “Letter” article should stand on its own. Such brevia are essentially the modern abstract, and many journals have failed to provide the resources to allow researchers to present their findings in full, or to demand that they do so. Given the increased complexity of so many multi-centre collaborative efforts needed to solve more complex scientific phenomena, we must have more space to fully describe scientifically sound findings.
From the article’s conclusions: “More generally, however, editorial policies and funding criteria must adapt to higher standards for discovery. Reviewers must be encouraged to accept manuscripts on the basis of the quality of the experiments conducted, the report of outcome data, and the importance of the hypotheses tested, rather than simply on whether the experimenter was able to generate a test statistic that achieved statistical significance.”
This concluding statement by the Texas A&M team is a sound one, though its aim will not be achieved by merely insisting that p-values less than 0.001 become the new threshold.
Higher standards of discovery include a solid argument describing an effect size of scientific relevance, and power calculations showing the minimal sample size needed to reliably detect differences of that size. Longer articles are needed to fully describe such efforts. An outline of a properly presented scientific finding would include: preliminary experiments that suggested a finding; the use of those preliminary experiments to establish an effect size of scientific relevance and the sample size needed to reliably detect such a difference (high power); and then the final experiment or experiments that clearly detect a meaningfully sized effect, or repeatedly fail to do so.

All attempted experiments should be described, so that the experiment success rate can be assessed. When experiments fail to reliably detect a meaningful difference, such studies should be welcomed somewhere, so that others can see the failed efforts and not waste more time on those pursuits. When journals or online archives and databases begin logging all of these descriptions, we will have a much improved evidence base upon which to sort out scientific phenomena. But merely lowering the “p-value cutoff” is not going to get us there.