A widely reported finding that the risk of divorce increases when wives fall ill, but not when husbands do, is invalid: a short string of mistaken code negates the original conclusions, which were published in the March issue of the Journal of Health and Social Behavior.
The paper, “In Sickness and in Health? Physical Illness as a Risk Factor for Marital Dissolution in Later Life,” garnered coverage in many news outlets, including The Washington Post, New York magazine’s The Science of Us blog, The Huffington Post, and the UK’s Daily Mail.
But an error in a single line of the coding that analyzed the data means the conclusions in the paper — and all the news stories about those conclusions — are “more nuanced,” according to first author Amelia Karraker, an assistant professor at Iowa State University.
Karraker — who seems to be handling the case quickly and responsibly — emailed us how she realized the error:
Shortly after the paper was published some colleagues from Bowling Green State, I-Fen Lin and Susan Brown, emailed me and my co-author about our estimate of divorce. They were trying to replicate the paper and couldn’t understand why their estimate was so much lower than ours. I sent them the statistical analysis file, which documents all of the steps as to how we came to all the estimates in the paper. And they pointed out to us, to our horror, that we had miscoded the dependent variable…As soon as we realized we made the mistake, we contacted the editor and told him what was happening, and said we made a mistake, we accept responsibility for it.
Speaking to us on the phone, Karraker added:
People who left the study were actually miscoded as getting divorced.
Using the corrected code, Karraker and her co-author did the analysis again, and found the results stand only when wives develop heart problems, not other illnesses. She said:
What we find in the corrected analysis is we still see evidence that when wives become sick marriages are at an elevated risk of divorce, whereas we don’t see any relationship between divorce and husbands’ illness. We see this in a very specific case, which is in the onset of heart problems. So basically it’s a more nuanced finding. The finding is not quite as strong.
In the original study, Karraker and her co-author relied on data from 2,701 heterosexual marriages that were included in the Health and Retirement Study at the University of Michigan, which follows 20,000 Americans older than 50. They parsed the data with computer code to find out how many marriages seemed to be felled by one of four serious diseases: cancer, heart disease, stroke and lung disease. They found that marriages were 6% more likely to end if the wife fell seriously ill than if she stayed healthy, while the same was not true when the husband fell ill.
Having a mistake pointed out is part of the process of doing science, Karraker said:
The original code will be available online. The code that we used for all the estimates. That’s part of good research practice: somebody should be able to replicate your results. We talked with I-Fen Lin and Susan Brown, and really appreciated them raising this with us. While you would never want to discover that you made a mistake, what’s ultimately important is to do good research, and sometimes that requires you to make a correction. We’ve tried to be completely transparent about the mistake that we made and correcting it as quickly and clearly as possible.
We spoke to several others involved in the case, who agreed that Karraker took responsibility and handled the error as smoothly as possible.
When we emailed researchers I-Fen Lin and Susan Brown, they sent a statement explaining why they wanted to replicate the study:
We are conducting research on gray divorce (couples divorce after age 50) using the Health and Retirement Study, the same data set used in Dr. Karraker’s paper. Her published numbers (32% of the sample got divorced) are very different from our estimates (5%), so we contacted her to clarify the discrepancy.
That’s when they emailed Karraker, who sent them the code later that day. They alerted Karraker to the error, and she got in touch with the journal’s editor, Gilbert Gee, “promptly,” Gee told us:
Karraker and Latham’s paper had a major error in their statistical code that was discovered by another research team. The authors contacted the journal’s office about this error promptly. The authors then reanalyzed their data and submitted a corrected paper. This paper was reviewed by senior members of our editorial board, met our standards of peer review, and will be republished in the September, 2015, issue of JHSB. Although regrettable, mistakes happen to all researchers. In my opinion, Karraker and Latham met the highest standards of professionalism in correcting their mistake.
According to Gee, the new version of the paper will include a statement from the editor — which he told us he’d send after a few colleagues reviewed it — along with a memo from the researchers about what happened.
For now, here’s the official retraction note in full:
The authors have retracted the article titled “In Sickness and in Health? Physical Illness as a Risk Factor for Marital Dissolution in Later Life,” published in the Journal of Health and Social Behavior (2015, 56(1):59-73). There was a major error in the coding in their dependent variable of marital status. The conclusions of that paper should be considered invalid. A corrected version of the paper will be published in the September 2015 issue of JHSB.
We asked Karraker’s co-author, Indiana University-Purdue University Indianapolis sociologist Kenzie Latham, for a statement. Her take on things was pretty much the same as Karraker’s:
In general, this is an unfortunate mistake that occurred, and we have taken steps to correct our error in a timely and transparent manner. Prior to submitting our paper to the Journal of Health and Social Behavior, we spent quite a bit of time soliciting feedback. We sent copies of our paper to senior scholars to review and we presented our findings at conferences and workshops. The original manuscript went through multiple rounds of peer-review before being accepted for publication.
We also asked if she had any advice for researchers that find themselves in a similar situation:
Errors will occur when conducting research–even when researchers take steps to minimize these types of errors. The most important piece of advice is to be forthcoming with your mistakes and correct them so that the scientific literature can advance.
Here’s the original coding line:
replace event`i' = 1 if delta_mct`i' != 0 | spouse_delta_mct`i' != 0
And here’s the corrected coding:
replace event`i' = 1 if (delta_mct`i' != 0 | spouse_delta_mct`i' != 0) & delta_mct`i' != . & spouse_delta_mct`i' != .
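To see why the extra conditions matter: Stata treats a numeric missing value (`.`) as larger than any real number, so the test `. != 0` is true. The sketch below is a rough Python analogue rather than the authors' code. It uses NaN as a stand-in for Stata's missing value (NaN also compares unequal to 0), and the function names and the three example couples are hypothetical.

```python
import math

MISSING = math.nan  # stand-in for Stata's "." (an assumption for illustration)


def event_original(delta, spouse_delta):
    """Mirror of the original condition: any value other than 0 counts as an event."""
    return int(delta != 0 or spouse_delta != 0)


def event_corrected(delta, spouse_delta):
    """Mirror of the corrected condition: both values must also be non-missing."""
    observed = not math.isnan(delta) and not math.isnan(spouse_delta)
    return int((delta != 0 or spouse_delta != 0) and observed)


# Hypothetical couples: (delta_mct, spouse_delta_mct)
couples = {
    "no change reported": (0.0, 0.0),
    "change reported": (1.0, 1.0),
    "left the study": (MISSING, MISSING),
}

for label, (d, s) in couples.items():
    print(label, event_original(d, s), event_corrected(d, s))
```

Under these assumptions, only the couple with missing values is coded differently by the two rules: the original rule flags it as an event, and the corrected rule does not.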
The study has not been cited, according to Thomson Scientific’s Web of Knowledge.
Hat tip: Rolf Degen
I fail to understand how a researcher using a program to analyze data does not have a control set for the variables so that calculations that were performed by hand, created to test the program, are not implemented or designed properly. My students would never be allowed to apply a program to a real set of data and draw conclusions without demonstrating with a control data-set that their program does what they say it does! This type of error is almost always completely preventable. It saddens me to see this type of error continually occurring.
As a professional software tester, I cringe with vicarious embarrassment when I read stories like this. The confirmation bias makes my head hurt…
People make mistakes. Always have, always will; no one’s perfect. Wouldn’t it be better to focus on the collegiality that was involved in fixing this mistake, and on Karraker’s retraction, which I think is a class act? Demonizing people for making good-faith mistakes and pointing fingers as if we are infallible only succeeds in shaming people into hiding their mistakes, which isn’t a good idea at all.
I get the benefit of saying this 9 years after your comment: I still have many people quoting the original posted results of the study, so there is actually a lot of lasting damage from the fact that this was allowed to be published.
I would also place a share of the blame on whatever god-awful language that is written in, along with the obfuscated variable names. If it was in something more sane and human-readable (like R), written in compliance with a good style guide, it’s much more likely the error would’ve been spotted. Not only that, the time it would take the researchers to develop and debug the code would drop by an order of magnitude.
It’s Stata syntax; Stata is a very common analysis package. Honestly, it’s much more popular than R and more readable. This line recodes the (badly named) variable event, setting it to 1 if certain conditions are met. The other two variables are probably changes across waves, given the “delta” in the names, but there’s no knowing without more syntax. The `i' is just a local macro. These are usually used for looping quickly: set it to 1, have it increase by 1 after each pass through a set of commands, and keep repeating until it has generated variables for every wave.
At least among data science jobs, R is 10x more popular than stata (https://r4stats.com/articles/popularity/).
Also while I wouldn’t call that code particularly unreadable, anything that starts using internal macros inherently makes things harder to read. R *should* be more readable than this, but in my time in academia I have seen some absolutely god awful R, so it can absolutely be written in a way that is impossible to parse for a newcomer.
That being said, well written R is so readable you shouldn’t need any comments, as long as people understand what the pipe function does.
I would argue they *do* have a control set – but the issue in their code was not caught because their control set didn’t account for people dropping out of the study.
That is, this issue likely didn’t get caught in their control data set because it’s an edge case they didn’t consider.
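A minimal sketch of what a control set covering that edge case could look like, in Python rather than Stata. Everything here is hypothetical: the cases, the expected answers, and the two rules are illustrations, not the authors' data or code. NaN stands in for a missing value.

```python
import math

# Hand-built control cases: (delta, spouse_delta, expected_event).
CONTROL_CASES = [
    (0.0, 0.0, 0),            # no change reported by either spouse
    (1.0, 1.0, 1),            # both spouses report a change
    (1.0, 0.0, 1),            # one spouse reports a change
    (math.nan, math.nan, 0),  # couple left the study: not a divorce
    (math.nan, 0.0, 0),       # one value missing: outcome unknown
]


def failed_cases(code_event, cases=CONTROL_CASES):
    """Return the control cases where a coding rule disagrees with the hand-coded answer."""
    return [(d, s, want) for d, s, want in cases if code_event(d, s) != want]


def naive_rule(delta, spouse_delta):
    """Flags an event whenever either value differs from 0 (missing included)."""
    return int(delta != 0 or spouse_delta != 0)


def guarded_rule(delta, spouse_delta):
    """Same rule, but never flags an event when either value is missing."""
    if math.isnan(delta) or math.isnan(spouse_delta):
        return 0
    return int(delta != 0 or spouse_delta != 0)


print(len(failed_cases(naive_rule)))    # the two missing-value cases fail
print(len(failed_cases(guarded_rule)))  # no failures
```

The point of the design is that the expected answers are written down by hand before the rule is run, so a control set without the last two rows would let the naive rule pass.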
Kudos for the way this case was handled. Unfortunately, mistakes like these happen and it’s a good argument for publishing not only the article, but the data files (anonymized) and the analysis scripts with each article as well. After all, scientists usually conduct more than one study, and a mistake might be repeated continuously, either because the researcher has a conceptual error in writing the scripts (happens), or because the scripts get copy-pasted for the new data files (there’s something to be said for efficiency).
But I find it a bit strange to speak of a “more nuanced finding” here. Sure, they say the risk of divorce increases, not that illness of a wife (now with one specific disease) *causes* the divorce. You couldn’t make a causal interpretation here. But without having read the article (and I might be wrong here), looking at different illnesses and homing in on heart diseases just seems like going for a post-hoc explanation. To be more than a (possibly) spurious effect, it would have to be replicated in different data sets. I’m also curious how this effect is ‘explained’. How do “heart diseases” differ from the other kinds of diseases? Sure, we can easily find “explanations” (e.g., heart diseases caused by a different kind of behavior that might also lead to divorces), but it becomes a problem if they are made after you see the results, as you can “explain” everything.
Surely it’s not just ‘efficient’ to copy-paste a script. Provided that (as not in this case!) the original script apparently accurately instantiates a particular algorithm, it should be reused, on the grounds that two different apparently accurate instantiations of a given algorithm may STILL produce different results in some cases (there’s a reason that FORTRAN still exists). At least, that’s my impression of best practices. (But I’m not a scientist.)
Thank you for your comment on “nuance.” Here that word feels Orwellian. I’m baffled by others crediting much to what looks like a last-minute effort to salvage something from an eviscerated study. I appreciate the authors’ candor about what happened, but calling it a question of nuance seems misleading.
I am very suspicious when I see *any* analysis where many relationships are checked for significance, and when one is found it is considered significant. Presumably if you check 20 random, uncorrelated relationships, on average one will be significant at the 95% level. Were statistical corrections done to account for the number of relationships that were checked for significance?
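The arithmetic behind that worry can be sketched in a few lines of Python. This assumes independent tests with no true effects and a 0.05 threshold. It says nothing about which corrections, if any, were applied in the paper, and the function names are hypothetical. The Bonferroni adjustment is shown only as one common correction.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Average number of 'significant' results when no real effect exists."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests comes out significant."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the overall false-positive rate at or below alpha."""
    return alpha / n_tests


print(expected_false_positives(20))     # about 1 false positive on average
print(round(prob_at_least_one(20), 2))  # roughly a 64% chance of at least one
print(bonferroni_threshold(20))         # per-test threshold of about 0.0025
```

So with 20 unrelated comparisons, a single significant result is closer to the expected outcome than to a surprise.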
At least they HAD a statistical syntax file that they could send around! Yes, they should have shared it before submitting the manuscript. And this leads me to reiterate the point I’ve made elsewhere: Ideally, you ask ALL of your coauthors (or at least some of them) to carry out the analyses themselves starting more or less from scratch in terms of syntax writing and then you compare results. The same principle that applies to data entry (more than one data enterer => errors will be weeded out by comparing the files) also applies to syntax writing.
But you have to give it to this author team: they had a file, and the error was therefore detectable in principle and also in practice. I’m much more concerned about all those papers out there for which there is no comprehensive syntax file, for which the data were perhaps copied and pasted from one EXCEL sheet to another without paying attention to proper alignment of lines, data were recoded…and recoded again, because someone didn’t know that the recoding HAD been done already, variables were aggregated, but not all that should have been in the final index were actually included, and so on and so on. As far as I know no one yet has done any research on how often “analysis approaches” like this are the basis of published papers. My hunch is: more than you’d like. And my recommendation to editors would therefore be: always ask authors to include the full annotated analysis syntax along with the manuscript. If the authors are unable to provide it, make that a desk rejection.
It wouldn’t have prevented this particular case. But this particular case was eventually resolved precisely because the authors had such a comprehensive script. And they did the right thing once they realized that they’d made an error.
In 2006, Chang et al. had to retract 3 papers in Science because of a programming error in a home-grown data reduction program used in their lab.
Chang G, Roth CB, Reyes CL, Pornillos O, Chen YJ, Chen AP. Retraction.
Science. 2006 Dec 22;314(5807):1875. PubMed PMID: 17185584.
I can understand this all too well. People are not that good at programming and they often have no real training in grad school about how to approach it. They think it’s just a simple activity like typing that anyone can perform. And the problems are exacerbated when they are using data from a secondary source that they are not familiar with. I have seen many examples of these kinds of problems in published scientific articles.