JAMA issues mega-correction for data breach letter due to “wording and data errors”

A JAMA letter published in April on data breaches accidentally included some data that shouldn’t have been published, either: specifically, “wording and data errors” that affected five sentences and more than 10 entries in a table. One result, a reported increase in breaches over time, also went from statistically significant to “borderline” significant, according to the first author. (So yeah, this post earns our “mega correction” category.)

According to an author, an “older version” of a table made it into the letter, “Data Breaches of Protected Health Information in the United States,” which was corrected in the journal’s June 23/30 issue.

The letter and table in question detail 949 breaches of “unencrypted protected health information.” The letter says the number of breaches increased from 2010 to 2013; the original version claimed that the P value on that increase was <.001, but the correction says it’s really .07. The original says 29.1 million personal records were affected in those breaches; the real number is 29.0 million. And so on.

For a full comparison with the now-corrected table, here’s an archived version of the original, from April 15, 2015. The correction note details the differences between the two, along with a few changes to sentences in the results and discussion sections of the paper.

First author Vincent Liu of Kaiser Permanente Division of Research, Oakland, California, briefly explained to us how they handled the mistake:

The corrections resulted from the inclusion of an older version of the Table (from a prior revision) in the final Letter. Once we became aware that the older version was published, we corrected the Table with the editorial staff. The overall study findings remained consistent.

Liu acknowledged that the first data point presented in the table — a supposed increase in the number of data breaches from 2010-2013 — is now no longer statistically significant:

Most of the changes in the Table were minor, for example, related to the confidence intervals; these values then cascaded through the text and required text revision when updated. One p-value went from significant to borderline significant and the corresponding text was revised accordingly.

Here is the correction notice in full, now appended to the online version of the study:

In the Research Letter entitled “Data Breaches of Protected Health Information in the United States” published in the April 14, 2015, issue of JAMA (2015;313[14]:1471-1473. doi:10.1001/jama.2015.2252), there were wording and data errors. In the Results section, first paragraph, the first and second sentences should be “We evaluated 949 breaches affecting 29 million records between 2010 and 2013. Six breaches involved more than 1 million records each and the number of reported breaches increased over time, although the trend using linear regression did not reach statistical significance (P = .07; Table).” In the second paragraph of the Results section, the first sentence should be “Most breaches occurred via electronic media (67.4%; 95% CI, 64.4%-70.4%; Table), frequently involving laptop computers or portable electronic devices (32.7%; 95% CI, 29.7%-35.7%).” In this same paragraph, the penultimate sentence should be “The combined frequency of breaches resulting from hacking and unauthorized access or disclosure increased during the study period (12.1% in 2010 to 27.2% in 2013; P = .003).” In the Discussion section, first paragraph, the first sentence should be “Between 2010 and 2013, data breaches reported by HIPAA-covered entities involved 29 million records.” In the Table, second column entitled “Overall,” several numbers should be changed as follows: row 2, “Total No. of records affected, in millions,” should be “29.0”; row 6, “Desktop, email, or EMR,” should be “148 (15.6) [13.4-18.0]”; row 7, “Paper,” should be “212 (22.3) [19.8-25.1]”; row 8, “Network server,” should be “101 (10.6) [8.8-12.8]”; row 9, “Other,” should be “178 (18.8) [16.4-21.4]”; row 12, “Loss or improper disposal,” should be “105 (11.1) [9.2-13.2]”; row 13, “Unauthorized access or disclosure,” should be “140 (14.8) [12.6-17.2]”; row 14, “Hacking or IT incident,” should be “67 (7.1) [5.6-8.9]”; and row 15, “Other,” should be “85 (9.0) [7.3-11.0].” Also in the Table, last column, the following P values should be changed: row 1, “Total No. of data breaches reported” should be “.07”; and row 2, “Total No. of records affected, in millions” should be “.88.” Farther down in this same column, the P value for “Data breach category, No. (%) [95% CI]” for the data in rows 11-15 should be “.003.” This article was corrected online.

We contacted the JAMA editor listed on the article, Jody W. Zylke, and will update if we hear back.

Hat tip: Douglas Main

