Language of a liar named Stapel: Can word choice be used to identify scientific fraud?

A pair of Cornell researchers have analyzed the works of fraudster Diederik Stapel and found linguistic tics that stand out in his fabricated articles.

David Markowitz and Jeffrey Hancock looked at 49 of the Dutch social psychologist’s papers — 24 of which included falsified data. (Stapel has lost 54 papers so far.)

According to the abstract for the article, “Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel,” which appeared in PLoS ONE:

When scientists report false data, does their writing style reflect their deception? In this study, we investigated the linguistic patterns of fraudulent (N = 24; 170,008 words) and genuine publications (N = 25; 189,705 words) first-authored by social psychologist Diederik Stapel. The analysis revealed that Stapel’s fraudulent papers contained linguistic changes in science-related discourse dimensions, including more terms pertaining to methods, investigation, and certainty than his genuine papers. His writing style also matched patterns in other deceptive language, including fewer adjectives in fraudulent publications relative to genuine publications. Using differences in language dimensions we were able to classify Stapel’s publications with above chance accuracy. Beyond these discourse dimensions, Stapel included fewer co-authors when reporting fake data than genuine data, although other evidentiary claims (e.g., number of references and experiments) did not differ across the two article types. This research supports recent findings that language cues vary systematically with deception, and that deception can be revealed in fraudulent scientific discourse.

In more detail:

Liars have difficulty approximating the appropriate frequency of linguistic dimensions for a given genre, such as the rate of spatial details in fake hotel reviews [8], the frequency of positive self-descriptions in deceptive online dating profiles [10], or the proportion of extreme positive emotions in false statements from corporate CEOs [11]. Here we investigated the frequency distributions for linguistic dimensions related to the scientific genre across the fake and genuine reports, including words related to causality (e.g., determine, impact), scientific methods (e.g., pattern, procedure), investigations (e.g., feedback, assess), and terms related to scientific reasoning (e.g., interpret, infer). We also considered language features used in describing scientific phenomena, such as quantities (e.g., multiple, enough), terms expressing the degree of relative differences (e.g., amplifiers and diminishers) and words related to certainty (e.g., explicit, certain, definite).

We were also interested in whether the fake reports contained patterns associated with deception in other contexts.

To probe Stapel’s studies, Markowitz and Hancock:

applied a corpus analytic method using Wmatrix [19], [20], an approach that is commonly used for corpus comparisons (e.g., [21], [22]). Wmatrix is a tool that provides standard corpus linguistics analytics, including word frequency lists and analyses of major grammatical categories and semantic domains. Wmatrix tags parts of speech (e.g., adjectives, nouns) in relation to other words within the context of a sentence (e.g., the word “store” can take the noun form as a retail establishment or a verb, as the act of supplying an object for future use).
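To make the idea of comparing "frequency distributions for linguistic dimensions" concrete, here is a minimal sketch in Python. It is not Wmatrix and not the authors' method: the `CATEGORIES` mini-lexicons are hypothetical, built only from the example words quoted above, and the function simply reports each category as a percentage of all tokens in a text.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, built only from the example words quoted in the
# paper excerpt. Wmatrix's real tagging is far larger and context-sensitive.
CATEGORIES = {
    "causality": {"determine", "impact"},
    "methods": {"pattern", "procedure"},
    "investigation": {"feedback", "assess"},
    "certainty": {"explicit", "certain", "definite"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def category_rates(text):
    """Return each category's frequency as a percentage of all tokens."""
    tokens = tokenize(text)
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in CATEGORIES
    }
```

Running `category_rates` on two sets of texts and comparing the resulting percentages category by category gives a rough, purely descriptive picture of the kind of difference the study reports.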

You can see a table of Stapel’s word choices here.

But the Cornell researchers express caution about the obvious leap here: using linguistic tools to probe manuscripts for evidence of fraud before they’re published.

… [I]t is tempting to consider linguistic analysis as a forensic tool for identifying fraudulent science. This does not seem feasible, at least for now, for several reasons. First, nearly thirty percent of Stapel’s publications would be misclassified, with 28% of the articles incorrectly classified as fraudulent while 29% of the fraudulent articles would be missed. Second, this analysis is based only on Stapel’s research program and it is unclear how models based on his discourse style would generalize to other authors or to other disciplines.
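The percentages in that passage can be checked with a little arithmetic. The sketch below assumes that 7 of the 25 genuine papers and 7 of the 24 fraudulent papers were misclassified; those counts are not stated in the excerpt and are inferred here only because they reproduce the quoted 28% and 29%.

```python
def misclassification_summary(genuine_total, fraud_total,
                              false_positives, false_negatives):
    """Summarize error rates for a two-class (genuine vs. fraudulent) result."""
    total = genuine_total + fraud_total
    errors = false_positives + false_negatives
    return {
        # Genuine papers wrongly flagged as fraudulent
        "false_positive_rate": false_positives / genuine_total,
        # Fraudulent papers that were missed
        "false_negative_rate": false_negatives / fraud_total,
        "overall_error_rate": errors / total,
        "accuracy": (total - errors) / total,
    }


# Assumed counts (7 and 7), inferred from the quoted percentages.
summary = misclassification_summary(
    genuine_total=25, fraud_total=24,
    false_positives=7, false_negatives=7,
)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

Under those assumed counts, 14 of 49 papers are misclassified, which matches the "nearly thirty percent" figure and leaves an accuracy of roughly 71%.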

9 thoughts on “Language of a liar named Stapel: Can word choice be used to identify scientific fraud?”

  1. This is an interesting analysis of a unique dataset. As a practicing “data scientist” who works on text mining, I would also caution (as the authors already do) about hoping for a “silver bullet”-type solution based on such work. As we all know, prediction is quite difficult – even in this case of a balanced gold standard set, accuracy is encouraging, but has a long way to go – let alone in the more realistic case of a wildly unbalanced test set, on a different domain, etc. Some additional comments on this blog (in Dutch, via Twitter):

    Note there is a newer 2014 work by the same authors:

    From the abstract on the conference site (didn’t have time to ask for/read paper yet) it seems that when executing a similar type of analysis on additional data, fraudulent papers may have lower “readability scores” and less “concrete language” than genuine ones. Again, authors seem to cautiously emphasize the descriptive rather than predictive nature of the work.

    In my humble opinion, nothing will beat U. Simonsohn’s recommendation to “just post” the data. And many fields are definitely moving in that direction.

  2. Reblogged this on fragmentedvision and commented:
    Do journals have the same tool these researchers used? As long as the false-positive rate is not too high, could journals use it as a screening method to evaluate manuscripts before sending them out for peer review?

    1. The results (and features) in the paper above are specific to the work of one researcher (and even in that case, the rate of false positives is high). However, if the follow-up 2014 work shows “readability scores” (for which there are standard measures based on outputs of standard tools) may be a signal of problematic work, a journal could easily compute such a score. The tools used by the featured researchers are well-known tools from previous work (they have significant limitations I won’t go into here though :)).

      A journal may automatically score a submission and have a separate mechanism for handling very low readability ones – the nice thing is that no one needs to be unnecessarily accused of fraud, as an unclear, unreadable submission clearly needs additional scrutiny regardless.

  3. Unsurprising that the usual data-dependent cherry picking is the basis for digging out differences post hoc. Ironically, they resort to data-dredging while they overlook a blatant signal Stapel sent.

  4. In Stapel’s autobiography he claims that he hated inventing data, became nervous whenever he had to do it, and as a result, did it in a hurry.

    This might also apply to when he was writing up papers based on made-up data. It would be interesting to check for the rate of ‘typos’ in the original manuscripts (the final papers will have been copyedited.)

    1. I don’t think so, Neuroskeptic. As a psychologist, I think that because Stapel’s intention was to become more famous and admired, he had to submit his manuscripts in proper form, whether the data were real or not.
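One commenter's suggestion above, that a journal could automatically score submissions for readability and route very low scores to extra scrutiny, can be sketched as follows. This uses the Flesch Reading Ease formula as one standard measure; whether the 2014 follow-up work used this particular score is not stated here. The syllable counter is a crude heuristic and the `threshold` value is a hypothetical placeholder, not a recommended cutoff.

```python
import re


def count_syllables(word):
    """Crude heuristic: count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def needs_extra_scrutiny(text, threshold=30.0):
    """Flag text whose score falls below a hypothetical threshold."""
    return flesch_reading_ease(text) < threshold
```

A flag from `needs_extra_scrutiny` would say only that a manuscript is hard to read, which fits the commenter's point that no one needs to be accused of fraud on the basis of the score.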
