An article from 2019 that caught some media buzz – including from the New York Times – for its analysis of political speeches now bears an expression of concern that’s almost as long as the original paper.
In “Liberals lecture, conservatives communicate: Analyzing complexity and ideology in 381,609 political speeches,” published in PLOS ONE, the authors concluded that “speakers from culturally liberal parties use more complex language than speakers from culturally conservative parties,” as they stated in their abstract.
But after reading the article, linguist Joe McVeigh, a university teacher at the University of Jyväskylä in Finland, wrote an online comment on the article detailing “several fundamental and critical flaws in its methodology.” A key issue: applying the Flesch-Kincaid test, which was developed for assessing the readability of a written text, to political speeches. As McVeigh told us:
The F-K test is an outdated and inefficient test that was developed for a different style of written language than the one the authors apply it to. It should not be applied to spoken language.
McVeigh and the paper’s authors, who are affiliated with the University of Amsterdam in the Netherlands and University College Dublin in Ireland, went back and forth in the comments section a few times, with McVeigh’s last comment from the initial exchange dated April 9, 2019.
Three and a half years later, PLOS ONE’s editors published a lengthy expression of concern that began:
After publication of this article [1], concerns were raised about the methods and conclusions. PLOS ONE reassessed the article with input from members of the journal’s Editorial Board who have expertise in linguistics and social psychology research.
The Academic Editors with expertise in linguistics research advised that the Flesch-Kincaid scoring method used in the study does not meet community standards for linguistic analysis, and that this method was not appropriate or sufficient to address this study’s aims and support conclusions about linguistic complexity.
The bulk of the note consists of additional information and analyses by the authors responding to the concerns. They maintained that “our analyses show that the results we present in our article are valid and reliable.”
A linguistics expert told PLOS ONE’s editors that the authors’ additional analyses “suffice to lend support for the reliability of the study’s results,” they wrote in the introduction to the note.
However, the expert:
also advised that the R-values obtained in the validation analyses (in the interval of [0.59, 0.76], corresponding to R-squared values of 34.81–57.8%) are not indicative of a robust validation outcome, and that the concerns about using the Flesch-Kincaid method remain, even considering the new analyses: better methods and tools were available and should have been used given the study’s objectives.
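For readers checking the arithmetic, the R-squared range the expert quotes follows from squaring the endpoints of the reported R interval. This is only a worked restatement of the numbers in the note, not an additional analysis:

```latex
\[
R \in [0.59,\ 0.76]
\quad\Longrightarrow\quad
R^2 \in [0.59^2,\ 0.76^2] = [0.3481,\ 0.5776] \approx [34.81\%,\ 57.8\%]
\]
```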
The PLOS ONE editors ended by writing that the original paper’s conclusions “were overstated in light of the study’s limitations,” and revised them to the following:
Our results suggest that speakers from culturally liberal parties use more complex language than speakers from culturally conservative parties, and that economic left-right differences are not systematically linked to linguistic complexity. Further studies—for example including subgroup analyses and additional complexity measures—are needed to confirm and verify these findings.
First and corresponding author Martijn Schoonvelde of University College Dublin told Retraction Watch that he and his coauthors agreed “that our conclusions should be corroborated using other data and other measures but this applies to most empirical studies.”
He also said:
We are disappointed about the expression of concern since we think it is a one-sided response by the PLOS ONE editors to a disciplinary disagreement over a measure. This outcome stifles discussion between disciplines, it does not stimulate it. Nevertheless, we hope that this exchange motivates other researchers in social science and (computational) linguistics to work together on questions related to complexity or comprehensibility of the rhetoric of politicians because it is such an important aspect of political representation and deserves further study.
We asked David Knutson, a PLOS spokesperson, what led to the expression of concern, and he pointed us to the comments on the article. He added:
The authors engaged in a discussion with the reader via the Comment forum but given the nature of the concerns PLOS ONE also investigated the matter.
McVeigh sent us point-by-point comments on the expression of concern, which are available here. He concluded:
Had the authors used modern linguistic methods, instead of an outdated and ineffectual test, their article would not be facing an Expression of Concern or a potential retraction.
Language scholars realized the severe limitations of the Flesch-Kincaid test decades ago, and political scientists should catch up. As soon as they do, they will realize this study is fundamentally flawed.
I specialize in sex, class, cultural, and civilizational differences in strategy, institutions, argument and lying. The bit about lying in particular is an even more taboo subject than sex, class, and race differences in phenotype, personality and intelligence. And no, I didn’t choose this specialization. I chose logic, economics, and law. And in pursuit of law I ended, as many in history do, with these fundamental questions.
The authors of the paper are correct. Here’s a brief summary of why.
(a) The female brain (feminine mind) is biased laterally (prey) for empathizing (more, slower, in-time – people), and the male brain (masculine mind) is biased longitudinally (predator) for systematizing (less, faster, and over time – space). This results in a cognitive division of labor in population over time, that mirrors the sex differences in reproductive responsibility between the female (offspring, consumption, short term resources) and male (brothers/cousins, territory, production, long term resources) because of differences in resource fragility in the short (offspring) vs long term (pack, tribe).
(b) Feminine minds rely on storytelling (classically “feels”) that allow loading, framing, and obscuring (value), and masculine minds rely on testifying (classically “facts” or “reals”) that limit loading, framing, and obscuring (value) to facilitate systematizing (prediction over time at scale) to calculate internally and communicate externally.
(c) All humans ‘grow’ these neural pathways and biases producing brain organization in utero, and these biases continue throughout life. We find very little malleability in the bias even if we may mediate its influence with training. We have succeeded in mediating some biases through repetition. We call these mediations skills. We have not developed the science of truth, deceit, and sex, class and culture differences in applying these biases until now, so we have not mediated these instincts with training despite scaling our population, cooperation (economies, polities), communication, and institutions.
(d) For complex reasons I can’t reduce to a paragraph here, there exist only three axes (means) of human influence (influence persuasion coercion war): 1. Feminine social superpredation by seduction (care) to ostracization (canceling), 2. Neutral Reciprocity (cooperation-trade to evasion-boycott), and 3. Masculine political superpredation by force by defense to offense. All influence exists somewhere on that triangle. (Oddly, only European males use trifunctionalism (three institutions of independent feminine religion, neutral judiciary, and masculine state), and women and all other surviving civilizations strive toward producing decision making by monopoly instead. This is why Europeans invented what we call democracy, which is the antithesis of (prohibition on) discretionary (positive) authority, leaving only decidable (negative) authority.) Trifunctionalism is the only means of preventing authority (command) while preserving decidability (dispute resolution).
(e) The left, progressive (consumption), maternal, infantilizing, limiting responsibility and accountability, and via-positiva authority (do, act together) is an expression of the female instinctual bias. The right (capitalizing), paternal, ‘adulting’, maximizing responsibility and accountability, and via-negativa authority (don’t, act with agency) is an expression of the male bias.
(f) My understanding is that the economic and political influence of the feminine mind has been facilitated by the false promise of infinite growth due to the industrial revolution, the Marxist sequence’s repetition of the Abrahamic religion’s undermining of the ancient world, the capture of the academy by this, the reduction of the requirement for responsibility and accountability, and our failure to maintain the equal prohibition on the female means of warfare that we evolved to suppress the male means of warfare. And this is the origin of the present conflict, that can only be satisfied by conquest, restoration of reciprocity and responsibility, or separation. And separation will lead to restoration of pre-agrarian trends toward speciation.
(g) The existence of the authors of the paper is evidence of the emergence of awareness of this set of problems as the cause of present frictions: the ability to use female warfare (social) at scale made possible by mass media, repeating the success of the axial age religions to control the emergent military aristocracies, which in turn repeated the success of females rallying betas to suppress alphas, so that they could control their consumption and reproduction.
Wow am I glad this guy’s not on my side
Same
Do not Google his name, that’s an unhealthy rabbit hole.
(h) The criticism of the authors’ method does not indicate that the findings are false. Instead, it indicates that they inadequately explained the causal mechanism between the loading, framing, and overloading of feminine and liberal speech signaling vs conservative testimonial speech signaling. And the F-K technique is adequate, if not sufficient, for contributing to the discourse.
It seems to me like the very title is using pretty loaded language. Using “less complex” language doesn’t necessarily correlate to “communication”, nor does it rule out “lecturing”, imo.
You are correct. I mean, it’s common for academic papers to have catchy titles before a colon and then a descriptive title, which is probably what’s happening here.
But you’re right that complexity in language is, er, complex 🙂 There are reasons to say that communication between friends is more complex than a lecture.
McVeigh complains about the measure, but doesn’t suggest an alternative or provide evidence that this would have a discernible impact on the main claims of the article. Is income a perfect measure of wealth? Is partisanship a perfect measure of one’s political leanings? No. However, according to theories of measurement, as long as an indicator is correlated with the latent variable of interest and there isn’t any systematic measurement error, imperfect measures can still tell us something about theoretical constructs. It’s not enough to say a measure is poor. You have to show why estimated differences are exaggerated due to systematic errors. In what way does the F-K scale exaggerate linguistic differences between liberals and conservatives? What is the source of the systematic bias?
I think you’re also missing the main point. Let’s use the analogy you used. Income and wealth are related in some way but measuring a person’s income is an imperfect way of measuring their wealth. But as I tried to show in my comments, the F-K test isn’t even good enough to be an imperfect measure of complexity. Using the F-K test to measure linguistic complexity is like saying that Dutch people are better than Americans at basketball because on average Dutch people are taller than Americans. The authors have a crude definition of language complexity and they try to answer it with a crude method.
I intentionally kept my comments to Retraction Watch short. I had already pointed out the inefficiency of the F-K test in my comments on the article itself several years ago. I pointed the authors to more reliable ways of analyzing spoken language, including Brazil (1995) and Schiffrin (1994). These works are decades old, as are other ways of looking at linguistic complexity (such as Biber 1988 and Halliday 1989). These works are well known by linguists and discourse analysts. It’s not about the results, but about the method. Maybe the authors’ results would be corroborated with a better method, but this just means that they found the answer by accident, not by design.
What this unnecessarily long and sadly unconstructive feud reveals is, I think, the gap between different disciplines which increasingly come to investigate the same objects but with different concepts, preferred tools/methods, and practices. Political scientists’ interest in text is now well established, with studies usually done with computational methods on big corpora (besides qualitative work on smaller ones). These approaches are currently booming, bypassing and at times clashing with research in linguistics – whose researchers would therefore inevitably seek to regain their scientific capital in the field. The disagreements here are therefore not just that, they are 1) mutual misunderstandings on how to conduct research, on what corpora, with what tools, to reveal what, etc., and 2) a case of scientific capital claims and counter-claims at a moment when two previously impermeable fields are coalescing. A constructive outcome is therefore, unfortunately, unlikely.
From the original US Navy report by Kincaid et al. (available on Google Scholar):
“This scaled reading grade level is based on Navy personnel reading Navy training material and comprehending it. Thus, the three recalculated formulas (derived using multiple regression techniques) are specifically for Navy use.”
“2. Count the number of sentences. Count as sentences each unit of thought that can be considered grammatically independent of another sentence or clause. A period, question mark, exclamation point, semi-colon, and colon usually denote independent clauses. Sentence fragments and incomplete sentences are counted as a sentence.”
Too often, the original report has been cited by researchers who haven’t actually read it.
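To make the quoted counting rule concrete, here is a minimal Python sketch of a Flesch-Kincaid grade-level calculation. It uses the commonly cited form of the formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, and splits sentences on the five punctuation marks the report lists. The function names and the vowel-group syllable counter are assumptions made for illustration. They are not taken from the report or from the PLOS ONE paper, and a real analysis would need a proper syllable dictionary.

```python
import re


def count_sentences(text: str) -> int:
    """Count units separated by . ? ! ; or : (the marks the report lists)."""
    units = [u for u in re.split(r"[.?!;:]+", text) if u.strip()]
    # Fragments still count as a sentence, so non-empty text has at least one.
    return max(len(units), 1)


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(len(groups), 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level in its commonly cited form."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = count_sentences(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    sample = "We came; we saw: we won. Did we? Yes!"
    print(count_sentences(sample))
    print(round(fk_grade("The cat sat. The dog ran."), 2))
```

Note how much of the score rides on punctuation: a transcript of speech has whatever sentence boundaries the transcriber chose to mark, which is one reason applying the formula to spoken language is contested in the thread above.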