The camel doesn’t have two humps: Programming “aptitude test” canned for overzealous conclusion


From Larry Summers to James Watson, certain scientists have a long and questionable tradition of using “data” to make claims about intelligence and aptitude.

So it’s no surprise that, when well-known computer scientist Richard Bornat claimed his PhD student had created a test to separate people who would succeed at programming from those who wouldn’t, people happily embraced it. After all, it’s much easier to say there’s a large population that will just never get it than to re-examine your teaching methods.

The paper, called “The camel has two humps,” suggested that instead of a bell curve, the distribution of programming success looks more like a two-humped ungulate: the kids who get it, and the kids who never will.

Though the paper was never formally published, it made the rounds pretty extensively. Now, Bornat has published a retraction, stating that he wrote the article during an antidepressant-driven mania that also earned him a suspension from his university. Here’s the meat of the notice:

It’s not enough to summarise the scientific result, because I wrote and web-circulated “The camel has two humps” in 2006. That document was very misleading and, in the way of web documents, it continues to mislead to this day. I need to make an explicit retraction of what it claimed. Dehnadi didn’t discover a programming aptitude test. He didn’t find a way of dividing programming sheep from non-programming goats. We hadn’t shown that nature trumps nurture. Just a phenomenon and a prediction.

Though it’s embarrassing, I feel it’s necessary to explain how and why I came to write “The camel has two humps” and its part-retraction in (Bornat et al., 2008). It’s in part a mental health story. In autumn 2005 I became clinically depressed. My physician put me on the then-standard treatment for depression, an SSRI. But she wasn’t aware that for some people an SSRI doesn’t gently treat depression, it puts them on the ceiling. I took the SSRI for three months, by which time I was grandiose, extremely self-righteous and very combative – myself turned up to one hundred and eleven. I did a number of very silly things whilst on the SSRI and some more in the immediate aftermath, amongst them writing “The camel has two humps”. I’m fairly sure that I believed, at the time, that there were people who couldn’t learn to program and that Dehnadi had proved it. Perhaps I wanted to believe it because it would explain why I’d so often failed to teach them. The paper doesn’t exactly make that claim, but it comes pretty close. It was an absurd claim because I didn’t have the extraordinary evidence needed to support it. I no longer believe it’s true.

I also claimed, in an email to PPIG, that Dehnadi had discovered a “100% accurate” aptitude test (that claim is quoted in (Caspersen et al., 2007)). It’s notable evidence of my level of derangement: it was a palpably false claim, as Dehnadi’s data at the time showed.

We caught up with Bornat via an emailed Q&A:

1) Why now?

I presented our latest results about 18 months ago at a PPIG workshop/conference in the UK. I felt it was helpful, since the claims I made had provoked hostility to the work, to retract those claims verbally. It had a dramatic effect, to the good. But I found (how I know is the confidential bit) that there are people who didn’t hear that retraction, and who are still hostile; and that hostility is doing harm. So I decided to retract more publicly.

Interestingly, one person who I would have counted previously as hostile heard (indirectly) of the verbal retraction, and this summer was more than supportive. Research inspired by our work is going forward. So the retraction was worthwhile.

2) Do you think there’s validity to the idea of or the search for a programming aptitude test?

In the long term, perhaps. I’m wary of the notion of ‘aptitude’. I’d rather understand how people do (and don’t) learn.

3) Was this paper ever published in a peer-reviewed journal?

The retracted ‘paper’ never was. Neither was any of the other work (one paper was in a peer-reviewed conference; the rest were in lightly-reviewed workshops). But computer science doesn’t use journals the way other disciplines do: we’re more convinced by peer-reviewed conferences. However, watch this space.

4) Are you concerned about the effect your paper may have had on computer science education?

Yes.

5) Any other comments regarding the retraction?

I didn’t claim in that ‘paper’ any sex differences in learning programming. One statistician claimed to have seen sex-related effects in our data. I trumpeted that (verbally). Wish I hadn’t: friendship is hard to rebuild.

I wish it were easier to retract … it’s painful, and it’s ineffective.

We also got in touch via email with computing pioneer Alan Kay, who had previously commented on “The camel has two humps,” to get his opinion on the retraction, and the idea of a programming aptitude test in general:

 Under the rubric of “coding”, there are now groundswells to have every child learn to program to some extent. This is a very tricky area to critique because the rudiments of “coding” do not necessarily extend into more comprehensive “programming” and into even more comprehensive “systems design”, etc. But, suppose for now we pull back from this enthusiasm to think about “training for a profession” and whether there could be “aptitude tests” to help steer learners and teachers.

Here we come up against the idea of “ability” = talent + will + skill. When I started programming (in the Air Force in the early 60s) there was an aptitude test made up by IBM that all prospective candidates had to take. It was not an easy test, but those who passed it were then able to undergo an intensive week of training (taught by IBM) and come successfully out the other side as a “machine-code coder”, and be put into on-the-job program writing the very next week. Some months later there was another intensive week covering the rudiments of program design, etc.

Looking back on this from the experience of both working as a programmer for a while, and also quite a few years of investigating how to help children learn to use computers as part of “powerful ideas” learning experiences, I posit that the IBM test most likely excluded many potential programmers, and especially many potential *designers*. So it was only useful in predicting one kind of success — for processes that were like building simple dwellings from simple materials: “brick laying and carpentry”.

A simple way to sum up here is to say that “the computer revolution hasn’t happened yet” (not even within the field of computing). So wise people trying to make progress will keep a more open mind about what computing might be about, what it means to teach computing while the field is still being invented, and to be as open and helpful as possible to get many kinds of minds involved in making better forms of computing for the future. And when trying to assess progress in soft areas, to be as tough as possible not to be misled by the kinds of false positives that result when there are many degrees of freedom contributing noise, and too few trials over too short a time to get results that actually reflect what is going on.

 

10 thoughts on “The camel doesn’t have two humps: Programming “aptitude test” canned for overzealous conclusion”

  1. The Camel, Ogden Nash

    The camel has a single hump;
    The dromedary, two;
    Or else the other way around,
    I’m never sure. Are you?

  2. Maybe I’m missing something here, but the story seems to be:

    Someone, not in their right mind, argued that a bimodal distribution of a trait is evidence for the innateness of that trait. Lots of other people, who were in their right minds, accepted this argument.

    Personally I’m more worried about the latter people!

    1. I haven’t done any formal studies, but I’ve been involved in computing for a while, and I’ve long thought programming ability seemed more lognormal than regular Gaussian:
      a) Many people can do some programming, especially considering that Excel is a form of programming.
      b) There is a long tail to the right, and the really great programmers are spectacularly better than average or even good programmers.

      This sort of silliness is one of the reasons I had cognitive psychologists mixed into my group of computer scientists. That saved a lot of time.
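
      To make the lognormal point concrete, here’s a minimal sketch (made-up parameters, nothing from any actual study) comparing how far the 99th percentile sits above the median under each shape:

        import numpy as np

        rng = np.random.default_rng(0)

        # Hypothetical "ability" scores: both centred near 100, different shapes.
        normal = rng.normal(loc=100, scale=15, size=100_000)
        lognorm = 100 * rng.lognormal(mean=0.0, sigma=0.5, size=100_000)

        for name, x in [("normal", normal), ("lognormal", lognorm)]:
            p50, p99 = np.percentile(x, [50, 99])
            print(f"{name}: 99th percentile is {p99 / p50:.2f}x the median")

        # Roughly 1.35x for the normal vs ~3.2x for the lognormal: only the
        # lognormal has a top end "spectacularly better" than the middle.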

      1. I don’t see anything wrong with the paper. I’ve seen a similar distribution in most of my courses at university (in physics). Also, some people scored decently in certain topics while being bad in others, but the really high scorers were generally always at the top. To me this rather sounds like the distinction between those who take things seriously and those who regard a sober evening as a waste of time. As far as I remember, the two groups were quite separated, possibly giving the bimodal distribution in every exact science.

        1. “To me this rather sounds like the distinction between those who take things seriously and those who regard a sober evening as a waste of time.”

          Well said.

        2. “I don’t see anything wrong with the paper.”

          The problem with the paper is that no-one else was able to reproduce those results under scientific conditions. If it’s not repeatable then it’s not really science…

          1. What do you mean by “under scientific conditions”? Should every teacher write a separate paper about the distribution of each exam? Or create the “One Universal Standard Exam”? During my studies I saw the bimodal distribution quite a lot, so it’s quite repeatable. BUT none of my teachers decided to publish the distributions and try to explain them without knowing the background of the students.

    2. Precisely. The bimodal distribution in programming outcomes is uncontroversial: marks in first-year computing classes tend to have a lot of very low values and a lot of very high values. This is well-documented and repeatable.

      But as is usual in the sciences, the number of conclusions one can draw from a single experiment or observation is much, much less than one. The question of “why are marks in programming courses bimodally distributed?” is a vexatious one, made more so by the many extremely naive notions that get put about regarding it.

      Two things we have learned in forty years of failing to teach everyone to code:

      1) we can’t find any single testable ability that predicts the outcome of teaching someone to code

      2) none of the vast array of variations in pedagogy that have been applied to the problem changes the distribution significantly.

      I guess the other thing we have learned is that partisans on both sides of the debate are willing to make wild accusations rather than sober assessments of the data. In particular, the patently ridiculous, insulting and false insinuation that no one has ever tried any alternative methods of teaching programming is always made when this issue comes up.

      My own thinking on the matter can be found here:

      http://www.tjradcliffe.com/?p=1338

      and here:

      http://www.tjradcliffe.com/?p=1471
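
      A minimal sketch of the kind of check being described here: fit one- and two-component Gaussian mixtures to a set of marks and compare BIC. The marks below are simulated, and scikit-learn is an assumed dependency; real course data would replace the simulation:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(1)

        # Simulated first-year marks with a low and a high cluster (hypothetical).
        marks = np.concatenate([
            rng.normal(30, 8, 120),  # a cluster of low marks
            rng.normal(80, 7, 100),  # a cluster of high marks
        ]).reshape(-1, 1)

        for k in (1, 2):
            gmm = GaussianMixture(n_components=k, random_state=0).fit(marks)
            print(f"{k} component(s): BIC = {gmm.bic(marks):.0f}")

        # A clearly lower BIC for k=2 supports bimodality -- but it says nothing
        # about *why* the marks split, which is the contested question.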

    3. I can find no claim whatsoever about innateness in the original paper.

      The claim was “people who have had no exposure to programming and are given a test about assignment statements act as if (a) they used a single mental model throughout, (b) they used several different models, or (c) they refused to deal with a formal system they did not understand. The people in the (a) group do well in CS1; the people in the (b) and (c) groups do not.”

      The paper does say “that programming teaching is useless for those who are bound to fail and pointless for those who are certain to succeed”, but that is NOT a claim of innateness. The cognitive difference in question could be entirely a result of educational experience, and that would be entirely consistent with the factual claims of the paper. A claim that a programming course which isn’t DESIGNED to affect the way people deal with formal systems as such DOESN’T affect it is not terribly outrageous.
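
      For readers who haven’t seen the test: the items were short assignment snippets of roughly this shape (an illustrative reconstruction, not a verbatim question), and the scoring cared about whether a subject answered consistently, not whether they knew the actual rule:

        a = 10
        b = 20
        a = b
        # What are the values of a and b now?
        #   a = 20, b = 20   -- the conventional "copy" model
        #   a = 20, b = 10   -- a "swap" model
        #   a = 30, b = 20   -- an "add" model
        # Any one of these, applied consistently across the items, would put a
        # subject in group (a); switching between models puts them in group (b).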

      See the claim above? The retraction does not resile from THAT claim. Bornat still believes that Dehnadi found something real and important and worth exploring.
