Last week, we wrote about a correction of a heavily criticized paper in The Lancet by the Millennium Villages Project, a large aid program. Paul Pronyk, director of monitoring and evaluation at Columbia University’s Center for Global Health and Economic Development, which runs the Project, left his job shortly after writing an explanatory letter that accompanied the correction.
That correction had come after a letter to The Lancet, but also after Nature had raised issues in an editorial. Nature’s editorial concluded:
The Millennium Villages Project has problems beyond the analysis of data. The organizers have been reluctant to publish a full breakdown of costs — making it impossible for those not privy to the information to verify their cost–benefit analysis, which is crucial in development policy because spending is under great scrutiny. The project also seems to lack a coherent policy on when and how it will make data available to independent researchers.
Clemens and others are right to ask that the project make this information available. Greater transparency is essential to build trust and credibility. The project’s approach has potential, but little can be said for sure yet about its true impact. The latest initiative of the Millennium Villages Project, in Ghana and funded by the UK government, is a welcome step in the right direction. It builds in independent scrutiny from the start, and has been open and transparent about its costs. All future projects should follow this model.
Clemens is Michael Clemens, one of the scientists who have been questioning the findings. This week, Nature has a letter from Pronyk responding to the editorial. Pronyk writes that
Some of your criticisms of child-mortality figures from the Millennium Villages project in Africa are unjustified.
For example, referring to a comment in the editorial by Clemens, he writes:
No “alarm bells” sounded over the mortality rate in the comparison villages. These were closely matched to the Millennium sites, chosen because they were poor, often remote, hunger hotspots. We therefore had no expectation that mortality rates at any of these sites would track national trends.
Pronyk also notes the correction to the Lancet paper, and that there will be changes to the Project. One of those changes, of course, was that Pronyk left in late May. The letter’s signature line lists him at the “Center for Global Health and Economic Development, Earth Institute, Columbia University, New York, USA,” which runs the Millennium Villages Project. Today, the Earth Institute confirmed for us that Pronyk is no longer at the Institute, either. According to a Center spokesperson, however, the letter represents the views of the Project and was submitted before he left, which is why it lists his former affiliation.
We offered Pronyk a chance to comment on his departure, and will update with anything we hear back.
Update, 4:45 p.m. Eastern, 6/8/12: Michael Clemens, one of the paper’s critics, sent us this detailed comment earlier this week, after this post went live:
Pronyk says it is “unjustified” to “question the methods we used to generate mortality estimates” via long-term recall. In fact, the Project’s own Lancet-registered research protocol states:
“Child mortality rates are themselves susceptible to recall bias. The longer back in history one measures, the greater the potential for error. In addition, non-surviving births are thought to be more frequently omitted than surviving births.”
So the problem of increased measurement error from the recall method is acknowledged in the Project’s own documents. Furthermore, it is untrue, as his letter implies, that the intervention villages and the comparison villages were treated identically in the survey. In the comparison villages, mothers were only asked about child deaths at endline (up to 8 years after the deaths in question). In the intervention villages, mothers were asked about child deaths both at baseline and at endline. This has the potential to produce more accurate recall reporting of child deaths in the intervention villages only, due to intrinsic or extrinsic cues for consistent reporting between the two surveys that occurred at the intervention sites. It is a plausible explanation for why “baseline” mortality was much higher in the intervention villages than in the comparison villages, even though the two sets of villages were perfectly matched in every other dimension tested. (The project offers no other explanation for that strange fact.)
Second, the letter states that child mortality in the comparison villages “closely matched” child mortality in the intervention villages at baseline. As I’ve noted, that is false. At baseline, estimated mortality was substantially (and statistically significantly) higher in the intervention villages. At endline, mortality was not significantly different between the two sets of villages. Together, these facts mean that the entire difference in mortality trends between intervention villages and comparison villages is due to the fact that the two sets of villages are not properly matched on baseline mortality. This is a critical flaw in the paper and its unretracted finding. It means that the paper’s main unretracted result could arise from the above-discussed downward bias in estimated baseline mortality in the comparison villages.
Third, Pronyk’s letter and the original article continue to ignore this fact: The Project’s own research protocol documents that the intervention villages and comparison villages were chosen differently. The protocol states,
“Issues of feasibility, political buy-in, community ownership and ethics also featured prominently in village selection for participation in large scale development programs such as MVP.”
The word “feasibility” means that the intervention sites were chosen specifically because it was believed, for various reasons, that the project would work better in those places than it would work in other places. Among the reasons for that belief was that the intervention sites had strong community ownership and political buy-in. There is no evidence to suggest that comparison communities were chosen to ensure that they, too, had strong local ownership of development efforts and wise local leaders. Gabriel Demombynes and I discussed this in our response to the American Journal of Clinical Nutrition article, and Pronyk ignored this critical problem.
Fourth, Pronyk’s letter asserts that the costs of the project are fully and transparently reported. That is not true. The project’s publicly available documents do not make the full costs of the project clear. It is never clear what is included and what is not included in the cost figures they report. The project claims that the intervention costs $160 per person per year for a five-year intervention. It does not make transparent the fact that this figure 1) excludes all off-site expenses of the project, which are substantial, 2) excludes in-kind donations from various public and private partners such as expensive electricity plants from Panasonic, 3) excludes start-up costs prior to the initial five-year period, and 4) excludes continuing ‘consolidation’ costs after the initial five-year period. Pronyk ignores Nature’s request for transparent release of data on the full cost of the project.
One funder (the UK Department for International Development) has released the full costs of the one site it is supporting, as I have documented. There it is clear that the project is making a one-time expenditure of about US$6,000 per household, or US$12,000 per household that is lifted out of poverty. These are, respectively, 17 times and 34 times the local income per capita. That’s an astronomical sum. If used to support other interventions, or simply given to the families in question, it could clearly have very large effects on the health and education of their children. No attempt has ever been made by this project to fully and transparently report and document its full cost, and to compare its effectiveness to alternative uses of scarce aid funds in the same setting.
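The figures quoted above are internally consistent, and a few lines of arithmetic make the relationships explicit. This is a rough sketch using only the numbers in the comment; the "implied" values below are derived by us from those rounded figures, not reported by the project or by DFID.

```python
# Figures quoted in the comment above (US$, one-time spending at the DFID-supported site).
COST_PER_HOUSEHOLD = 6_000
COST_PER_HOUSEHOLD_LIFTED = 12_000   # per household lifted out of poverty
MULTIPLE_PER_HOUSEHOLD = 17          # "17 times" local income per capita
MULTIPLE_PER_LIFTED = 34             # "34 times" local income per capita

# Same total spending divided by two different counts: the ratio gives the
# implied share of households counted as lifted out of poverty.
share_lifted = COST_PER_HOUSEHOLD / COST_PER_HOUSEHOLD_LIFTED

# Both multiples should point to the same (rounded) local income per capita.
implied_income_a = COST_PER_HOUSEHOLD / MULTIPLE_PER_HOUSEHOLD
implied_income_b = COST_PER_HOUSEHOLD_LIFTED / MULTIPLE_PER_LIFTED

print(f"implied share of households lifted: {share_lifted:.0%}")
print(f"implied local income per capita: ~US${implied_income_a:.0f} "
      f"(cross-check: ~US${implied_income_b:.0f})")
```

Both multiples imply a local income per capita of roughly US$350, and the two per-household figures imply that about half of the households are counted as lifted out of poverty. Because 17 and 34 are rounded, the implied income is approximate.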
The link to the original Nature editorial isn’t working.
Fixed, thanks.
Judging by his comments in Nature, the problem of advantageous reporting of results does not seem to be unique to the Millennium Villages Project. Indeed, his repeated appeals to “this is standard practice” and to “comparable studies” suggest that other development projects have the same problem. This is understandable, since a project with no or negative results is probably more likely to lose its funding.
A general problem I have had in reading the MVP and other randomised or non-randomised experiments is the lack of lab notebooks, which would document all experiments that were executed, not only the ones worth reporting on. After all, if one were to run 20 projects of any kind, the probability of at least one being worthy of reporting is close to 1.
The probability that at least one out of 20 projects will report a statistically significant result depends on several factors, including the underlying probability that the intervention is in fact useful. If the 20 projects are all useless, then the chance of at least one showing a spurious positive result is about 0.64 (assuming an alpha of 0.05). Obviously, the probability of at least one coming out positive increases as the chance of the trials actually being useful increases, assuming the projects have adequate statistical power. If all 20 projects were useful, and if they all had a power of 80%, then the probability of at least one being “worthy of reporting” approaches 1 (the probability of them all being negative would be 0.2 to the power of 20).
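The calculation in the comment above is the complement rule: the chance of at least one positive is one minus the chance that every project comes out negative. A minimal sketch, assuming the 20 projects are independent and share the same per-project probability of a positive result (alpha when the intervention is useless, statistical power when it is useful); the function name is ours, for illustration only.

```python
def prob_at_least_one_positive(n_projects: int, p_positive: float) -> float:
    """Chance that at least one of n projects reports a positive result.

    Assumes the projects are independent and each comes out positive with
    the same probability p_positive (alpha for a useless intervention,
    statistical power for a useful one).
    """
    # P(at least one positive) = 1 - P(all negative) = 1 - (1 - p)^n
    return 1.0 - (1.0 - p_positive) ** n_projects


# 20 useless projects: every positive is a false positive, so p = alpha.
for alpha in (0.05, 0.10):
    print(f"20 useless projects, alpha={alpha:.2f}: "
          f"{prob_at_least_one_positive(20, alpha):.3f}")

# 20 useful projects, each with 80% power: P(all negative) = 0.2 ** 20.
p_all_negative = (1.0 - 0.80) ** 20
print(f"20 useful projects, power=0.80: P(all negative) = {p_all_negative:.2e}")
```

With alpha = 0.05 this gives about 0.642, matching the "about 0.64" above; with alpha = 0.10 it gives about 0.878. With 80% power the all-negative probability is on the order of 10^-14, which is why the chance of at least one positive is effectively 1.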
Thanks for the stats. If we consider an alpha of 0.10 (very common), then 20 projects give an 88% probability. Take the project described at http://www.econ.upf.edu/docs/seminars/duflo.pdf (also published in AER): it is not clear from the paper whether this was the only experiment that was conducted or whether this was the experiment that gave the intended results.
Also, the comments of the authors themselves throughout the paper indicate that they have been changing the model to align it with the outcomes of the experiment. In this way, the “success” of an experiment becomes pretty much endogenous. See p. 31: “[Is it worth trying to see if we pass this test with the new model?]”.