If one wants to grow an oak tree, it helps to have both an acorn and a working knowledge of the conditions under which an acorn is most likely to become an oak tree. One also needs to know how long the germination process is likely to take – in the case of the red oak, upwards of two years from flowering to acorn to sapling. Absent such knowledge, one might reasonably (but incorrectly) infer that, upon seeing no outward signs of life six months after planting the acorn, one’s efforts had been in vain. It is only assessment against informed expectations at any given point after planting the acorn that allows one to make accurate inferences regarding success or failure. Needless to say, if one wanted to grow sunflowers instead – which have a wildly different growth trajectory to that of oak trees – one would need knowledge of that trajectory in order to accurately assess the effectiveness of one’s horticultural skills.
It seems to me that while we may have plenty of favorite ‘plants’ (microfinance, girls’ education, community empowerment) with which to respond to the range of development challenges we face in the world, we nonetheless have very weak knowledge of the timeframes and trajectories of impact we should expect any particular project to follow. And we have even less knowledge of how any such timeframe and trajectory might vary with the scale of the project (will bigger in fact be better?), the context in which it operates (will a project that works here work there?), and the quality of its implementation (will less diligent or talented staff attain the same results?). Some projects – community-driven development (CDD) comes to mind – might not even have a consistent or knowable (ex ante) trajectory at all. By default, our working assumption in project evaluation is that the impact trajectory of any given intervention is monotonically linear; the evaluation challenge is thus to determine whether the observed slope of the line connecting data at baseline and conclusion is statistically different from the counterfactual – what would have happened otherwise, all else being equal. The development profession has taken many commendable strides over the past decade to raise the quality and frequency with which it assesses development interventions. Even so, the absence of an explicit theory of change – that is, the lack of a reasoned statement about what outcomes we should expect to see from a given project, and by when – should make us wary of many of the claims of impact, or lack thereof, made by evaluators.
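The stakes of the linearity assumption can be made concrete with a toy simulation. Everything below is a hypothetical illustration – the functions, numbers, and horizons are invented for exposition, not estimates from any real project. It contrasts a slow-burn (‘oak tree’) impact trajectory with a fast, steady (‘sunflower’) one, then takes a snapshot at a typical evaluation horizon:

```python
import math

def logistic_impact(t, total=100.0, midpoint=20.0, rate=0.4):
    """Slow-burn ('oak tree') trajectory: almost no visible impact in
    early years; most of it arrives around the midpoint year."""
    return total / (1.0 + math.exp(-rate * (t - midpoint)))

def linear_impact(t, total=100.0, horizon=5.0):
    """Fast, steady ('sunflower') trajectory: full impact within a few years."""
    return total * min(t / horizon, 1.0)

# Snapshot both trajectories at a typical 3-year evaluation horizon,
# and again at 30 years.
for name, trajectory in [("oak", logistic_impact), ("sunflower", linear_impact)]:
    print(f"{name}: year 3 = {trajectory(3):.1f}, year 30 = {trajectory(30):.1f}")
```

Under these made-up parameters, a three-year evaluation of the ‘oak’ project would find essentially zero impact, even though its eventual impact matches the ‘sunflower’ project’s – exactly the inferential trap of judging every intervention against an implicitly linear trajectory.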
Researchers at all points on the methodological spectrum have overlooked or obscured this issue. Qualitative studies have given insufficient attention to identifying plausible counterfactuals of any kind, while those using randomized controlled trials have arguably given too much attention to what is ultimately only one slice of the overall inferential challenge. Once one appreciates that education, health, agriculture, infrastructure and legal reform projects (and the range of individual projects within these sectors) are likely to unfold along vastly different time scales and trajectories, it becomes readily apparent that constructing a counterfactual is only part of the identification problem. We need to know not only what would have happened to beneficiaries in the absence of the project, but also what outcomes it is reasonable to expect at any given point after the project’s launch in a given context. Some projects may, by their nature, yield high initial impacts – a bridge, for example, as soon as it opens – while others may inherently take far longer, even decades, to show results; not because they “don’t work” after three years, but because that is simply how long they take. Last year, for example, President Zoellick declared in a major speech that building the rule of law is among the highest of all development priorities, yet the current World Development Report on conflict and security provides evidence that attaining a one standard deviation improvement in the ‘rule of law’ has taken even the fastest-reforming developing countries 41 years!
The key point is that doing some vitally important tasks in development may take two or more generations even under the best of circumstances, and that otherwise technically sound projects can be unfairly deemed ‘failures’ if their performance at any given point after implementation is not assessed against informed expectations. (And vice versa: some initially ‘successful’ projects may turn out to have decidedly unwelcome long-term consequences.) Have we ‘failed’ if, in fragile and conflict-affected countries, we have nothing to show after 30 years of faithful effort? Only if we have no theory of change against which to make such a judgment, no realistic sense of where we should be by when – or if we assume that oak trees and sunflowers grow on the same trajectory, and that their respective growth rates are (or should be) the same no matter the context. How long did it take to get women the vote? Or to end slavery? It took centuries of committed effort in the midst of what seemed like unrelenting disappointment and futility. By focusing only on and rewarding ‘what works’ in the short run, we risk reverse-engineering the whole development challenge, deflecting attention from complex problem-solving and from building the constituencies needed to sustain support for the strategies that shape long-run prosperity. The World Bank should be at the forefront of leading responses to these types of challenges. At present, however, political cycles and career incentives overwhelmingly favor projects promising rapid, non-controversial results in five years (or less).
What can be done to improve things? One obvious response is to encourage both practitioners and researchers to be more explicit about what kinds of results we should expect to see, and why, at specific points after project implementation has begun. Another recommendation, following on from a recent paper by our colleague David McKenzie, is to encourage more frequent ‘micro’ evaluations during the project cycle rather than waiting for a single summative assessment at the project’s end; this also implies creating more space and resources for enhancing the quality of monitoring systems, the better to get an accurate real-time sense of how projects are doing. In recent years, researchers have done a pretty good job of making ‘E’ (evaluation) cool; we need to do the same with ‘M’ (monitoring), even though allowing the content of projects to adapt mid-stream would likely compromise the ‘purity’ of one’s research design. Finally, I think we need better awareness of how given project ‘technologies’ (which we assume are invariant) nonetheless interact with scale, context and implementation quality to generate the range of outcomes we observe. Doing so would not only generate useful and usable information for project managers, but would ultimately help researchers go beyond determining local average treatment effects to providing better explanations of how project performance varies across different contexts. Either way, oak trees aren’t sunflowers; similarly, it’s a pretty safe bet that roads aren’t bed nets, and that CDD in Indonesia isn’t CDD in Kenya. We shouldn’t assess and compare them as if they are.