The generalisability puzzle

In an in-depth article in the Stanford Social Innovation Review, Rachel Glennerster and Mary Anne Bates, researchers affiliated with the Abdul Latif Jameel Poverty Action Lab (J-PAL), discuss different approaches to evidence-based policy decisions, some better than others. They then delve into the generalisability framework that supports their own work at J-PAL, which integrates different types of evidence, including results from the increasing number of randomised evaluations of social programmes.


In 2013, the president of Rwanda asked us for evaluation results from across the continent that could provide lessons for his country’s policy decisions. One program tested in Kenya jumped out, and the Rwandan government wanted to know whether it would likely work in Rwanda as well. “Sugar Daddies Risk Awareness,” an HIV-prevention program, was remarkably effective in reducing a key means of HIV transmission: sexual relationships between teenage girls and older men. A randomized controlled trial (RCT) found that showing eighth-grade girls and boys a 10-minute video and statistics on the higher rates of HIV among older men dramatically changed behavior: The number of teen girls who became pregnant with an older man within the following 12 months fell by more than 60 percent.1

This study was compelling partly because of its methodology: Random assignment determined which girls received the risk awareness program and which girls continued to receive the standard curriculum. Our government partners could thereby have confidence that the reduction in risky behavior was actually caused by the program. But if they replicated this approach in a new context, could they expect the impact to be similar?

Policy makers repeatedly face this generalizability puzzle—whether the results of a specific program generalize to other contexts—and there has been a long-standing debate among policy makers about the appropriate response. But the discussion is often framed by confusing and unhelpful questions, such as: Should policy makers rely on less rigorous evidence from a local context or more rigorous evidence from elsewhere? And must a new experiment always be done locally before a program is scaled up?

Stanford Social Innovation Review