The Lab @ DC
Stronger Evidence for a Stronger DC

Randomized Evaluation

How can we know if something works?


Our best tool for figuring out if something works is a randomized evaluation. Also known as a randomized control trial or RCT, this method is considered the most rigorous and unbiased way of evaluating whether a program is causing the changes we observe. It is the foundation of modern science from agriculture to medicine, and we believe it should be the first tool governments, non-profits, and private companies look to when they want to know if something works.1

When we’re trying to learn what works, we could be talking about a randomized evaluation of something as easy as an email, something as complex as a housing subsidy, or something as resource-intensive as a job training program. For simplicity, we’ll call whatever we’re trying to evaluate a program.

Let’s start with an example:

About a quarter of the District’s 911 medical responses are to events that don’t require emergency services and can be treated in primary and urgent care clinics. An unnecessary 911 response could mean that fewer staff and less equipment are available for a real emergency. The District decided to try out sending these calls to in-house 911 nurses who can arrange for appropriate care. Now imagine if, at the end of the year, officials see that unnecessary emergency room visits decreased by 10% from the previous year.

First, this would be great! It would mean that residents are using the emergency room more appropriately. Unfortunately, though, it's impossible to say if this change was caused by sending calls to the nurses. Maybe the weather was better this year, and it was easier for residents to get to a clinic. Maybe individual doctors started calling patients when they were due for a checkup. Or maybe the mayor launched a DC-wide marketing campaign highlighting the importance of only using 911 for medical emergencies. Maybe more factors than just the nurses are at work.

Instead of facing all these “maybes,” we decided to do a randomized evaluation to help us figure out whether nurses could decrease unnecessary emergency room visits. When done properly, randomization takes all the maybes out of the calculations. After conducting the randomized evaluation, if we’re sure that the nurses are the reason for reduced emergency room visits, we can work to put the budget and supports in place so all non-emergency calls can be sent to the nurses. If the nurses are not the reason, then we should try something else.

Randomized assignment means that the treatment and control groups should be very similar.

How does a randomized evaluation work?
Participants are assigned to one of two groups. In one group, participants receive the program, and in the other group, they continue to receive the same thing as before. The key is that the assignment is completely random, similar to flipping a coin: heads, you get assigned to the program; tails, nothing changes for you. This process leaves us with two groups—sometimes called treatment and control—that should be identical, on average.
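As an illustration only (this is a sketch of the coin-flip idea, not The Lab’s actual assignment code), the process above can be written in a few lines of Python:

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into treatment and control groups.

    Each participant gets an independent coin flip: heads, they are
    assigned to the program; tails, nothing changes for them. Because
    the flips are random, the two groups should be similar on average.
    """
    rng = random.Random(seed)
    treatment, control = [], []
    for person in participants:
        if rng.random() < 0.5:
            treatment.append(person)   # heads: receives the program
        else:
            control.append(person)     # tails: same service as before
    return treatment, control

# Hypothetical callers; a fixed seed makes the example reproducible.
callers = [f"caller_{i}" for i in range(10)]
treatment, control = assign_groups(callers, seed=42)
```

Every caller lands in exactly one group, and no characteristic of the caller influences which one, which is what lets us treat the groups as comparable.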

We used randomization in our nurses example above. We randomly assigned callers to either speak to the nurses or to not speak to the nurses and receive a standard 911 response. Because we assigned callers randomly, we could expect the same proportion of male and female callers in each group, the same number of Medicaid beneficiaries, etc. The only thing that is different between the groups is speaking to the nurses. If we find that callers who spoke to the nurses are less likely to unnecessarily use the emergency room than callers who did not speak to the nurses, we can have confidence that this difference is because of the nurses and not some other difference between the two groups.
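To make the comparison concrete, here is a hedged sketch (with made-up numbers, not The Lab’s data) of how one might compare unnecessary-ER-visit rates between the two groups, using a simple two-proportion z-statistic to check whether the gap is bigger than chance alone would produce:

```python
import math

def compare_groups(treated_visits, treated_n, control_visits, control_n):
    """Compare unnecessary-ER-visit rates between treatment and control.

    Returns the difference in rates (treatment minus control) and a
    two-proportion z-statistic; |z| above roughly 1.96 suggests the
    difference is unlikely to be random noise.
    """
    p_t = treated_visits / treated_n
    p_c = control_visits / control_n
    # Pooled rate under the "nurses have no effect" hypothesis.
    p = (treated_visits + control_visits) / (treated_n + control_n)
    se = math.sqrt(p * (1 - p) * (1 / treated_n + 1 / control_n))
    return p_t - p_c, (p_t - p_c) / se

# Hypothetical: 120 of 500 nurse-line callers vs. 180 of 500 standard
# 911 callers later made an unnecessary ER visit.
diff, z = compare_groups(120, 500, 180, 500)
```

With these illustrative numbers, the nurse-line group’s rate is 12 percentage points lower, and the z-statistic is well past the conventional threshold, so we could attribute the drop to the nurses rather than to a chance difference between the groups.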

Is it fair that some people, selected randomly, are not offered the program we are testing?
When considering the ethics of a randomized evaluation, we think through four key questions:

  1. Do we know the program is beneficial for DC residents?
  2. Are there enough resources to serve everyone?
  3. Can we improve on the structure of the program?
  4. Would participating in the evaluation burden or harm anyone?

Let's go through them one by one:

Do we know the program is beneficial for DC residents?
First, we consider whether we already know that the program is beneficial. By “knowing” we mean: is there scientific evidence—not just anecdotes or trend data—that the program works? If we don’t know whether the program will have its intended benefits, a randomized evaluation is not only ethical but can also catch a program that is having a negative effect. And if the program has no measurable effect, the evaluation keeps our limited resources from being spent on something that doesn’t work for residents. Even without scientific evidence, some programs are still “no-brainers”: for example, we wouldn’t randomly assign something that the law requires us to provide to everyone.

Are there enough resources to serve everyone?
Second, we consider whether there are enough resources for everyone to participate in the program. If there are, and we know that the program is more beneficial than the alternatives, then it would be unethical not to offer it to everyone we can. If we don’t have enough resources to offer the program to everyone and we’re not sure how beneficial it is, a randomized evaluation can make the case for whether we should fund the program for everyone who is eligible. Random selection is also typically a fairer way to choose among eligible participants for a limited program than first-come, first-served or referrals of residents already being served, both of which can be biased against specific people or groups.

Can we improve on the structure of the program?
Third, if we know the program is beneficial (Question 1) and we have enough resources to serve everyone (Question 2), then we wouldn’t use a randomized evaluation to test the program. We could, however, think through whether there are important questions about how to improve the program that a randomized evaluation could answer. For example, if the program is beneficial but we aren’t sure which structure is best, we can randomly assign different versions of the program. In this case, the control group receives the existing version of the program and the treatment group receives a version with a change we think will improve the program’s impact even more.

Would participating in the evaluation burden or harm anyone?
Finally, in addition to our answers to the first three questions, we always think hard about question #4: would participating in the evaluation itself burden or harm anyone? These harms could come from different sources:

  • One could be psychological burden from knowing that you’re part of a randomized evaluation that doesn’t necessarily benefit you directly. We put in a lot of work to be transparent about our projects and actively engage residents—by using resident-centered design to get feedback on our ideas, posting our evaluation plans and findings publicly, and holding community events—to ensure that our work is responsive to residents.
  • Another type of harm is when an evaluation poses large burdens on those who participate, often through excessive surveys or interviews. We try to minimize this harm by designing our evaluations to work with normal government operations—existing touchpoints and data already collected rather than special sessions or surveys⁠—as much as possible.
  • With great data, of course, also comes great responsibility. A third potential harm is an evaluation that fails to protect residents’ privacy. All of our evaluations have stringent data security protections; our data use agreements document our responsibilities, from working on secure networks with encrypted laptops, to storing data anonymously or in locked filing cabinets, to destroying data we no longer need.
  • Finally, in each of our evaluations, we also weigh these potential harms against the possible benefits of the evaluation for improving a program and advancing scientific knowledge, and we work with Institutional Review Boards (IRBs) to get external advice on these judgments when appropriate.

How does The Lab use randomized evaluations?
Now that you’ve read about randomized evaluations, check out how we’re using the technique to answer these questions for Washingtonians:


1 When the question is not about whether or not a program is working, an RCT may not be the best fit. We’ve also highlighted some other methods we use, their benefits, and their pitfalls here.