The Lab @ DC
Stronger Evidence for a Stronger DC

Predictive Modeling

Can government predict where resources are needed most?

 

Local governments are the front line of service delivery for residents. This includes making sure resources, like housing, rodent, and lead inspectors, go where they are needed most. Predictive models can help governments determine where these resources can best serve residents.

Let’s start with an example:

Rats are a persistent health risk, 1 and rat colonies grow and spread if they are not treated. The Rodent Control team at DC Health decides where they should inspect for rats based on calls they receive from residents via 311. But people don’t report every rat they see, and some rat infestations may not be discovered. So how does Rodent Control know where else to look?

First, they think about how rats behave. They know rats like old buildings and alleys where they can find places to hide or soft soil to burrow. Rats also like lots of food waste from restaurants, as well as garbage from homes and apartment buildings. While these are likely places to find rats, there are not enough rodent inspectors to go to every block in DC and regularly check. How does Rodent Control decide which places to check before others?

This is where a predictive model might be useful. We can take all the data DC government has on locations of old buildings, alleys, restaurants, and more, and then use a predictive model to make an educated guess about where rats are going to be located.

How do we make a predictive model?
When we make a predictive model, we use data about things we know or can easily find out (like where restaurants and alleys are located) to make predictions about things we can’t, like where Rodent Control is most likely to find rats.

Predictive models often require a lot of complex math, statistics, and computer programs to create, but the logic behind how we make them is actually pretty simple. Humans make predictions all throughout the day. As an example, think of your morning commute. Say that you have to be at work by a certain time, maybe 8:30 AM, and you usually take the bus. What are all the things that could make you early or late? You probably ask yourself questions like: Is there going to be traffic? Is it raining or snowing? Is the bus on time? Do I need to add money to my fare card? You might not have to ask yourself these questions every day, because over time you learn when you need to leave your home in order to get to work on time and adjust for different situations.

Predictive models work in a similar way. They take a specific event, like whether you’ll arrive at work on time, and combine all the things that make the event more or less likely to happen. On-time bus? You’ll get there sooner. Snowy road conditions? You’ll get there later. Lighter-than-usual traffic? Sooner. No money on your fare card? Later. Some of these things, which we call “features,” are more important than others. A blizzard will no doubt make you later to work than if the bus is running a couple of minutes behind schedule. A predictive model gives different weights to these features in order to supply us humans with useful information, like when you need to leave the house if it’s rainy and the bus is running six minutes late.
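
To make the idea of weighted features concrete, here is a minimal sketch in Python. The features, weights, and numbers are made up for the commute example; this illustrates the logic only and is not a model The Lab has actually built.

```python
import math

# Hypothetical feature weights (made up for illustration, not a real model).
# Negative weights push the prediction toward "late," positive weights push it
# toward "on time," and bigger numbers matter more.
weights = {
    "bus_is_late": -0.4,      # a slightly late bus hurts a little
    "snowstorm": -2.0,        # a snowstorm hurts a lot
    "light_traffic": 0.8,     # light traffic helps
    "empty_fare_card": -1.0,  # stopping to reload the card hurts
}
baseline = 1.5  # starting score for an ordinary day, before looking at features

def on_time_probability(features):
    """Combine the weighted features into a probability of arriving on time."""
    score = baseline + sum(weights[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-score))  # squash the score into a 0-to-1 probability

# An ordinary morning vs. a snowy morning with a late bus.
ordinary = {"bus_is_late": 0, "snowstorm": 0, "light_traffic": 1, "empty_fare_card": 0}
rough = {"bus_is_late": 1, "snowstorm": 1, "light_traffic": 0, "empty_fare_card": 0}
print(f"Ordinary day: {on_time_probability(ordinary):.0%} chance of being on time")
print(f"Snowy day, late bus: {on_time_probability(rough):.0%} chance of being on time")
```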

How do we know if a model does a good job of making predictions? The short answer is we test it. Our models are making educated guesses but cannot be certain about what will happen. So, a model will tell us how likely something is to happen. In the commuting example, our model might tell us that if you leave at 7:50 AM there’s an 80% chance of getting to work on time. But that means there’s still a 20% chance you’ll be late. This is why we need to make sure that when a model predicts something will happen 80% of the time, it actually happens about 80% of the time. We call this “validating” our model.

Validation involves checking our model’s predictions about what will happen against what actually happens. We often validate our models using data that we already have, but that we did not use to build our model. In our rodent model, we might predict where Rodent Control will find rats this month with data we collected the month before. We can then compare where Rodent Control actually found rats this month with the model’s predictions to see how often the model was right or wrong. This tells us how well our model predicts!
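
Below is a minimal sketch of that kind of check, using entirely made-up blocks and outcomes. It compares what a model predicted against what inspectors later found, and asks whether blocks scored around 80% actually had rats about 80% of the time.

```python
# Hypothetical validation data: predictions produced with last month's data,
# compared against where inspectors actually found rats this month.
# None of these blocks or numbers are real.
predicted_risk = {"block_a": 0.90, "block_b": 0.80, "block_c": 0.30, "block_d": 0.10}
found_rats = {"block_a": True, "block_b": False, "block_c": True, "block_d": False}

# How often was the model right, if we treat risk above 50% as "expect rats"?
threshold = 0.5
correct = sum(
    (predicted_risk[block] >= threshold) == found_rats[block]
    for block in predicted_risk
)
print(f"Model was right on {correct} of {len(predicted_risk)} blocks")

# Calibration check: among blocks the model scored around 80%, did inspectors
# actually find rats about 80% of the time?
high_risk = [block for block, risk in predicted_risk.items() if 0.7 <= risk <= 0.9]
hit_rate = sum(found_rats[block] for block in high_risk) / len(high_risk)
print(f"Blocks scored around 80%: rats found at {hit_rate:.0%} of them")
```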

We can also validate our models by testing them in the real world: what we call “field validations.” During field validations, we give predictions from our model to a DC government agency to test. We then compare what they find with the model’s predictions. For example, we chose 100 locations for the Rodent Control team to inspect where our model predicted they were likely to find a rat burrow. We then looked at how many times the model was right that a rat burrow was at the location. It turned out that the model did a very good job of predicting where rats would be in the data, but not where we actually sent Rodent Control on inspections. That’s why field validations are so important!
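
The arithmetic behind a field validation is simple. Here is a tiny sketch with invented numbers (not The Lab’s actual inspection results) showing how a field hit rate can be compared against what the model expected.

```python
# Hypothetical field-validation tally; the numbers are invented for
# illustration and are not results from The Lab's actual inspections.
locations_inspected = 100   # locations the model flagged for Rodent Control
burrows_found = 55          # how many of those actually had a rat burrow
expected_hit_rate = 0.80    # what the model predicted for these locations

field_hit_rate = burrows_found / locations_inspected
print(f"Model expected burrows at about {expected_hit_rate:.0%} of locations")
print(f"Inspectors actually found burrows at {field_hit_rate:.0%} of locations")
# A gap between performance on historical data and performance in the field
# is exactly what field validations are designed to reveal.
```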

How do we ensure predictive models are used fairly and ethically?
Predictive models can help us distribute resources and services more fairly by identifying people or places that may need extra help. But predictive models can also be biased. For example, in 2016 ProPublica reported racial bias in a model a different city used to decide how much money people needed to post bail. Because the predictive model was biased, people of some races were assigned higher bail than people of other races with the same criminal history who were accused of the same crime. That’s obviously detrimental to a fair and just court system. 2 The Lab is deeply committed to guarding against this bias, so we take several steps to make sure that our predictive models reduce inequalities rather than widen them.

First, we think carefully about what decisions we want a predictive model to inform. Predictive models may not be appropriate for some decisions, and it is important that we think hard about how our models will be used before we build them. The Lab’s models focus on predicting which people or locations have the greatest need for help: for instance, which blocks are most likely to have rat infestations or which children face the highest risk of being exposed to lead.

Second, we need to make sure that our predictive models do not contain unintended biases. For example, if we are developing a predictive model for finding rodents, we might include the number of 311 calls for rodents for each city block. That seems logical because it would make sense to send inspectors to where people are telling us there are rodents. But we also know that residents may be less likely to call 311 based on the racial and ethnic composition of their neighborhood. 3 If we didn’t investigate this potential bias beforehand and adjust the model accordingly, we might have a model that only sends rodent inspectors to more privileged areas of the city. We develop our models carefully to try to avoid these unintended biases. This requires attention to the way the data was created, how our models work, and how factors like history and location can affect people.
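
One way to look for that kind of bias is to compare the model’s predictions across neighborhoods with different reporting rates. The sketch below uses entirely made-up numbers to show the shape of such a check; it is not an analysis The Lab has published.

```python
# Hypothetical bias check, with made-up numbers: do blocks that report to 311
# less often get systematically lower predicted risk, even though their
# physical conditions (alleys, old buildings, trash) are similar?
blocks = [
    # (reporting group, 311 rodent calls last year, model's predicted risk)
    ("high-reporting", 40, 0.85),
    ("high-reporting", 35, 0.80),
    ("low-reporting", 8, 0.30),
    ("low-reporting", 5, 0.25),
]

for group in ("high-reporting", "low-reporting"):
    risks = [risk for g, _, risk in blocks if g == group]
    average = sum(risks) / len(risks)
    print(f"{group} blocks: average predicted risk {average:.0%}")

# If the low-reporting blocks look just as rat-friendly on the ground but get
# much lower scores, the 311 feature may be carrying reporting bias into the
# model, and the feature or the model needs to be adjusted.
```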

Finally, most cases of bias in predictive models come to light only after the model has already been used and potentially caused harm. That is why we publish our plans for new predictive models before we make, test, or use them. This gives DC residents and experts the opportunity to review and comment on our plans before we create and use the models. The Lab is also working on new ways to explain and report the results of our models so that people can know how they work and how DC government is using them.

How does The Lab use predictive modeling?
Now that you’ve read about predictive modeling, check out how we’re using the technique to answer these questions for Washingtonians: