For the past few days, I’ve been struggling with what I think about the Brennan Center’s new report on the effect of incarceration on crime. What has me torn is this:
1. On the one hand, I think the report’s basic claim is likely more or less correct. The report’s central argument is that incarceration’s impact on crime exhibits diminishing returns. As we lock up more and more people in a time of falling crime, that seems like a reasonable claim.
2. On the other hand, the methods the paper uses are simply wrong, and their invalidity has been well documented for nearly two decades. Moreover, while the report’s basic claim is likely true, its estimates of the exact size of incarceration’s impact on crime are almost certainly too low.
Now that second claim might initially seem like the clearly less-important one. So what if they say that prison contributed to 10% of crime’s decline when it should have been 15%? People only care about the general trend. In fact, policy can only really be based on the general trend—social science isn’t like putting a man on the moon. We operate by rough estimates, not fractions of an inch.
Right? Well… no.
First, the report argues that incarceration's effect in recent years could be zero. That's not a mere quantitative error; it's a media-friendly qualitative one. It hands the prison reform movement a "nothing works" argument.
Second, given the statistical flaws, I can't actually be sure that my intuition about diminishing returns is right. The whole reason statistics exists is that our intuitions are quite often wrong. If intuition and reality lined up on a regular basis, we wouldn't need stats people.
And third, even if the authors caught a break this time and got vaguely-valid results using invalid methods, future studies that use these techniques may not be so lucky. Calling out the bad methods in high-profile work may give those critiques the attention they need to prevent more-serious future failures.
So over the next few posts I want to dig into the statistical flaws in how this paper was written, and what they mean both for its conclusions in particular and for how we should approach difficult statistical issues more generally.
This poses, however, a unique challenge. The report’s claims are facially plausible, as are its estimates. It is not as if it said the earth was flat. It’s more like it said that vaccines cause autism: the correlation exists, there is a causal story that seems at least possible to lay readers, and the result aligns with many people’s (sincere) prior beliefs. And as the medical community has discovered, displacing such “empirical” beliefs is tough.
But I’ll wade in, nonetheless. So in this post I want to zero in on what strikes me so far as being the report’s cardinal sin: the failure to properly account for the feedback effects between prison and crime.
Estimating the relationship between incarceration and crime raises the specter of a fairly intractable statistical problem called “endogeneity” or “simultaneity.” For the basic regression model that the Brennan Center report uses to work, one assumption that has to hold is this: the explanatory variable (here, incarceration) has to affect the outcome variable (here, crime), but not vice versa. So while trends in incarceration can shape crime rates, the model fails if crime rates also shape prison populations.
But that assumption obviously doesn’t hold here: prison populations surely shape crime, but crime rates themselves influence how many people are in prison, both directly (more arrests, convictions, admissions) and indirectly (by, say, changing attitudes towards crime). Due to this problem, simple regression results will be biased.
Not just biased, though. Biased upwards. That is: towards zero, or towards a positive (criminogenic) effect.* So when the Brennan Center argues that prison has no effect anymore, that might very well be false: the uncorrected bias pushes results away from finding a crime-reducing effect. If anything, thanks to the bias a zero-effect suggests that there is at least still some crime-reducing impact to incarceration. (Which is not to say that it is a cost-justifiable effect!)
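To see the mechanics, here is a toy simulation of my own (nothing from the report; all the numbers are purely illustrative) in which incarceration truly reduces crime but crime also feeds back into incarceration. A naive regression of crime on incarceration is badly attenuated, and with these numbers it even flips sign:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative structural (true) parameters:
#   crime  = -0.4 * prison + u   (prison truly reduces crime)
#   prison =  0.5 * crime  + v   (crime feeds back into prison)
b, d = 0.4, 0.5
u = rng.standard_normal(n)  # shocks to crime
v = rng.standard_normal(n)  # shocks to prison policy

# Solve the two simultaneous equations for their reduced form
crime = (u - b * v) / (1 + b * d)
prison = d * crime + v

# Naive OLS slope of crime on prison
ols_slope = np.cov(crime, prison)[0, 1] / np.var(prison)
print(f"true effect: {-b:.2f}, naive OLS estimate: {ols_slope:.2f}")
```

Here the true effect is -0.40, but the naive regression comes back slightly positive (around +0.08): the feedback from crime to prison doesn't just shrink the estimate toward zero, it can manufacture an apparently "criminogenic" effect out of thin air.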
Now, in the report’s defense, the authors do admit that the simultaneity problem exists. But their response? “It’s really hard, the one solution people generally use, this thing called instrumental variables, is really tricky to use, so we’re just going to ignore it.” Lest you think I’m being harsh, here’s the relevant passage:
There are other ways to address simultaneity. One is through a controlled experiment. However, with something like incarceration, this is not feasible. Another is through natural experiments or instrumental variable techniques…. However, good instruments are difficult to construct, and even then the results can be highly dependent on the instrument chosen. For instance, Levitt’s 1996 paper uses prison overcrowding legislation as an instrument (it is plausibly correlated with prison populations and plausibly uncorrelated with crime) and finds a large downward effect of incarceration on crime. But Geert Dhondt’s 2012 study uses cocaine and marijuana mandatory minimum sentencing as an instrument and actually finds an upward effect of increased incarceration on crime. The authors recognize the potential issue of simultaneity but due to the complications invoked by instrumental variables did not apply that technique to their analysis.
The authors are right that IVs are tricky: there’s plenty to criticize in Levitt’s, but there are also plenty of reasons to suspect that Dhondt’s isn’t valid either.** But it’s that last sentence that really troubles me: “since it is tough, we will simply ignore it.”
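For what it's worth, the logic of the IV fix itself is not mysterious, even if finding a credible instrument is. Extending my toy setup (again, my own sketch with made-up numbers): add a variable z, think overcrowding litigation, that shifts prison populations but, by assumption, has no direct path to crime. That exclusion assumption is exactly what the fights over Levitt's and Dhondt's instruments are about.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Same illustrative structure as before, plus an instrument z that
# moves prison populations but (by construction here) not crime directly.
b, d = 0.4, 0.5
u = rng.standard_normal(n)
v = rng.standard_normal(n)
z = rng.standard_normal(n)  # the instrument: exogenous by construction

# Reduced form of: crime = -b*prison + u;  prison = d*crime + z + v
crime = (u - b * z - b * v) / (1 + b * d)
prison = d * crime + z + v

# Naive OLS vs. the simple IV (Wald) estimator cov(crime, z)/cov(prison, z)
ols = np.cov(crime, prison)[0, 1] / np.var(prison)
iv = np.cov(crime, z)[0, 1] / np.cov(prison, z)[0, 1]
print(f"true: {-b:.2f}, OLS: {ols:.2f}, IV: {iv:.2f}")
```

The IV estimator recovers the true -0.40 (while OLS is again attenuated) only because z is valid by construction. In real data, that validity is precisely what is contested.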
Now, one reason the authors are willing to dismiss the problem is that, in the paragraph just above the one I quoted, they cite three papers for the proposition that endogeneity isn’t really a problem in crime-prison models. But they are wrong about that. One paper doesn’t discuss the issue at all. The other two improperly use a test (called the Granger test) to dismiss the problem.
Furthermore, at no point does the report ever cite this fantastic Vera Institute report that demonstrates just how important endogeneity is. The Vera report divides the literature into articles that control for endogeneity and those that don’t, and the results are striking (look at Table 1 on page 6): the studies that try to control for it consistently return much larger effects than those that don’t. At the very least, this makes clear that the blithe assertion that we needn’t be concerned with endogeneity is wrong.
To be clear, I’m not saying that by refusing to use, say, Steve Levitt’s instrument the paper is invalid. Nor am I saying that Dhondt’s instrument can’t be valid, even though it produces results that don’t align with my prior assumptions. But what I am saying is that the literature on the importance of endogeneity to this particular question is extensive enough, and the biases introduced by endogeneity in general are well-known enough, that simply punting on the problem is just… unacceptable, particularly in a report that is going to get, and already is getting, so much attention.
This post is getting overly long as it is, so let me wrap it up with the first big takeaway: any of the original results produced by this report should be viewed with great caution. Most likely the model is consistently understating incarceration’s true impact on crime. This isn’t the only problem with the estimates, and I’ll turn to others in the days ahead, but this is a big one.
At the same time, don’t throw out the baby with the bathwater. Prison may still have a bigger impact on crime than the report states while (1) exhibiting diminishing returns to scale and (2) no longer being cost-justifiable.
In later posts, I’ll think a bit more carefully about a deeper issue that extends beyond this paper, namely what we should do when there is no easy solution to this problem. What if we all (who is this “we”?) ultimately decide that no instrument exists for this problem? There are some technical solutions that may exist, but there is also the more philosophical question of how to make decisions when we know we can’t solve a problem. But more on that down the line.
* Theoretically, more incarceration leads to less crime, but more crime leads to more incarceration. What a regression returns, simplifying grossly, is the net correlation of these two effects. So if the real effect is, say, that a 10% increase in incarceration produces a 4% reduction in crime, a regression could return a 2% decline, or maybe even a 4% increase, all because it is also picking up the degree to which a 10% increase in crime leads to some sort of increase in incarceration.
** The statement that “the results can be highly dependent on the instrument chosen” is also quite dubious. If there are multiple valid instruments, we should expect IV models using each of them to return fairly similar results. That Levitt’s instrument points toward a crime-reducing effect of prison while Dhondt’s points toward a crime-increasing one suggests that one instrument is simply better than the other (or that the models are otherwise designed differently, again in ways such that one is better and the other worse). But the authors make it sound like random noise, rather than the genuinely difficult question of assessing the relative merits of the various IVs.
Posted by John Pfaff on February 17, 2015 at 11:28 AM
Comments
This is why I like Don Green’s study of DC drug court sentencing, relying on the fact that defendants were already randomly assigned to judges who had differing propensities to use long sentences.
Posted by: Stuart Buck | Feb 18, 2015 8:48:19 PM
