The video embedded here by Johan Norberg caught my attention with the title “skewed crime reporting,” which I read as “skewed data reporting.” Comparing the 2016 and 2017 Department of Justice statistics on hate crime, news outlets reported that hate crimes rose 17% in one year. This was reported by the Washington Post, AP News, Vox.com, NBC News, and the NY Times (and those were only the first 6 hits in my Google search on ‘hate crimes up 17%’). The Hill.com reported that it was the third year in a row that hate crimes had increased.
Seventeen percent! That is a huge increase, and since this is now two back-to-back increases, it seems to say that something is very wrong these days. One may reasonably expect the 2018 data to show an increase again. Three years in a row, then possibly four? What are we to do?
Well, we can start by not comparing apples to oranges
All economics students are taught in their data courses, and I hope all who dabble in data analysis and data science are as well, to understand the data generating process (DGP) of the data. What is it, and how reliable is it? One would think this is great and useful data because it seems comprehensive and is government data, published by the Department of Justice on the fbi.gov website as 2017 Hate Crime Statistics and 2016 Hate Crime Statistics. It’s government data, so we do not have to be concerned, correct? In this case there is every reason to be skeptical. See the first two paragraphs of the FBI UCR Hate Crime Summary, specifically the changing base of the numbers and the non-estimation of hate crimes in areas that do not report. I was rejoicing in the fact that the FBI did not quote the 17% increase, but was disappointed to find that they did. I now hope it was not the careful data professionals, but the misinterpretation of a press officer. Why disappointed? Read on.
I learned my data and statistical skepticism from a 1954 book by Darrell Huff called How to Lie with Statistics and have required it in nearly every data class I have taught. The tricks it describes are so simple and so devious; most are simple misrepresentations of the facts. You can’t have your own facts, but your manipulation or interpretation of them may be nonfactual and faulty. In the data course I taught last fall I required a free download by Cathy O’Neil, On Being a Data Skeptic. There are other fine resources, but getting my students first to distrust anything about data is quite the goal.
All economics students are cautioned, or should be cautioned, to be skeptical of all data sources: to understand the DGP, but also to look for year-to-year changes in methods, wording, scope, or instructions. I did not look for, nor would I expect to find, a set of instructions sent out with the 2017 survey telling reporting agencies to pay particular attention to this or that differently than in 2016, but if I were analyzing this data I should. Some EDA methods are advisable for economists (those that inform the researcher, not those that purport to find truth from data, the latter introducing the inherent bias that correlation is causation), but, alas, I suspect even that would reveal little here. This problem is more fundamental: it is in the base.
Agencies report the hate crimes. Is it only a 10.7 percent increase?
Each year’s data is assembled from the agencies that reported in that year, and that is the problem. The number of agencies rose by 5.9% between 2016 and 2017, with 895 additional agencies in the sample. However, the number of agencies that reported at least one hate crime rose from 1,776 to 2,040, a 14.9 percent increase, near the 17 percent increase in hate crimes. So did hate crimes increase significantly, or was it just because more new agencies joined the reporting network? The possibility of the latter casts doubt on the reliability of the former.
| Year | No. of agencies | No. of agencies reporting hate crimes | No. of hate crimes reported | No. of hate crimes per agency | No. of hate crimes per agency that reported hate crimes | Source of data |
| --- | --- | --- | --- | --- | --- | --- |
| 2016 | 15,254 | 1,776 | 6,121 | 0.401 | 3.45 | https://ucr.fbi.gov/hate-crime/2016 |
| 2017 | 16,149 | 2,040 | 7,175 | 0.444 | 3.52 | https://ucr.fbi.gov/hate-crime/2017 |
| Change | 895 | 264 | 1,054 | 0.043 | 0.071 | |
| Percent change | 5.9% | 14.9% | 17.2% | 10.7% | 2.0% | |
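For readers who want to check the arithmetic, here is a minimal sketch that recomputes the derived columns from the three raw counts in the table. The variable and function names are my own for illustration; only the counts come from the FBI summaries.

```python
# Raw counts copied from the table above (FBI UCR hate crime summaries).
data = {
    2016: {"agencies": 15254, "reporting": 1776, "crimes": 6121},
    2017: {"agencies": 16149, "reporting": 2040, "crimes": 7175},
}


def pct_change(old, new):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old


def with_rates(row):
    """Add the two per-agency rates to a row of raw counts."""
    out = dict(row)
    out["per_agency"] = row["crimes"] / row["agencies"]
    out["per_reporting_agency"] = row["crimes"] / row["reporting"]
    return out


y2016 = with_rates(data[2016])
y2017 = with_rates(data[2017])

# Percent change for every column, raw and derived.
summary = {key: pct_change(y2016[key], y2017[key]) for key in y2016}

for key, value in summary.items():
    print(f"{key}: {value:.1f}%")
```

The same 1,054 additional incidents read as 17.2%, 10.7%, or 2.0% depending only on which base they are divided by, which is the whole point of the table.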
To answer that question properly is beyond this blog post, but I would start by restricting the sample of 16,149 agencies in 2017 to only those 15,254 that reported in 2016. In those 15,254 areas, by how much did hate crimes increase? I bet it wouldn’t be 17%. In the table I estimate the increase in hate crimes per agency to be 10.7%, but I am not comfortable with that measure either.
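If agency-level counts were in hand, the restriction described above could be sketched roughly as follows. I have not done this with the FBI files: the agency identifiers and counts below are made-up toy values, and the function names are hypothetical, chosen only to show the mechanics of a matched-sample comparison.

```python
def naive_change(counts_old, counts_new):
    """Percent change in total incidents, ignoring who reported in which year."""
    old_total = sum(counts_old.values())
    new_total = sum(counts_new.values())
    return 100.0 * (new_total - old_total) / old_total


def matched_change(counts_old, counts_new):
    """Percent change in total incidents over agencies present in BOTH years."""
    common = counts_old.keys() & counts_new.keys()
    old_total = sum(counts_old[agency] for agency in common)
    new_total = sum(counts_new[agency] for agency in common)
    return 100.0 * (new_total - old_total) / old_total


# Toy data (NOT real FBI figures): agency id -> incidents reported.
toy_2016 = {"A": 10, "B": 0, "C": 5}
toy_2017 = {"A": 11, "B": 0, "C": 5, "D": 2, "E": 1}  # D and E are new reporters

naive = naive_change(toy_2016, toy_2017)
matched = matched_change(toy_2016, toy_2017)

print(f"naive change:   {naive:.1f}%")
print(f"matched change: {matched:.1f}%")
```

In the toy numbers most of the headline increase comes from the two new reporters; whether the real data behave the same way is exactly what the matched comparison would test.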
Not all agencies report hate crimes. Perhaps it is only a 2 percent increase.
If every agency that reported for the first time reported just one crime each, that would account for almost the entire increase, suggests Johan Norberg, quoting or at least referencing Robby Soave at reason.com. So I am led to a different way of thinking about whether hate crimes have increased. What if we took only those 1,776 agencies that actually reported hate crimes in 2016 and then looked at only those same agencies in 2017 to see what they report? Again, that is beyond this blog entry, but as a crude proxy, let's compare the agencies that reported hate crimes in 2016 with the 2,040 agencies that reported hate crimes in 2017. What do we find? In 2016 there were 3.45 hate crimes reported on average by the 1,776 agencies, which rose to 3.52 hate crimes reported on average in 2017 by the 2,040 agencies. That suggests that hate crimes rose by 2 percent, not 10 percent and certainly not 17.
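The one-crime-per-new-agency thought experiment is easy to put numbers on. This is only a back-of-the-envelope sketch under a strong assumption of my own: that all 895 net additional agencies are first-time reporters and that each reports exactly one incident. The real agency-level split is not known from the summary tables.

```python
# Counts from the FBI summaries cited in the table.
agencies_2016, agencies_2017 = 15254, 16149
crimes_2016, crimes_2017 = 6121, 7175

new_agencies = agencies_2017 - agencies_2016   # net additional agencies
extra_crimes = crimes_2017 - crimes_2016       # additional reported incidents

# ASSUMPTION (hypothetical): every new agency reports exactly one incident.
crimes_from_new = new_agencies * 1

share_explained = crimes_from_new / extra_crimes
residual = extra_crimes - crimes_from_new
residual_pct = 100.0 * residual / crimes_2016

print(f"share of increase from new agencies: {share_explained:.1%}")
print(f"residual increase among existing agencies: {residual} ({residual_pct:.1f}%)")
```

Under that assumption, new reporters would account for roughly 85% of the additional incidents, leaving an increase among the existing agencies of under 3 percent, the same ballpark as the crude 2 percent figure.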