By combining historical Spokane Police Department crime incident data with historical weather data for Spokane, a multivariable regression analysis was created to better understand the influence that weather has on crime rates. While many factors affect crime, this analysis sheds light on how much weather affects crime in the Spokane region specifically. The regression itself would not be useful for predicting how many crimes occur on a given day, but it does highlight which weather factors affect crime rates. That insight can be useful in developing further models, in understanding how crime could vary as the climate changes, and in developing baselines against which other scenarios can be compared.
Predicting crime rates does not necessarily make it possible to expect a specific number of incidents on a given day based on the weather. However, a detailed analysis of the attributes that contribute to increases and decreases in crime can still be useful. By understanding these factors, better staffing plans can be implemented to reduce costs, either by cutting overtime or by avoiding under-utilized staff time. The best way to measure successful use of this information would be to look at overtime spending to determine whether better planning produced a reduction. This type of analysis can also help determine what the other drivers of crime rates are. By developing a regression analysis, data can be weather normalized so that it can be used as a baseline for comparison against unique events, like the COVID-19 pandemic, or longer-term changes like climate change.
The influence of weather on crime rates has been studied for many years. Multiple studies have indicated a strong correlation between higher temperatures and higher crime rates. A study in Finland showed that 10% of the fluctuation in crime rates there came from temperature changes alone (VWU 2019). Different regions have had varying results for the impacts of other factors like rainfall, snow, or extreme cold. Different cultures also behave differently in different weather conditions, so analyzing data for the specific region is important in ensuring the results are applicable to the target population.
To develop a regression model that can predict an expected number of crime incidents for a given day in the city of Spokane, two data sources were required. The first was daily crime records going as far back as possible. The second was daily weather information from a weather station that is also used for weather forecasts. These two data sets could then be combined to show the weather conditions and crime incidents that occurred each day. From the combined data, influencing factors could be identified and regression models could be built. Both datasets were freely available to the public from government websites in a downloadable format that could be used within data software.
For the daily crime records, the Federal Bureau of Investigation (FBI) publishes daily incident-based data by state with granular details for each incident. The reporting goes back to 2005; however, when the data was obtained and reviewed for accuracy, it was found that the Spokane Police Department only began sending its data to the FBI in October 2016. There were many layers of granular data for crime incidents, so due to time constraints only the total crime rate was analyzed. For the daily weather records, the National Oceanic and Atmospheric Administration (NOAA) publishes daily weather recordings by weather station with granular details for multiple weather factors. These two datasets were combined within Excel into a single daily dataset of 819 records with a total of 29 attributes. This format allows the data analytics software PowerBI to determine the key influencing factors among the attributes being investigated, and it allows R scripts to be run on the data set for regression analysis.
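The combining step described above was done in Excel. As a rough illustration of the same idea, the following Python sketch joins two daily tables on their date column. The column names and sample values are hypothetical and do not reflect the actual FBI or NOAA export formats.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative assumptions,
# not the actual FBI or NOAA export headers.
CRIME_CSV = """date,total_incidents
2017-01-01,40
2017-01-02,52
2017-01-03,47
"""

WEATHER_CSV = """date,tmax,snowfall,snow_depth
2017-01-01,28,1.5,4
2017-01-02,31,0.0,4
2017-01-04,35,0.0,3
"""


def read_by_date(text):
    """Index the rows of a CSV document by their 'date' column."""
    return {row["date"]: row for row in csv.DictReader(io.StringIO(text))}


def merge_daily(crime_text, weather_text):
    """Inner-join the two daily tables on date: one merged record per day
    that appears in both sources."""
    crime = read_by_date(crime_text)
    weather = read_by_date(weather_text)
    merged = []
    for day in sorted(crime.keys() & weather.keys()):
        record = {
            "date": day,
            "total_incidents": int(crime[day]["total_incidents"]),
        }
        for name in ("tmax", "snowfall", "snow_depth"):
            record[name] = float(weather[day][name])
        merged.append(record)
    return merged


daily = merge_daily(CRIME_CSV, WEATHER_CSV)
```

Days that are missing from either source are dropped by the inner join, so the merged record count is also a quick check on gaps in either dataset.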
Following an analytical engineering approach, the next step after defining the business and data understandings of the problem was to determine which variables would be the most influential in developing a model (Provost and Fawcett 2013). Since the goal was to predict a numerical count of crime incidents, regression was chosen as the modeling technique. Using PowerBI, the data was analyzed to determine whether any weather variables were influencing total daily crime rates. Consistent with the studies reported by Virginia Wesleyan University, in which temperature was shown to influence crime rates by over ten percent, this analysis showed a variation of 17% for the City of Spokane. The data also showed that snowfall and snow depth could shift crime rates by 14% from average, as shown in Appendix I. Rainfall, wind speed, and sunlight amounts did not show any effect on the crime rate averages. From this initial analysis, the maximum temperature, snowfall, and snow depth were chosen as the attributes used to create the regression equation, since they showed significant influence on the target attribute.
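The percentages above came from PowerBI, and this report does not describe how PowerBI computes them. One simple way to express the same kind of comparison is the percent difference between the average daily count on days that meet a condition and the overall average. The Python sketch below shows that calculation on made-up values; the field names are hypothetical.

```python
from statistics import mean

# Hypothetical daily records (made-up values for illustration only).
days = [
    {"incidents": 50, "snowfall": 0.0},
    {"incidents": 54, "snowfall": 0.0},
    {"incidents": 46, "snowfall": 0.0},
    {"incidents": 40, "snowfall": 2.0},
    {"incidents": 44, "snowfall": 1.0},
    {"incidents": 42, "snowfall": 3.5},
]


def pct_shift_from_average(records, condition):
    """Percent difference between the mean incident count on days that
    satisfy `condition` and the mean over all days."""
    overall = mean(r["incidents"] for r in records)
    subset = [r["incidents"] for r in records if condition(r)]
    if not subset:
        raise ValueError("no days satisfy the condition")
    return 100.0 * (mean(subset) - overall) / overall


# Shift in average daily incidents on days with any snowfall.
snow_shift = pct_shift_from_average(days, lambda r: r["snowfall"] > 0)
```

A negative result means the selected days average fewer incidents than the dataset as a whole.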
To properly develop the regression analysis, the data was split into a training data set and a control data set. The 2018 data was held out as the control set, and all other data was used for training. A regression analysis performed on the training dataset showed that for every inch of snowfall, crime was expected to fall by nearly two incidents per day. For every inch of snow already on the ground, a further reduction of slightly over one incident per day was expected. It also showed that for every 8 degrees of temperature increase, one additional crime per day could be expected.
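The original regression was run as an R script that is not reproduced here. As a minimal sketch of the same workflow, the Python below holds out one year, fits an ordinary least-squares model on the rest, and applies it to the held-out days. The records are synthetic and noise-free. Their generating coefficients loosely echo the effects described above, the intercept of 45 is made up, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, target):
    """Least-squares coefficients [intercept, b1, b2, ...] from the
    normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Apply the fitted regression equation to one day's weather values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def synthetic_incidents(tmax, snowfall, snow_depth):
    """Made-up generating rule used only to build the sample data."""
    return 45.0 + 0.125 * tmax - 2.0 * snowfall - 1.0 * snow_depth


# (year, tmax, snowfall, snow_depth); values are synthetic.
weather = [
    (2017, 30, 2.0, 5.0), (2017, 35, 0.0, 3.0), (2017, 50, 0.0, 0.0),
    (2017, 70, 0.0, 0.0), (2017, 85, 0.0, 0.0), (2017, 28, 1.0, 6.0),
    (2017, 40, 0.5, 1.0), (2018, 32, 1.5, 4.0), (2018, 78, 0.0, 0.0),
]
records = [(yr, (t, s, d), synthetic_incidents(t, s, d)) for yr, t, s, d in weather]

# Hold out 2018 as the control set; train on everything else.
train = [r for r in records if r[0] != 2018]
control = [r for r in records if r[0] == 2018]

coefs = fit_ols([x for _, x, _ in train], [y for _, _, y in train])
control_predictions = [predict(coefs, x) for _, x, _ in control]
```

Because the sample data contains no noise, the fit recovers the generating coefficients almost exactly. Real daily counts would not behave this way, which is why the validation step matters.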
Once the regression coefficients were determined, the regression equation was applied to the control data set to see how well it would predict data the model had not encountered before. The dips seen in actual crime incidents during snowy winter months were reflected in the regression line, as seen in Appendix III. The regression was also run on the entire dataset to see whether including more data would change the results, even though no control group would then be available for comparison. Visually, the regression built from all available data appears to capture more of the seasonality seen in the data, as shown in Appendix V. A detailed evaluation was then performed to determine how well the regression model predicted the historical behavior.
To measure the effectiveness of the regression model, a standard approach of analyzing the coefficients of correlation and determination was used to see how well the regression predicted daily crime incidents. The coefficient of correlation, or r-value, ranges between -1 and 1, where values closer to 1 indicate a strong positive relationship between the values. The coefficient of determination, or r-squared value, ranges between 0 and 1, where values closer to 1 indicate that the model fits the data set well. When validating the regression on the control group, a coefficient of correlation of 0.47 was found, indicating that the weather attributes do appear to have a moderate positive relationship with crime rates. However, the coefficient of determination was only 0.22, meaning the model explains only about 22% of the variation in daily crime incidents and does not predict the actual daily values very effectively, as shown in Appendix IV. This result was not surprising given that crime stems from a large variety of variables other than weather. To the eye, the regression curve looked like a strong model; the actual analysis, however, showed it to be weak. The regression for the entire dataset produced slightly weaker results, with a coefficient of correlation of 0.41 and a coefficient of determination of 0.17, as shown in Appendix VI.
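The reported pairs are consistent with the r-squared value being the square of the r-value (0.47 squared is about 0.22, and 0.41 squared is about 0.17). Assuming that definition, the following Python sketch computes both measures from paired predicted and actual daily counts. The sample numbers are made up and are not taken from the Spokane data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily incident counts for illustration only.
actual = [40, 52, 47, 55, 60]
predicted = [44, 49, 50, 53, 55]

r = pearson_r(predicted, actual)
r_squared = r ** 2  # coefficient of determination under this definition
```

Another common definition of the coefficient of determination is one minus the ratio of residual to total sum of squares. On held-out data the two definitions can give different values, so it is worth stating which one is being reported.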
While the regression analysis itself would not necessarily be valuable in predicting actual crime rates for a specific day, the insights gained from it can be used moving forward. The trends around snowfall, snow depth, and temperature indicate that these factors do influence crime rates. With climate change expected to create drier and warmer conditions, this analysis indicates that crime could dip less in winter months that have less precipitation and could increase in summer months as temperature extremes grow.
To use this regression analysis to affect staffing costs, it would need to be enhanced through several additional projects. In looking at the data, the periods with the most variance from model expectations tended to coincide with times when the City of Spokane hosts large events. Adding scheduling data for when large influxes of tourists are expected, for events like Bloomsday and Hoopfest, could be useful. Another potential update would be to fit separate regression equations for winter and summer months. The model appears to follow crime trends better in winter than in summer, which suggests that combining snow attributes and maximum temperature within the same equation could be introducing confounding effects. Deeper analysis of the separate crime categories would also be important to determine which crimes are more closely tied to outdoor conditions, since not all crimes occur outdoors or are tied to the environment.
Another aspect that could greatly improve this analysis would be more data. Since the data collected went back only slightly over two years, there was limited opportunity to build robust training datasets. In future years, or for regions with deeper datasets, this kind of analysis could likely provide more accurate predictions. A follow-up analysis is recommended because, as populations increase and climate change creates more impacts, it will be important to control government costs effectively while ensuring the safety of citizens.
Author: Logan Callen
Federal Bureau of Investigation (FBI). “Crime Data Explorer.” Accessed April 24, 2020. https://crime-data-explorer.fr.cloud.gov/downloads-and-docs.
National Oceanic and Atmospheric Administration (NOAA). “Climate Data Online Search.” Accessed April 09, 2020. https://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND.
Provost, Foster and Tom Fawcett. 2013. Data Science for Business. 2nd Edition. California: O’Reilly Media, Inc.
Virginia Wesleyan University (VWU). 2019. “Weather and crime: is there a connection?” ZME Science. Accessed May 14, 2020. https://www.zmescience.com/science/weather-crime-connection-04234/.