
Sheri Z, Jake Wolf, Diamond Washington, Lawrence Goodwyn


lawrencegoodwyn/Gainesville-FL-Crime

 
 


Visualizing Moon Phases and Crime Types in Gainesville, Florida

Overview

Everyone has heard the tale of weird things happening during a full moon. Police, fire rescue, service operators, and hospitals are often said to see a spike in calls during a full moon. This project analyzes the calls for service and associated crimes that went to the police department in Gainesville, Florida between 2018 and 2021 during the four moon phases (New Moon, First Quarter, Third Quarter, and Full Moon). We selected this topic to explore whether there is any connection between moon phases and calls for service, and to determine whether the myth is simply a myth or a plausible explanation for the weird things that occur during a full moon.

H₀: There is no statistically significant relationship between moon phase and crime type.

Hₐ: There is a statistically significant relationship between moon phase and crime type.

The main question this model aims to answer:

  1. Can we predict the type of crime based on the moon phase?

Larger analysis questions:

  2. What types of crimes should we expect leading up to the full moon?
  3. Are the moon phases relevant? If so, which is the most relevant?

Assumption: We expect calls for service after March 2020 to trend differently due to COVID-19.

Presentation

Preprocessing

Data Source

This project uses two data sets: crime data from the City of Gainesville website and moon phase dates from here.

The City of Gainesville data is a CSV file with unique crime IDs, Incident type, Report Date, Offense Date, Report Hour of Day, Report Day of Week, Offense Hour of Day, Offense Day of Week, City, State, Address, Latitude, Longitude, and Location. The data from this source represents calls for service where a report was written.

The moon phase data source is a JSON object for a single location in Gainesville, FL with the moon phase and date for every day from 2018 through 2021. The keys are "datetime" and "moon phase". Datetime is in year/month/day format, and each moon phase is a value between 0 and 1: New Moon = 0, First Quarter = 0.25, Third Quarter = 0.75, and Full Moon = 1.
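As a rough illustration of the moon phase records described above, the sketch below loads a few hand-written sample records into pandas and attaches a phase name. The sample values and the `phase_name` column are assumptions made for this example, not data from the actual source.

```python
import pandas as pd

# Phase values as described in this README.
PHASE_NAMES = {
    0.0: "New Moon",
    0.25: "First Quarter",
    0.75: "Third Quarter",
    1.0: "Full Moon",
}

# Hypothetical sample records shaped like the JSON described above.
sample_records = [
    {"datetime": "2018/01/01", "moon phase": 0.0},
    {"datetime": "2018/01/02", "moon phase": 0.1},
    {"datetime": "2018/01/08", "moon phase": 0.25},
]

moon = pd.DataFrame(sample_records)

# Days that fall between the four named phases have no entry in the
# mapping, so they are left as missing values.
moon["phase_name"] = moon["moon phase"].map(PHASE_NAMES)
```

Only the four named phases receive a label here; how the project treated the in-between days is not stated in this README.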

The selected moon phase date range for this project was January 1st, 2018, to December 31st, 2021.

Data Cleaning

First, we removed incident types that did not necessarily imply a crime occurred, which narrowed our crime types to a smaller unique list. The unique incident types that were removed:

  1. assist other agencies
  2. assist citizen
  3. warrant arrest
  4. lost/stolen vehicle tag/decal
  5. DCF investigation
  6. drug possession of controlled substance
  7. information
  8. found contraband
  9. tow report
  10. recovered stolen vehicle
  11. stalking (simple)
  12. found-returned
  13. seize tag
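A filter like this could be written with pandas `isin`. This is a minimal sketch: the helper name `drop_non_crime_incidents`, the sample incident names, and the exact spelling of the incident types in the raw data are all assumptions.

```python
import pandas as pd

# Incident types listed above; exact spelling in the raw CSV is assumed.
REMOVED_TYPES = [
    "assist other agencies",
    "assist citizen",
    "warrant arrest",
    "lost/stolen vehicle tag/decal",
    "DCF investigation",
    "drug possession of controlled substance",
    "information",
    "found contraband",
    "tow report",
    "recovered stolen vehicle",
    "stalking (simple)",
    "found-returned",
    "seize tag",
]


def drop_non_crime_incidents(df, column="CFS"):
    """Return only the rows whose incident type is not in REMOVED_TYPES."""
    removed = {name.lower() for name in REMOVED_TYPES}
    # Compare case-insensitively and ignore stray whitespace.
    normalized = df[column].str.strip().str.lower()
    return df[~normalized.isin(removed)].reset_index(drop=True)


# Hypothetical sample rows for illustration.
sample = pd.DataFrame(
    {"CFS": ["Tow Report", "Theft Petit", "Information", "Battery (simple)"]}
)
kept = drop_non_crime_incidents(sample)
```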

We cleaned the data in a Jupyter Notebook with pandas. We renamed the columns as follows:

  • 'IncidentType': 'CFS'
  • 'Report Date' : 'reportDate'
  • 'Offense Date' : 'offenseDate'
  • 'Report Hour of Day' : 'reportHour'
  • 'Report Day of Week' : 'reportDOW'
  • 'Offense Hour of Day' : 'offenseHour'
  • 'Offense Day of Week' : 'offenseDOW'
  • 'City' : 'city'
  • 'State' : 'state'
  • 'Address' : 'address'
  • 'Latitude' : 'latitude'
  • 'Longitude' : 'longitude'
  • 'Location' : 'location'
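The renaming above maps directly onto `DataFrame.rename`. The sketch below uses an empty frame that only carries the original column names; in the project the frame would come from the City of Gainesville CSV.

```python
import pandas as pd

# Old name -> new name, as listed above.
COLUMN_RENAMES = {
    "IncidentType": "CFS",
    "Report Date": "reportDate",
    "Offense Date": "offenseDate",
    "Report Hour of Day": "reportHour",
    "Report Day of Week": "reportDOW",
    "Offense Hour of Day": "offenseHour",
    "Offense Day of Week": "offenseDOW",
    "City": "city",
    "State": "state",
    "Address": "address",
    "Latitude": "latitude",
    "Longitude": "longitude",
    "Location": "location",
}

# Placeholder frame with the original headers and no rows.
raw = pd.DataFrame(columns=list(COLUMN_RENAMES))

# rename returns a new frame; columns missing from the mapping are untouched.
crimes = raw.rename(columns=COLUMN_RENAMES)
```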

After renaming the columns, we filtered the crimes to those with an "offenseDate" between 01-01-2018 and 12-31-2021. Next, we dropped all columns except "ID", "CFS", "offenseDate", "offenseHour", "offenseDOW", "latitude", "longitude", and "CFS_Type".
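The date filter and column selection might look like the sketch below. The sample rows are invented, and treating the end date as inclusive of the whole day of 12-31-2021 is an assumption.

```python
import pandas as pd

KEEP_COLUMNS = [
    "ID", "CFS", "offenseDate", "offenseHour",
    "offenseDOW", "latitude", "longitude", "CFS_Type",
]

# Hypothetical sample rows; "city" stands in for the columns that get dropped.
crimes = pd.DataFrame({
    "ID": [1, 2, 3],
    "CFS": ["Theft Petit", "Battery (simple)", "Fraud"],
    "offenseDate": ["2017-12-31 23:10", "2019-06-15 13:45", "2022-01-01 00:05"],
    "offenseHour": [23, 13, 0],
    "offenseDOW": ["Sunday", "Saturday", "Saturday"],
    "latitude": [29.65, 29.66, 29.64],
    "longitude": [-82.32, -82.33, -82.31],
    "CFS_Type": ["Theft", "Battery", "Fraud"],
    "city": ["GAINESVILLE"] * 3,
})

crimes["offenseDate"] = pd.to_datetime(crimes["offenseDate"])

# Half-open interval so every timestamp on 2021-12-31 is included.
start = pd.Timestamp("2018-01-01")
end_exclusive = pd.Timestamp("2022-01-01")
in_range = (crimes["offenseDate"] >= start) & (crimes["offenseDate"] < end_exclusive)

filtered = crimes.loc[in_range, KEEP_COLUMNS].reset_index(drop=True)
```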

Then the moon phase data was retrieved with an API call and read into a data frame. The date-time format was converted with pandas "to_datetime". The moon phase, Gainesville crime, and classification data frames were then merged on the Date column. Persons, Property, Government, and Other are the four crime classifications ("CFS") that a crime type ("CFS_Type") could fall under.
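A minimal sketch of the date conversion and merge is shown below, using two small hand-written frames. Deriving a "Date" column from "offenseDate", renaming "moon phase" to "moonphase", and using a left join are assumptions; the classification frame is omitted for brevity.

```python
import pandas as pd

# Hypothetical sample frames for illustration.
crimes = pd.DataFrame({
    "ID": [1, 2],
    "offenseDate": ["2018-01-01 22:15", "2018-01-02 03:40"],
})
moon = pd.DataFrame({
    "datetime": ["2018/01/01", "2018/01/02"],
    "moon phase": [0.5, 0.53],
})

# Strip the time of day so both frames share a calendar-date key.
crimes["Date"] = pd.to_datetime(crimes["offenseDate"]).dt.normalize()
moon["Date"] = pd.to_datetime(moon["datetime"], format="%Y/%m/%d")
moon = moon.rename(columns={"moon phase": "moonphase"})

# Left join keeps every crime row even if a moon phase date were missing.
merged = crimes.merge(moon[["Date", "moonphase"]], on="Date", how="left")
```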

We grouped the remaining unique incident types into the following categories:

Crime types

  • Drug

  • Alcohol

  • Assault

  • Battery

  • Fraud

  • Homicide

  • Theft

  • Suicide

  • Government regulation violation

  • Quality of life

The 10 crime types were then separated and classified as either Person or Property crimes with Python using str.contains. Finally, the LabelEncoder from sklearn was used to encode the "class", "moonphase", "offensedow", "cfs", "date", and "cfs_type" columns.
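The classification and encoding steps could be sketched as follows. Which crime types count as Person crimes is not stated in this README, so `PERSON_PATTERN` is a hypothetical split, and only a subset of the encoded columns is shown on invented sample rows.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical split: these types are treated as Person crimes,
# everything else as Property.
PERSON_PATTERN = "Assault|Battery|Homicide|Suicide"

df = pd.DataFrame({
    "cfs_type": ["Assault", "Theft", "Battery", "Fraud"],
    "moonphase": [0.0, 0.25, 0.75, 1.0],
    "offensedow": ["Monday", "Friday", "Friday", "Sunday"],
})

is_person = df["cfs_type"].str.contains(PERSON_PATTERN, case=False, regex=True)
df["class"] = is_person.map({True: "Person", False: "Property"})

# One encoder per column so the integer codes can be mapped back later.
encoders = {}
for column in ["class", "moonphase", "offensedow", "cfs_type"]:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder
```

Keeping the fitted encoders makes it possible to recover the original labels with `inverse_transform` when reading model output.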

Data Storage

We stored the data in a Postgres database and used SQL to query and join the datasets.

ER Diagram

[Image: ERD2]

Date is the foreign key connecting CallsForService to MoonPhase. "CFS" is the foreign key connecting CallsForService to Classification.
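The join implied by these keys can be sketched in SQL. The example below runs against an in-memory SQLite database from Python purely so it is self-contained; the project itself used Postgres. Apart from the table names and the Date and CFS keys, the columns and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MoonPhase (Date TEXT PRIMARY KEY, moonphase REAL);
CREATE TABLE Classification (CFS TEXT PRIMARY KEY, CFS_Type TEXT);
CREATE TABLE CallsForService (
    ID INTEGER PRIMARY KEY,
    CFS TEXT REFERENCES Classification (CFS),
    Date TEXT REFERENCES MoonPhase (Date)
);
-- Hypothetical sample rows.
INSERT INTO MoonPhase VALUES ('2018-01-01', 0.0), ('2018-01-08', 0.25);
INSERT INTO Classification VALUES ('Theft Petit', 'Theft'),
                                  ('Battery (simple)', 'Battery');
INSERT INTO CallsForService VALUES (1, 'Theft Petit', '2018-01-01'),
                                   (2, 'Battery (simple)', '2018-01-08');
""")

# Join each call for service to its moon phase (on Date)
# and to its classification (on CFS).
rows = conn.execute("""
    SELECT c.ID, c.CFS, k.CFS_Type, m.moonphase
    FROM CallsForService AS c
    JOIN MoonPhase AS m ON m.Date = c.Date
    JOIN Classification AS k ON k.CFS = c.CFS
    ORDER BY c.ID
""").fetchall()
conn.close()
```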

Machine Learning

We use the scikit-learn library for machine learning:

  • Model: RandomForestClassifier
  • y = ['cfs_class']
  • X = ['cfs', 'date', 'offensedow', 'cfslatitude', 'cfslongitude', 'cfs_type', 'moonphase']
  • The features are scaled using StandardScaler()
  • The model is then fit using RandomForestClassifier
  • The accuracy score is calculated
  • A confusion matrix is output for Predicted vs Actual Person and Property calls for service.
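The steps listed above can be sketched end to end as follows. The data here is randomly generated placeholder data with the same column names, so the scores are meaningless; the train/test split, `random_state` values, and number of trees are assumptions, not settings taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

FEATURES = ["cfs", "date", "offensedow", "cfslatitude",
            "cfslongitude", "cfs_type", "moonphase"]

# Placeholder data standing in for the label-encoded project table.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "cfs": rng.integers(0, 40, size=n),
    "date": rng.integers(0, 1461, size=n),
    "offensedow": rng.integers(0, 7, size=n),
    "cfslatitude": rng.normal(29.65, 0.05, size=n),
    "cfslongitude": rng.normal(-82.32, 0.05, size=n),
    "cfs_type": rng.integers(0, 10, size=n),
    "moonphase": rng.integers(0, 4, size=n),
    "cfs_class": rng.integers(0, 2, size=n),  # 0 = Person, 1 = Property (assumed)
})

X = data[FEATURES]
y = data["cfs_class"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)

accuracy = accuracy_score(y_test, predictions)
# Rows are actual classes, columns are predicted classes.
matrix = confusion_matrix(y_test, predictions, labels=[0, 1])
```

Tree-based models are generally insensitive to feature scaling, so the StandardScaler step is kept here only to mirror the list above.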

Dashboard

We used Tableau Public for the visual displays and a fully interactive dashboard.
