Ahmed Shahriar Sakib edited this page Dec 31, 2021 · 2 revisions

A quick overview of the preprocessed data:

The preprocessed dataset contains five additional columns extracted from the location column and another five extracted from the date_of_incident and duration columns. The Id, Incident_logo, and agency_logo columns of the original dataset were discarded.

| Column | Description | Data Type |
| --- | --- | --- |
| business | Name of the business place extracted from location (e.g., JANIE & JACK, DOLLAR GENERAL, etc.) | object |
| address | Address where the incident took place (extracted from location) | object |
| address_2 | Extended address where the incident took place (extracted from location) | object |
| city | City where the incident occurred (extracted from location). It could also be a town or a country | object |
| state | State where the incident took place (extracted from location) | object |
| duration_in_seconds | Incident duration in seconds (extracted from duration) | numeric, int |
| day_name | Name of the day when the incident took place | object |
| weekday | The day of the week with Monday=0, Sunday=6 | object |
| month_name | Name of the month (extracted from date) | object |
| time_of_the_day | morning (5AM-11:59AM), afternoon (12PM-4:59PM), evening (5PM-8:59PM), night (9PM-11:59PM), midnight (12AM-4:59AM) | object |
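
The date-derived columns can be reproduced with a few lines of standard-library Python. The sketch below is only an assumption about how such preprocessing might look, not the script actually used for this dataset; the helper names `time_of_the_day` and `describe_timestamp` are hypothetical, and only the bucket boundaries are taken from the table above.

```python
from datetime import datetime


def time_of_the_day(hour):
    """Map an hour (0-23) to the buckets listed in the table above."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    if 21 <= hour < 24:
        return "night"
    return "midnight"  # 12AM-4:59AM


def describe_timestamp(ts):
    """Derive the date-based columns from a single datetime (hypothetical helper)."""
    return {
        "day_name": ts.strftime("%A"),
        "weekday": ts.weekday(),  # Monday=0, Sunday=6
        "month_name": ts.strftime("%B"),
        "time_of_the_day": time_of_the_day(ts.hour),
    }


example = describe_timestamp(datetime(2021, 12, 31, 23, 30))
print(example)
```

With pandas, the same values would typically come from the `.dt` accessor of a datetime column; the plain function is shown here because it is easy to check by hand.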

Incidents

Wordcloud

[Image: incident word cloud]

Agency

Wordcloud

[Image: agency word cloud]

Description (Codes)

The codes themselves are defined by each agency and are typically followed by a number that identifies a particular instance of each asset type. A legend is sometimes provided on the agency information page. Some common examples:

  • B=Battalion
  • BC=Battalion Chief
  • E=Engine
  • CMD=Command
  • CPT=Helicopter
  • C=Crew
  • DZR=Dozer
  • HM=Hazmat
  • ME=Medic Engine
  • MRE=Medic Rescue Engine
  • P=Patrol
  • R=Rescue
  • RE=Rescue Engine
  • SQ=Squad
  • T=Truck
  • U=Utility
  • WT=Water Tender

Credit: PulsePoint Wikipedia

Note: There is no standard for the identifier abbreviations (E, T, S, BC, RA, PM, etc.), and they can vary significantly from agency to agency.

Example - Ventura County Fire Department PulsePoint Unit Abbreviations PDF

To learn more, visit https://www.pulsepoint.org/unit-status-legend
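
As an illustration of the "abbreviation followed by a number" convention, the sketch below splits a unit code into its asset type and instance number. It assumes codes consist of letters immediately followed by digits (for example `E12`), which will not hold for every agency; `UNIT_LEGEND`, `UNIT_PATTERN`, and `parse_unit` are hypothetical names, and the legend contains only a few of the common examples listed above.

```python
import re

# Subset of the common examples listed above; real legends differ per agency.
UNIT_LEGEND = {
    "B": "Battalion",
    "BC": "Battalion Chief",
    "E": "Engine",
    "ME": "Medic Engine",
    "T": "Truck",
    "WT": "Water Tender",
}

# Assumed shape: one or more letters directly followed by one or more digits.
UNIT_PATTERN = re.compile(r"^([A-Z]+)(\d+)$")


def parse_unit(code, legend=UNIT_LEGEND):
    """Split a code such as 'E12' into (asset type, instance number)."""
    match = UNIT_PATTERN.match(code.strip().upper())
    if match is None:
        return None, None
    prefix, number = match.groups()
    # Unknown prefixes fall back to the raw abbreviation.
    return legend.get(prefix, prefix), int(number)


print(parse_unit("E12"))
print(parse_unit("BC2"))
print(parse_unit("RA7"))
```

Because abbreviations are not standardized, a per-agency legend would have to be passed through the `legend` argument for anything beyond a rough first pass.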

Incident Location

City

Geocoding

Issues:

  • Some cities in different states share the same name, e.g., BLOOMINGTON exists in both CA and IN
  • Some city names also match places in other countries, so the geocoder may return a location outside North America, e.g.:
    • NAPLES - Italy
    • Columbia - country in South America
    • Suffolk - UK
    • STAFFORD - UK
    • NORFOLK - UK

Appending the state and country names to the city name helps the geocoder return the intended location.

Script

import urllib.parse

import requests
from geopy.geocoders import Nominatim  # forward geocoding (address -> coordinates)

geolocator = Nominatim(user_agent='myapplication')


def get_nominatim_geocode(address):
    """Return (longitude, latitude) from Nominatim, or (None, None) on failure."""
    try:
        location = geolocator.geocode(address)
        # geocode() returns None when nothing matches; the attribute access
        # below then raises and is handled by the except branch
        return location.raw['lon'], location.raw['lat']
    except Exception:
        return None, None


# alternative way: scraping from the website

# def get_nominatim_geocode(address):
#     url = 'https://nominatim.openstreetmap.org/search/' + urllib.parse.quote(address) + '?format=json'
#     try:
#         response = requests.get(url).json()
#         return response[0]["lon"], response[0]["lat"]
#     except Exception as e:
#         # print(e)
#         return None, None


def get_positionstack_geocode(address):
    """Return (longitude, latitude) from positionstack, or (None, None) on failure."""
    BASE_URL = "http://api.positionstack.com/v1/forward?access_key="
    API_KEY = API_KEY_POSITIONSTACK  # must be defined before calling this function

    url = BASE_URL + API_KEY + '&query=' + urllib.parse.quote(address)
    try:
        response = requests.get(url, timeout=10).json()
        return response["data"][0]["longitude"], response["data"][0]["latitude"]
    except Exception:
        return None, None


def get_geocode(address):
    """Try Nominatim first and fall back to positionstack."""
    long, lat = get_nominatim_geocode(address)
    if long is None:
        return get_positionstack_geocode(address)
    return long, lat


# example
address = "50TH ST S"

get_geocode(address)

import pandas as pd
from tqdm.auto import tqdm  # for notebooks

# Create new `pandas` methods which use `tqdm` progress
# (can use tqdm_gui, optional kwargs, etc.)
tqdm.pandas()  # https://stackoverflow.com/a/34365537/11105356

# for Canadian provinces
ca_province_dic = {
    'Newfoundland and Labrador': 'NL',
    'Prince Edward Island': 'PE',
    'Nova Scotia': 'NS',
    'New Brunswick': 'NB',
    'Quebec': 'QC',
    'Ontario': 'ON',
    'Manitoba': 'MB',
    'Saskatchewan': 'SK',
    'Alberta': 'AB',
    'British Columbia': 'BC',
    'Yukon': 'YT',
    'Northwest Territories': 'NT',
    'Nunavut': 'NU',
}

canada_mask = pulse_point_city_df['state'].isin(list(ca_province_dic.values()))

pulse_point_city_df['location'] = pulse_point_city_df['city'] + ', ' + pulse_point_city_df['state']

# append the country with .loc on the frame itself; chained indexing
# (df['location'].loc[mask] = ...) may modify a temporary copy instead
pulse_point_city_df.loc[canada_mask, 'location'] += ', CANADA'
pulse_point_city_df.loc[~canada_mask, 'location'] += ', USA'


# to verify
# pulse_point_city_df[pulse_point_city_df['location'].str.endswith('USA')]
# pulse_point_city_df[pulse_point_city_df['location'].str.endswith('CANADA')]


%%time
# fetch geolocation (the cell magic above must be the first line of its own notebook cell)
coords = pulse_point_city_df['location'].progress_apply(lambda x: get_geocode(str(x).strip()))
location_df = pd.DataFrame(coords.tolist(), index=coords.index, columns=['longitude', 'latitude'])
pulse_point_city_df = pulse_point_city_df.join(location_df)  # pulse_point_city_df will be used later
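
Public geocoding services generally limit request rates, so it can help to look up each distinct location string only once. The sketch below shows one way to do that under the assumption that the same string may occur more than once (the city table may already be unique, in which case nothing is gained). `geocode_unique` and `fake_geocode` are hypothetical names; the stand-in geocoder and its coordinates exist only to keep the example self-contained and offline.

```python
def geocode_unique(locations, geocode_fn):
    """Geocode each distinct location string once and reuse the result."""
    cache = {}
    results = []
    for location in locations:
        key = str(location).strip()
        if key not in cache:
            cache[key] = geocode_fn(key)
        results.append(cache[key])
    return results


# Stand-in geocoder used only for this illustration; it records its calls.
calls = []


def fake_geocode(address):
    calls.append(address)
    if address.startswith("LOS ANGELES"):
        return (-118.24, 34.05)  # illustrative (longitude, latitude)
    return (None, None)


coords = geocode_unique(
    ["LOS ANGELES, CA, USA", "LOS ANGELES, CA, USA ", "NOWHERE, ZZ, USA"],
    fake_geocode,
)
print(coords)
print(len(calls))  # number of lookups actually performed
```

In the script above, `get_geocode` could be passed as `geocode_fn` in place of the stand-in.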

Top 5 cities by agency engagement:

| Name | Count | State |
| --- | --- | --- |
| 1. LOS ANGELES | 7449 | CA |
| 2. MILWAUKEE | 4404 | WI |
| 3. COLUMBUS | 4115 | OH |
| 4. CLEVELAND | 3977 | OH |
| 5. ROCKFORD | 2950 | IL |

Heat Map

Script

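import folium
import geopandas
from folium.plugins import HeatMap

# skip cities whose geocoding failed; missing coordinates would otherwise break the heat map
geocoded_df = pulse_point_city_df.dropna(subset=['longitude', 'latitude'])
geometry = geopandas.points_from_xy(geocoded_df.longitude.astype(float), geocoded_df.latitude.astype(float))
geo_df = geopandas.GeoDataFrame(geocoded_df[['city', 'count', 'longitude', 'latitude']], geometry=geometry)

# named heat_map to avoid shadowing the built-in map()
heat_map = folium.Map(location=[48, -102], tiles='Cartodb dark_matter', zoom_start=4)

# HeatMap expects [latitude, longitude] pairs
heat_data = [[point.y, point.x] for point in geo_df.geometry]
HeatMap(heat_data).add_to(heat_map)
heat_map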

Bubble map

[Image: map with circle overlays]

Script

import folium

# to avoid recursion depth issue change latitude,longitude type to float
# https://github.com/python-visualization/folium/issues/1105

pulse_point_city_df['latitude'] = pulse_point_city_df['latitude'].astype(float)
pulse_point_city_df['longitude'] = pulse_point_city_df['longitude'].astype(float)

map_USA = folium.Map(location=[48, -102],
                     zoom_start=4,
                     prefer_canvas=True,
                     )


occurrences = folium.FeatureGroup()
n_mean = pulse_point_city_df['count'].mean()

# cities without coordinates (failed geocoding) cannot be drawn
geocoded_df = pulse_point_city_df.dropna(subset=['latitude', 'longitude'])

for lat, lng, number, city in zip(geocoded_df['latitude'],
                                  geocoded_df['longitude'],
                                  geocoded_df['count'],
                                  geocoded_df['city']):
    occurrences.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=number / (n_mean / 3),  # radius for number of occurrences
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.4,
            # tooltip = city
            tooltip=str(number) + ',' + str(city)[:21],  # at most 21 characters of the city name are shown
            # most of the city names contain 5-20 characters
            # check pulse_point_city_df.city.apply(len).plot();
            # get more from tooltip https://github.com/python-visualization/folium/issues/1010#issuecomment-435968337
        )
    )

map_USA.add_child(occurrences)
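
The radius expression in the script, `number/(n_mean/3)`, scales linearly: a city with exactly the mean count gets a radius of 3, and a city with twice the mean gets 6. The sketch below isolates that formula and adds a capped variant for very large cities. `bubble_radius` mirrors the script; `capped_radius` and its `max_radius` default are hypothetical additions, and the mean of 90 is an illustrative value, not the dataset's.

```python
def bubble_radius(count, mean_count, radius_at_mean=3):
    """Linear scaling from the script: the mean count maps to radius 3."""
    return count / (mean_count / radius_at_mean)


def capped_radius(count, mean_count, max_radius=40):
    """Hypothetical variant that keeps the largest bubbles from covering the map."""
    return min(bubble_radius(count, mean_count), max_radius)


mean_count = 90  # illustrative value only
print(bubble_radius(90, mean_count))    # city at the mean
print(bubble_radius(180, mean_count))   # city at twice the mean
print(capped_radius(7449, mean_count))  # a very large city hits the cap
```

Whether a cap is appropriate depends on how skewed the counts are; a logarithmic scale is another common choice.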

State

Top 5 states by agency engagement:

| Name | Count | Abbreviation |
| --- | --- | --- |
| 1. California | 70989 | CA |
| 2. Florida | 23213 | FL |
| 3. Virginia | 16016 | VA |
| 4. Washington | 15532 | WA |
| 5. Ohio | 14440 | OH |

Occurrence Timeline

Animated geo-scatter plot

Script

import numpy as np
import pandas as pd
import pdpipe as pdp
import plotly.express as px

# number of incidents per date and state
df_state_incident = (
    pulse_point_df
    .groupby(["date_of_incident", "state"], as_index=False)['title']
    .count()
    .rename(columns={'date_of_incident': 'date', 'title': 'count'})
)


# set the size of the geo bubble
def set_size(value):
    '''
    Takes the numeric value of a parameter to visualize on a map (Plotly Geo-Scatter plot)
    Returns a number to indicate the size of a bubble for the state whose numeric attribute value
    was supplied as an input
    '''
    result = np.log(1 + value)
    if result < 0:  # only possible for negative inputs
        result = 0.1
    return result


pipeline = pdp.PdPipeline([
    pdp.ApplyByCols('count', set_size, 'size', drop=False),
])

agg_incident_data = pipeline.apply(df_state_incident)

agg_incident_data.fillna(0, inplace=True)
agg_incident_data = agg_incident_data.sort_values(by='date', ascending=True)
# convert to string; requires date_of_incident to be a datetime column
agg_incident_data['date'] = agg_incident_data['date'].dt.strftime('%Y-%m-%d')


fig = px.scatter_geo(
    agg_incident_data, locations="state", locationmode='USA-states',
    scope="usa",
    color="count",
    size='size', hover_name="state",
    range_color=[0, 2000],
    projection="albers usa", animation_frame="date",
    title='PulsePoint Incidents: Local Emergencies By State',
    color_continuous_scale="portland"
    )

fig.show()
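
The bubble size in the script is log(1 + count), which compresses large counts so that busy states do not dwarf quiet ones. For non-negative counts the result is never negative, so the `result < 0` branch only matters for inputs between -1 and 0. A minimal standard-library sketch of the same transform follows; `log_size` is a hypothetical name and the sample counts are illustrative.

```python
import math


def log_size(count):
    """Same transform as set_size in the script, without the negative guard."""
    return math.log1p(count)  # log(1 + count)


for count in (0, 9, 99, 999):
    print(count, round(log_size(count), 2))
```

A hundredfold increase in the count (9 to 999) roughly triples the bubble size, which is the compression the animation relies on.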

US States Geolocation

Scrape the US States data

# https://developers.google.com/public-data/docs/canonical/states_csv
state_coordinate = pd.read_html("https://developers.google.com/public-data/docs/canonical/states_csv")[0]
# US States with Total Incident Count
pulse_point_state_df = pulse_point_df.groupby(['state']).count()[['title']].reset_index().rename(columns={'title':'count'})

# Missing US States
state_coordinate[~state_coordinate.state.isin(pulse_point_state_df.state)].reset_index(drop=True)

# Filter US States
pulse_point_state_df = pulse_point_state_df.merge(state_coordinate, on='state', how='left')

# drop Canadian provinces
pulse_point_state_df.dropna(inplace=True)
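
The left merge followed by `dropna` keeps only the states that have coordinates in the reference table. The sketch below reproduces that logic on a tiny made-up frame and uses the `indicator` option of `DataFrame.merge` to make the unmatched rows explicit. All values are hypothetical, and the coordinates are placeholders rather than real state centroids.

```python
import pandas as pd

# Hypothetical incident counts; BC stands in for a Canadian province.
counts = pd.DataFrame({"state": ["CA", "OH", "BC"], "count": [10, 4, 2]})

# Hypothetical reference table of US states (placeholder coordinates).
coords = pd.DataFrame({
    "state": ["CA", "OH", "TX"],
    "latitude": [1.0, 2.0, 3.0],
    "longitude": [-1.0, -2.0, -3.0],
})

merged = counts.merge(coords, on="state", how="left", indicator=True)

# Rows with no match in the reference table (dropped by dropna in the script).
unmatched = merged.loc[merged["_merge"] == "left_only", "state"].tolist()

# Rows that matched, i.e. what remains after the filter.
us_only = merged[merged["_merge"] == "both"].drop(columns="_merge")

# Reference states that never appear in the incident data.
missing = coords.loc[~coords["state"].isin(counts["state"]), "state"].tolist()

print(unmatched, us_only["state"].tolist(), missing)
```

Using the indicator column instead of `dropna` avoids accidentally dropping matched rows that happen to contain missing values in other columns.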

Choropleth USA

[Image: choropleth map of the US]

Script

url = (
    "https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
)

state_geo = f"{url}/us-states.json"
state_data = pulse_point_state_df.iloc[:,[0,1]]

m = folium.Map(location=[48, -102], zoom_start=4)

folium.Choropleth(
    geo_data=state_geo,
    name="choropleth",
    data=state_data,
    columns=["state", "count"],
    key_on="feature.id",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Number of Incidents",
).add_to(m)

folium.LayerControl().add_to(m)

m

With Count marker

[Image: choropleth map of the US with count markers]

Script

# icon credit : https://icon-icons.com/icon/location-sos-phone-call-help/68848
# https://www.clipartmax.com/middle/m2H7i8G6N4H7b1N4_metallic-icon-royalty-free-cliparts-icone-sos-png/

# custom icon : https://stackoverflow.com/a/68992396/11105356

import folium

# add one marker per state to the choropleth map `m` created above
for _, row in pulse_point_state_df.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=folium.Popup(f"{row['name']}\n{row['count']}", parse_html=True),
        icon=folium.features.CustomIcon('https://i.postimg.cc/JhmnMQXj/sos.png', icon_size=(24, 31)),
    ).add_to(m)
m

With Plotly

Script

# https://plotly.com/python/choropleth-maps
import plotly.graph_objects as go

fig = go.Figure(data=go.Choropleth(
    locations=pulse_point_state_df['state'], # Spatial coordinates
    z = pulse_point_state_df['count'].astype(float), # Data to be color-coded
    locationmode = 'USA-states', # set of locations match entries in `locations`
    colorscale = 'Reds',
    colorbar_title = "Total Occurrences",
))

fig.update_layout(
    title_text = 'US PulsePoint Emergencies Occurrences by State',
    geo_scope='usa', # limit map scope to USA
)

fig.show()

Incident Date & Time

Daily

[Image: daily incident frequency]

Weekly

[Image: weekly incident frequency]

Day of Week

[Image: incidents by day of the week]

Time of the Day

[Image: incidents by time of the day]

Top ten emergencies during 'Midnight' and 'Morning':

Midnight:

  • Medical Emergency
  • Traffic Collision
  • Fire Alarm
  • Alarm
  • Public Service
  • Structure Fire
  • Refuse/Garbage Fire
  • Mutual Aid
  • Residential Fire
  • Expanded Traffic Collision

Morning:

  • Medical Emergency
  • Traffic Collision
  • Fire Alarm
  • Public Service
  • Refuse/Garbage Fire
  • Structure Fire
  • Fire
  • Residential Fire
  • Mutual Aid
  • Lift Assist
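
Rankings like the two lists above can be produced by counting incident titles within each time-of-day bucket. The sketch below uses only the standard library; the rows are hypothetical stand-ins for (time_of_the_day, title) pairs from the dataset, and `top_titles` is a hypothetical helper, not the code used for this page.

```python
from collections import Counter, defaultdict

# Hypothetical (time_of_the_day, title) pairs standing in for dataset rows.
incidents = [
    ("midnight", "Medical Emergency"),
    ("midnight", "Medical Emergency"),
    ("midnight", "Traffic Collision"),
    ("morning", "Medical Emergency"),
    ("morning", "Fire Alarm"),
    ("morning", "Fire Alarm"),
    ("morning", "Fire Alarm"),
]


def top_titles(rows, n=10):
    """Return the n most frequent titles for each time-of-day bucket."""
    counters = defaultdict(Counter)
    for period, title in rows:
        counters[period][title] += 1
    return {
        period: [title for title, _ in counter.most_common(n)]
        for period, counter in counters.items()
    }


ranking = top_titles(incidents, n=2)
print(ranking)
```

With the real data, `n=10` would yield one top-ten list per bucket.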

Key Insights

  • Most incidents occurred in California
  • Most incidents happened at midnight and in the morning throughout the week
  • Most emergency engagements lasted under 30 minutes
  • The highest number of incidents occurred on Sunday
  • The number of incidents increased after the Covid-19 lockdown
  • Medical emergency was the most frequent incident type, followed by traffic collision and fire alarm
  • Montgomery County, Milwaukee Fire, and Columbus Fire were the most active agencies during the five-month period