As a security/defense analyst, we will try to find hot zone of terrorism across the world. We will create a Power BI Dashboard using the dataset Global Terrorism.
-
data preprocessing using Python Notebook: where we will split our one large 181691 records and 128 wide columns into star schema for better analysis and performance.
-
-
where we tried to provide general insights such as:
- total events.
- top attacks type and most used weapons.
- most active groups.
- top targeted nations and units, and reasons behind these attacks.
- Property Loss in USD per regions and countries.
- Time Trend Analysis on attacks per region and countries between years [1970 - 2017]. | 2014 was the most active year across all Time Period.
-
secondly we presented Countries Insights:
-
also we produced groups insights such as
- Active Years
- Total Attacks
- Success Rate
- Is the group use suicidal attacks as a mean for accomplishing goals by checking suicide attacks rate to total attacks.
- Top attacks type and favorite weapons per group.
- Most Targeted Institutions.
|
ETA
is one of the oldest Groups working between the years 1972 and 2010, working across all ofEurope
, with a total attacks of 1650 and a success Rate of 85.45%, they mostly attacked theCivil Guard and Police
Targets while mostly using Explosives and Firearms as their main weapons.
-
After Exploring the data and modeling the Data here is the star schema used for developing the dataset
Jupyter Notebook
import pandas as pd
import numpy as np
df = pd.read_csv('./Dataset/globalterrorismdb_0718dist.csv', encoding = "ISO-8859-1")
print(*df.columns, sep=' , ')
/var/folders/00/0s56_3496fb7660g8b7bwcyh0000gn/T/ipykernel_70881/2654467892.py:4: DtypeWarning: Columns (4,6,31,33,61,62,63,76,79,90,92,94,96,114,115,121) have mixed types. Specify dtype option on import or set low_memory=False.
df = pd.read_csv('./Dataset/globalterrorismdb_0718dist.csv', encoding = "ISO-8859-1")
eventid , iyear , imonth , iday , approxdate , extended , resolution , country , country_txt , region , region_txt , provstate , city , latitude , longitude , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , attacktype1 , attacktype1_txt , attacktype2 , attacktype2_txt , attacktype3 , attacktype3_txt , targtype1 , targtype1_txt , targsubtype1 , targsubtype1_txt , corp1 , target1 , natlty1 , natlty1_txt , targtype2 , targtype2_txt , targsubtype2 , targsubtype2_txt , corp2 , target2 , natlty2 , natlty2_txt , targtype3 , targtype3_txt , targsubtype3 , targsubtype3_txt , corp3 , target3 , natlty3 , natlty3_txt , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weaptype1 , weaptype1_txt , weapsubtype1 , weapsubtype1_txt , weaptype2 , weaptype2_txt , weapsubtype2 , weapsubtype2_txt , weaptype3 , weaptype3_txt , weapsubtype3 , weapsubtype3_txt , weaptype4 , weaptype4_txt , weapsubtype4 , weapsubtype4_txt , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related
df.shape
(181691, 135)
dim_location = df.loc[:,
['country_txt',
'region_txt',
'provstate',
'city']
]
replace_regex = '\(((\s*\w*\s*)*)\)|-*|\,|\'|\/|\.*|\(*\)*|\?|\=|\*'
dim_location['city'] = dim_location['city'].\
replace(to_replace=replace_regex, value="", regex=True).\
fillna('Unkonwn').\
str.replace('&', 'and').\
str.replace('Unkknown', 'Unknown').\
str.title().\
str.strip()
dim_location['provstate'] = df['provstate'].\
replace(to_replace=replace_regex, value="", regex=True).\
fillna('Unknown').\
str.replace('é', 'e').\
str.replace('&', 'and').\
str.replace('temouchent', 'tecmouchent').\
str.replace('Bosnia-Herzegovina', 'Bosnia And Herzegovina').\
str.replace('ñ', 'n').\
str.title().\
str.strip()
dim_location.drop_duplicates(inplace=True)
dim_location['Location Id'] = dim_location.reset_index(drop=True).index + 1
dim_location.rename(columns={
'country_txt': 'Country',
'region_txt': 'Region',
'provstate': 'Provenance/State',
'city': 'City',}, inplace=True)
print(dim_location.shape)
dim_location.head()
(40676, 5)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Country | Region | Provenance/State | City | Location Id | |
---|---|---|---|---|---|
0 | Dominican Republic | Central America & Caribbean | Unknown | Santo Domingo | 1 |
1 | Mexico | North America | Federal | Mexico City | 2 |
2 | Philippines | Southeast Asia | Tarlac | Unknown | 3 |
3 | Greece | Western Europe | Attica | Athens | 4 |
4 | Japan | East Asia | Fukouka | Fukouka | 5 |
df = df.merge(dim_location,
how='left',
left_on=['country_txt', 'region_txt', 'provstate', 'city'],
right_on=['Country', 'Region', 'Provenance/State', 'City',])
df.drop([
'country',
'country_txt',
'region',
'region_txt',
'provstate',
'city',
'latitude',
'longitude',
'Country',
'Region',
'Provenance/State',
'City'
], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | iyear | imonth | iday | approxdate | extended | resolution | specificity | vicinity | location | ... | scite1 | scite2 | scite3 | dbsource | INT_LOG | INT_IDEO | INT_MISC | INT_ANY | related | Location Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | NaN | NaN | NaN | PGIS | 0 | 0 | 0 | 0 | NaN | NaN |
1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | NaN | NaN | NaN | PGIS | 0 | 1 | 1 | 1 | NaN | NaN |
2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 4.0 | 0 | NaN | ... | NaN | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN | 3.0 |
3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | NaN | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN | 4.0 |
4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | NaN | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN | 5.0 |
5 rows × 128 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 128)
eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , attacktype1 , attacktype1_txt , attacktype2 , attacktype2_txt , attacktype3 , attacktype3_txt , targtype1 , targtype1_txt , targsubtype1 , targsubtype1_txt , corp1 , target1 , natlty1 , natlty1_txt , targtype2 , targtype2_txt , targsubtype2 , targsubtype2_txt , corp2 , target2 , natlty2 , natlty2_txt , targtype3 , targtype3_txt , targsubtype3 , targsubtype3_txt , corp3 , target3 , natlty3 , natlty3_txt , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weaptype1 , weaptype1_txt , weapsubtype1 , weapsubtype1_txt , weaptype2 , weaptype2_txt , weapsubtype2 , weapsubtype2_txt , weaptype3 , weaptype3_txt , weapsubtype3 , weapsubtype3_txt , weaptype4 , weaptype4_txt , weapsubtype4 , weapsubtype4_txt , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id
dim_attack_type = df.loc[:, ['attacktype1_txt']]
dim_attack_type['attacktype1_txt'].fillna('Unknown', inplace=True)
dim_attack_type.drop_duplicates(inplace=True)
dim_attack_type['Attack Type Id'] = dim_attack_type.reset_index(drop=True).index + 1
dim_attack_type.rename(columns={'attacktype1_txt': 'Attack Type'}, inplace=True)
print(dim_attack_type.shape)
dim_attack_type
(9, 2)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Attack Type | Attack Type Id | |
---|---|---|
0 | Assassination | 1 |
1 | Hostage Taking (Kidnapping) | 2 |
3 | Bombing/Explosion | 3 |
4 | Facility/Infrastructure Attack | 4 |
5 | Armed Assault | 5 |
12 | Hijacking | 6 |
16 | Unknown | 7 |
298 | Unarmed Assault | 8 |
336 | Hostage Taking (Barricade Incident) | 9 |
df = df.merge(dim_attack_type,
how='left',
left_on=['attacktype1_txt'],
right_on=['Attack Type'])
df.drop([
'attacktype1',
'attacktype1_txt',
'attacktype2',
'attacktype2_txt',
'attacktype3',
'attacktype3_txt',
'Attack Type'], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | iyear | imonth | iday | approxdate | extended | resolution | specificity | vicinity | location | ... | scite2 | scite3 | dbsource | INT_LOG | INT_IDEO | INT_MISC | INT_ANY | related | Location Id | Attack Type Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | NaN | NaN | PGIS | 0 | 0 | 0 | 0 | NaN | NaN | 1 |
1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | NaN | NaN | PGIS | 0 | 1 | 1 | 1 | NaN | NaN | 2 |
2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 4.0 | 0 | NaN | ... | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN | 3.0 | 1 |
3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN | 4.0 | 3 |
4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | NaN | NaN | PGIS | -9 | -9 | 1 | 1 | NaN | 5.0 | 4 |
5 rows × 123 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 123)
eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , targtype1 , targtype1_txt , targsubtype1 , targsubtype1_txt , corp1 , target1 , natlty1 , natlty1_txt , targtype2 , targtype2_txt , targsubtype2 , targsubtype2_txt , corp2 , target2 , natlty2 , natlty2_txt , targtype3 , targtype3_txt , targsubtype3 , targsubtype3_txt , corp3 , target3 , natlty3 , natlty3_txt , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weaptype1 , weaptype1_txt , weapsubtype1 , weapsubtype1_txt , weaptype2 , weaptype2_txt , weapsubtype2 , weapsubtype2_txt , weaptype3 , weaptype3_txt , weapsubtype3 , weapsubtype3_txt , weaptype4 , weaptype4_txt , weapsubtype4 , weapsubtype4_txt , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id
dim_target_type = df.loc[:, ['targtype1_txt', 'targsubtype1_txt']]
dim_target_type['targtype1_txt'].fillna('Unknown', inplace=True)
dim_target_type['targsubtype1_txt'].fillna('Unknown', inplace=True)
dim_target_type.drop_duplicates(inplace=True)
dim_target_type['Target Type Id'] = dim_target_type.reset_index(drop=True).index + 1
dim_target_type.rename(columns={
'targtype1_txt': 'Target Type',
'targsubtype1_txt': 'Target Subtype',
}, inplace=True)
print(dim_target_type.shape)
dim_target_type.head()
(139, 3)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Target Type | Target Subtype | Target Type Id | |
---|---|---|---|
0 | Private Citizens & Property | Named Civilian | 1 |
1 | Government (Diplomatic) | Diplomatic Personnel (outside of embassy, cons... | 2 |
2 | Journalists & Media | Radio Journalist/Staff/Facility | 3 |
3 | Government (Diplomatic) | Embassy/Consulate | 4 |
5 | Police | Police Building (headquarters, station, school) | 5 |
dim_target = df.loc[:, [
'target1',
'corp1',
'natlty1_txt'
]]
dim_target['target1'].fillna('Unknown', inplace=True)
dim_target['corp1'].fillna('Unknown', inplace=True)
dim_target['corp1'] = dim_target['corp1'].str.replace('\"', '')
dim_target['natlty1_txt'].fillna('Unknown', inplace=True)
dim_target.drop_duplicates(inplace=True)
dim_target['Target Id'] = dim_target.reset_index(drop=True).index + 1
dim_target.rename(columns={
'target1': 'Target',
'corp1': 'Corporate',
'natlty1_txt': 'Nationality'
}, inplace=True)
print(dim_target.shape)
dim_target.head()
(114866, 4)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Target | Corporate | Nationality | Target Id | |
---|---|---|---|---|
0 | Julio Guzman | Unknown | Dominican Republic | 1 |
1 | Nadine Chaval, daughter | Belgian Ambassador Daughter | Belgium | 2 |
2 | Employee | Voice of America | United States | 3 |
3 | U.S. Embassy | Unknown | United States | 4 |
4 | U.S. Consulate | Unknown | United States | 5 |
df = df.merge(dim_target_type,
how='left',
left_on=['targtype1_txt', 'targsubtype1_txt'],
right_on=['Target Type', 'Target Subtype'])
df = df.merge(dim_target,
how='left',
left_on=['target1', 'corp1', 'natlty1_txt'],
right_on=['Target', 'Corporate', 'Nationality'])
df.drop([
'targtype1', 'targtype1_txt', 'targsubtype1', 'targsubtype1_txt',
'corp1', 'target1', 'natlty1', 'natlty1_txt',
'targtype2', 'targtype2_txt', 'targsubtype2', 'targsubtype2_txt',
'corp2', 'target2', 'natlty2', 'natlty2_txt',
'targtype3', 'targtype3_txt', 'targsubtype3', 'targsubtype3_txt',
'corp3', 'target3', 'natlty3', 'natlty3_txt',
'Target Type', 'Target Subtype',
'Target', 'Corporate', 'Nationality'
], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | iyear | imonth | iday | approxdate | extended | resolution | specificity | vicinity | location | ... | dbsource | INT_LOG | INT_IDEO | INT_MISC | INT_ANY | related | Location Id | Attack Type Id | Target Type Id | Target Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | PGIS | 0 | 0 | 0 | 0 | NaN | NaN | 1 | 1.0 | NaN |
1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | PGIS | 0 | 1 | 1 | 1 | NaN | NaN | 2 | 2.0 | 2.0 |
2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 4.0 | 0 | NaN | ... | PGIS | -9 | -9 | 1 | 1 | NaN | 3.0 | 1 | 3.0 | 3.0 |
3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | PGIS | -9 | -9 | 1 | 1 | NaN | 4.0 | 3 | 4.0 | NaN |
4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | PGIS | -9 | -9 | 1 | 1 | NaN | 5.0 | 4 | 4.0 | NaN |
5 rows × 101 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 101)
eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weaptype1 , weaptype1_txt , weapsubtype1 , weapsubtype1_txt , weaptype2 , weaptype2_txt , weapsubtype2 , weapsubtype2_txt , weaptype3 , weaptype3_txt , weapsubtype3 , weapsubtype3_txt , weaptype4 , weaptype4_txt , weapsubtype4 , weapsubtype4_txt , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id
dim_weapon = df.loc[:, [
'weaptype1_txt',
'weapsubtype1_txt'
]]
dim_weapon['weaptype1_txt'].fillna('Unknown', inplace=True)
dim_weapon['weapsubtype1_txt'].fillna('Unknown', inplace=True)
dim_weapon.drop_duplicates(inplace=True)
dim_weapon['Weapon Id'] = dim_weapon.reset_index(drop=True).index + 1
dim_weapon.rename(columns={
'weaptype1_txt': 'Weapon Type',
'weapsubtype1_txt': 'Weapon Subtype'
}, inplace=True)
print(dim_weapon.shape)
dim_weapon.head()
(45, 3)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Weapon Type | Weapon Subtype | Weapon Id | |
---|---|---|---|
0 | Unknown | Unknown | 1 |
3 | Explosives | Unknown Explosive Type | 2 |
4 | Incendiary | Unknown | 3 |
5 | Firearms | Unknown Gun Type | 4 |
6 | Firearms | Automatic or Semi-Automatic Rifle | 5 |
df = df.merge(dim_weapon,
how='left',
left_on=['weaptype1_txt','weapsubtype1_txt'],
right_on=['Weapon Type', 'Weapon Subtype'])
df.drop([
'weaptype1', 'weaptype1_txt', 'weapsubtype1', 'weapsubtype1_txt',
'weaptype2', 'weaptype2_txt', 'weapsubtype2', 'weapsubtype2_txt',
'weaptype3', 'weaptype3_txt', 'weapsubtype3', 'weapsubtype3_txt',
'weaptype4', 'weaptype4_txt', 'weapsubtype4', 'weapsubtype4_txt',
'Weapon Type', 'Weapon Subtype'
], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | iyear | imonth | iday | approxdate | extended | resolution | specificity | vicinity | location | ... | INT_LOG | INT_IDEO | INT_MISC | INT_ANY | related | Location Id | Attack Type Id | Target Type Id | Target Id | Weapon Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | 0 | 0 | 0 | 0 | NaN | NaN | 1 | 1.0 | NaN | NaN |
1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | 0 | 1 | 1 | 1 | NaN | NaN | 2 | 2.0 | 2.0 | NaN |
2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 4.0 | 0 | NaN | ... | -9 | -9 | 1 | 1 | NaN | 3.0 | 1 | 3.0 | 3.0 | NaN |
3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | -9 | -9 | 1 | 1 | NaN | 4.0 | 3 | 4.0 | NaN | 2.0 |
4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | -9 | -9 | 1 | 1 | NaN | 5.0 | 4 | 4.0 | NaN | NaN |
5 rows × 86 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 86)
eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id
dim_group = df.loc[:, ['gname']]
dim_group['gname'].fillna('Unknown', inplace=True)
dim_group.drop_duplicates(inplace=True)
dim_group['Group Id'] = dim_group.reset_index(drop=True).index + 1
dim_group.rename(columns={
'gname': 'Group',
}, inplace=True)
print(dim_group.shape)
dim_group.head()
(3537, 2)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Group | Group Id | |
---|---|---|
0 | MANO-D | 1 |
1 | 23rd of September Communist League | 2 |
2 | Unknown | 3 |
5 | Black Nationalists | 4 |
6 | Tupamaros (Uruguay) | 5 |
df = df.merge(dim_group, how='left', left_on=['gname'], right_on=['Group'])
df.drop([
'gname',
'gsubname',
'gname2',
'gsubname2',
'gname3',
'gsubname3',
'Group'
], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | iyear | imonth | iday | approxdate | extended | resolution | specificity | vicinity | location | ... | INT_IDEO | INT_MISC | INT_ANY | related | Location Id | Attack Type Id | Target Type Id | Target Id | Weapon Id | Group Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | 0 | 0 | 0 | NaN | NaN | 1 | 1.0 | NaN | NaN | 1 |
1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | 1 | 1 | 1 | NaN | NaN | 2 | 2.0 | 2.0 | NaN | 2 |
2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 4.0 | 0 | NaN | ... | -9 | 1 | 1 | NaN | 3.0 | 1 | 3.0 | 3.0 | NaN | 3 |
3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | -9 | 1 | 1 | NaN | 4.0 | 3 | 4.0 | NaN | 2.0 | 3 |
4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | NaN | ... | -9 | 1 | 1 | NaN | 5.0 | 4 | 4.0 | NaN | NaN | 3 |
5 rows × 81 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 81)
eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id
dim_event_desc = df.loc[:, [
'eventid',
'location',
'summary',
'motive',
'weapdetail',
'ransomnote',
'propcomment',
'addnotes'
]]
dim_event_desc.rename(columns={
'location': 'Location',
'summary': 'Summary',
'motive': 'Motive',
'weapdetail': 'Weapon Details',
'ransomnote': 'Ransom Note',
'propcomment': 'Property Damage Comment',
'addnotes': 'Notes'
}, inplace=True)
print(dim_event_desc.shape)
dim_event_desc.head()
(181691, 8)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | Location | Summary | Motive | Weapon Details | Ransom Note | Property Damage Comment | Notes | |
---|---|---|---|---|---|---|---|---|
0 | 197000000001 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 197000000002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 197001000001 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 197001000002 | NaN | NaN | NaN | Explosive | NaN | NaN | NaN |
4 | 197001000003 | NaN | NaN | NaN | Incendiary | NaN | NaN | NaN |
df.drop([
'location',
'summary',
'motive',
'weapdetail',
'ransomnote',
'propcomment',
'addnotes'
], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | iyear | imonth | iday | approxdate | extended | resolution | specificity | vicinity | crit1 | ... | INT_IDEO | INT_MISC | INT_ANY | related | Location Id | Attack Type Id | Target Type Id | Target Id | Weapon Id | Group Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 0 | 0 | 0 | NaN | NaN | 1 | 1.0 | NaN | NaN | 1 |
1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 1 | 1 | 1 | NaN | NaN | 2 | 2.0 | 2.0 | NaN | 2 |
2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 4.0 | 0 | 1 | ... | -9 | 1 | 1 | NaN | 3.0 | 1 | 3.0 | 3.0 | NaN | 3 |
3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | -9 | 1 | 1 | NaN | 4.0 | 3 | 4.0 | NaN | 2.0 | 3 |
4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | -9 | 1 | 1 | NaN | 5.0 | 4 | 4.0 | NaN | NaN | 3 |
5 rows × 74 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 74)
eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , hostkidoutcome , hostkidoutcome_txt , nreleased , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id
dim_claim_mode = df.loc[:, ['claimmode_txt']]
dim_claim_mode['claimmode_txt'].fillna('Unknown', inplace=True)
dim_claim_mode.drop_duplicates(inplace=True)
dim_claim_mode['Claim Mode Id'] = dim_claim_mode.reset_index(drop=True).index + 1
dim_claim_mode.rename(columns={'claimmode_txt': 'Calim Mode'}, inplace=True)
print(dim_claim_mode.shape)
dim_claim_mode
(10, 2)
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Calim Mode | Claim Mode Id | |
---|---|---|
0 | Unknown | 1 |
8 | Letter | 2 |
129 | Call (pre-incident) | 3 |
139 | Other | 4 |
225 | Personal claim | 5 |
606 | Note left at scene | 6 |
786 | Call (post-incident) | 7 |
68253 | Posted to website, blog, etc. | 8 |
68281 | 9 | |
70371 | Video | 10 |
df = df.merge(dim_claim_mode, how='left', left_on=['claimmode_txt'], right_on=['Calim Mode'])
df.drop(['claimmode_txt', 'Calim Mode'], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | iyear | imonth | iday | approxdate | extended | resolution | specificity | vicinity | crit1 | ... | INT_MISC | INT_ANY | related | Location Id | Attack Type Id | Target Type Id | Target Id | Weapon Id | Group Id | Claim Mode Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 0 | 0 | NaN | NaN | 1 | 1.0 | NaN | NaN | 1 | NaN |
1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 1 | 1 | NaN | NaN | 2 | 2.0 | 2.0 | NaN | 2 | NaN |
2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 4.0 | 0 | 1 | ... | 1 | 1 | NaN | 3.0 | 1 | 3.0 | 3.0 | NaN | 3 | NaN |
3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 1 | 1 | NaN | 4.0 | 3 | 4.0 | NaN | 2.0 | 3 | NaN |
4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 1 | 1 | NaN | 5.0 | 4 | 4.0 | NaN | NaN | 3 | NaN |
5 rows × 74 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 74)
eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , hostkidoutcome , hostkidoutcome_txt , nreleased , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id , Claim Mode Id
fact_kidnapping = df.loc[(df['ishostkid'] == 1), [
'eventid',
'nhostkid',
'nhostkidus',
'nhours',
'ndays',
'divert',
'kidhijcountry',
'ransom',
'ransomamt',
'ransomamtus',
'ransompaid',
'ransompaidus',
'hostkidoutcome_txt',
'nreleased'
]]
fact_kidnapping = fact_kidnapping.apply(lambda x: x.fillna(np.nan) if x.dtype.kind in 'biufc' else x.fillna('Unknown'))
fact_kidnapping.drop_duplicates(inplace=True)
fact_kidnapping.rename(columns={
'nhostkid': 'Number Of Hostages Killed',
'nhostkidus': 'Number Of Hostages Killed (US)',
'nhours': 'Hours of Kidnapping',
'ndays': 'Days of Kidnapping',
'divert': 'Country That Kidnappers Diverted To',
'kidhijcountry': 'Country of Kidnapping Resolution',
'ransom': 'Ransom Demanded?',
'ransomamt': 'Ransom Amount',
'ransomamtus': 'Ransom Amount (US)',
'ransompaid': 'Ransom Paid',
'ransompaidus': 'Ransom Paid (US)',
'hostkidoutcome_txt': 'Kidnapping Outcome',
'nreleased': 'Number Released/Escaped/Rescued'
}, inplace=True)
fact_kidnapping
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | Number Of Hostages Killed | Number Of Hostages Killed (US) | Hours of Kidnapping | Days of Kidnapping | Country That Kidnappers Diverted To | Country of Kidnapping Resolution | Ransom Demanded? | Ransom Amount | Ransom Amount (US) | Ransom Paid | Ransom Paid (US) | Kidnapping Outcome | Number Released/Escaped/Rescued | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 197000000002 | 1.0 | 0.0 | NaN | NaN | Unknown | Mexico | 1.0 | 800000.0 | NaN | NaN | NaN | Unknown | NaN |
27 | 197001220001 | 1.0 | 0.0 | NaN | NaN | Unknown | Venezuela | 1.0 | 110000.0 | NaN | 110000.0 | NaN | Hostage(s) released by perpetrators | 1.0 |
103 | 197003030001 | 1.0 | 0.0 | NaN | NaN | Unknown | Spain | 1.0 | 0.0 | NaN | NaN | NaN | Unknown | NaN |
112 | 197003060001 | 1.0 | 1.0 | 0.0 | 2.0 | Unknown | Guatemala | 1.0 | 0.0 | NaN | NaN | NaN | Hostage(s) released by perpetrators | 1.0 |
123 | 197003110001 | 1.0 | 0.0 | 0.0 | 4.0 | Unknown | Brazil | 1.0 | 0.0 | NaN | NaN | NaN | Hostage(s) released by perpetrators | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
181620 | 201712280040 | 3.0 | 0.0 | NaN | 2.0 | Unknown | Unknown | 0.0 | NaN | NaN | NaN | NaN | Successful Rescue | 3.0 |
181630 | 201712290008 | 1.0 | 0.0 | NaN | -99.0 | Unknown | Unknown | 0.0 | NaN | NaN | NaN | NaN | Unknown | -99.0 |
181636 | 201712290017 | 12.0 | 0.0 | NaN | -99.0 | Unknown | Unknown | 0.0 | NaN | NaN | NaN | NaN | Unknown | -99.0 |
181660 | 201712300018 | 11.0 | 0.0 | -99.0 | NaN | Unknown | Unknown | 0.0 | NaN | NaN | NaN | NaN | Successful Rescue | 11.0 |
181684 | 201712310019 | 1.0 | 0.0 | 14.0 | NaN | Unknown | Unknown | 0.0 | NaN | NaN | NaN | NaN | Successful Rescue | 1.0 |
13572 rows × 14 columns
df.drop([
'ishostkid' ,
'nhostkid',
'nhostkidus',
'nhours',
'ndays',
'divert',
'kidhijcountry',
'ransom',
'ransomamt',
'ransomamtus',
'ransompaid',
'ransompaidus',
'hostkidoutcome',
'hostkidoutcome_txt',
'nreleased'
], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | iyear | imonth | iday | approxdate | extended | resolution | specificity | vicinity | crit1 | ... | INT_MISC | INT_ANY | related | Location Id | Attack Type Id | Target Type Id | Target Id | Weapon Id | Group Id | Claim Mode Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1970 | 7 | 2 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 0 | 0 | NaN | NaN | 1 | 1.0 | NaN | NaN | 1 | NaN |
1 | 197000000002 | 1970 | 0 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 1 | 1 | NaN | NaN | 2 | 2.0 | 2.0 | NaN | 2 | NaN |
2 | 197001000001 | 1970 | 1 | 0 | NaN | 0 | NaN | 4.0 | 0 | 1 | ... | 1 | 1 | NaN | 3.0 | 1 | 3.0 | 3.0 | NaN | 3 | NaN |
3 | 197001000002 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 1 | 1 | NaN | 4.0 | 3 | 4.0 | NaN | 2.0 | 3 | NaN |
4 | 197001000003 | 1970 | 1 | 0 | NaN | 0 | NaN | 1.0 | 0 | 1 | ... | 1 | 1 | NaN | 5.0 | 4 | 4.0 | NaN | NaN | 3 | NaN |
5 rows × 59 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 59)
eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id , Claim Mode Id
df.drop([
'iyear', 'imonth', 'iday', 'approxdate',
'extended', 'resolution', 'specificity', 'vicinity',
'crit1', 'crit2', 'crit3', 'doubtterr', 'alternative', 'alternative_txt', 'multiple',
'guncertain1', 'guncertain2', 'guncertain3', 'compclaim', 'propextent',
'claimmode', 'claim2' , 'claimmode2' , 'claimmode2_txt' , 'claim3' , 'claimmode3' , 'claimmode3_txt',
'scite1' , 'scite2' , 'scite3' , 'dbsource' , 'INT_LOG' , 'INT_IDEO' , 'INT_MISC' , 'INT_ANY' , 'related'
], axis=1, inplace=True)
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | success | suicide | individual | nperps | nperpcap | claimed | nkill | nkillus | nkillter | ... | property | propextent_txt | propvalue | Location Id | Attack Type Id | Target Type Id | Target Id | Weapon Id | Group Id | Claim Mode Id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | 1 | 0 | 0 | NaN | NaN | NaN | 1.0 | NaN | NaN | ... | 0 | NaN | NaN | NaN | 1 | 1.0 | NaN | NaN | 1 | NaN |
1 | 197000000002 | 1 | 0 | 0 | 7.0 | NaN | NaN | 0.0 | NaN | NaN | ... | 0 | NaN | NaN | NaN | 2 | 2.0 | 2.0 | NaN | 2 | NaN |
2 | 197001000001 | 1 | 0 | 0 | NaN | NaN | NaN | 1.0 | NaN | NaN | ... | 0 | NaN | NaN | 3.0 | 1 | 3.0 | 3.0 | NaN | 3 | NaN |
3 | 197001000002 | 1 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1 | NaN | NaN | 4.0 | 3 | 4.0 | NaN | 2.0 | 3 | NaN |
4 | 197001000003 | 1 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1 | NaN | NaN | 5.0 | 4 | 4.0 | NaN | NaN | 3 | NaN |
5 rows × 23 columns
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 23)
eventid , success , suicide , individual , nperps , nperpcap , claimed , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent_txt , propvalue , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id , Claim Mode Id
df.rename(columns={
'success': 'Successful Attack?',
'suicide': 'Suicide Attack?',
'individual': 'Unaffiliated Individual(s)',
'nperps': 'Number of Perpetrators',
'nperpcap': 'Number of Perpetrators Captured',
'claimed': 'Claim of Responsibility?',
'nkill': 'Number of Fatalities',
'nkillus': 'Number of Fatalities (US)',
'nkillter': 'Number of Perpetrator Fatalities',
'nwound': 'Number of Injured',
'nwoundus': 'Number of Injured (US)',
'nwoundte': 'Number of Perpetrators Injured',
'property': 'Property Damage?',
'propextent_txt': 'Extent of Property Damage',
'propvalue': 'Value of Property Damage (USD)'
}, inplace=True)
print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 23)
eventid , Successful Attack? , Suicide Attack? , Unaffiliated Individual(s) , Number of Perpetrators , Number of Perpetrators Captured , Claim of Responsibility? , Number of Fatalities , Number of Fatalities (US) , Number of Perpetrator Fatalities , Number of Injured , Number of Injured (US) , Number of Perpetrators Injured , Property Damage? , Extent of Property Damage , Value of Property Damage (USD) , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id , Claim Mode Id
# reorder columns
df = df.loc[:,[
'eventid',
'Location Id',
'Attack Type Id',
'Group Id',
'Target Type Id',
'Target Id',
'Weapon Id',
'Claim Mode Id',
'Successful Attack?',
'Suicide Attack?',
'Unaffiliated Individual(s)',
'Number of Perpetrators',
'Number of Perpetrators Captured',
'Claim of Responsibility?',
'Number of Fatalities',
'Number of Fatalities (US)',
'Number of Perpetrator Fatalities',
'Number of Injured',
'Number of Injured (US)',
'Number of Perpetrators Injured',
'Property Damage?',
'Extent of Property Damage',
'Value of Property Damage (USD)'
]]
df = df.apply(lambda x: x.fillna(np.nan) if x.dtype.kind in 'biufc' else x.fillna('Unknown'))
df.loc[
(df['Number of Perpetrators'] == -99.0) | (df['Number of Perpetrators'] == -9.0),
'Number of Perpetrators'] = np.nan
df.loc[
(df['Number of Perpetrators Captured'] == -99.0) | (df['Number of Perpetrators Captured'] == -9.0),
'Number of Perpetrators Captured'] = np.nan
df.loc[(df['Value of Property Damage (USD)'] == -99.0),'Value of Property Damage (USD)'] = np.nan
df['Claim of Responsibility?'].replace({-9.0: 'Unknown', np.nan: 'Unknown', 0: 'No', 1: 'Yes'}, inplace=True)
df['Property Damage?'].replace({-9.0: 'Unknown', np.nan: 'Unknown', 0: 'No', 1: 'Yes'}, inplace=True)
df
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
eventid | Location Id | Attack Type Id | Group Id | Target Type Id | Target Id | Weapon Id | Claim Mode Id | Successful Attack? | Suicide Attack? | ... | Claim of Responsibility? | Number of Fatalities | Number of Fatalities (US) | Number of Perpetrator Fatalities | Number of Injured | Number of Injured (US) | Number of Perpetrators Injured | Property Damage? | Extent of Property Damage | Value of Property Damage (USD) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 197000000001 | NaN | 1 | 1 | 1.0 | NaN | NaN | NaN | 1 | 0 | ... | Unknown | 1.0 | NaN | NaN | 0.0 | NaN | NaN | No | Unknown | NaN |
1 | 197000000002 | NaN | 2 | 2 | 2.0 | 2.0 | NaN | NaN | 1 | 0 | ... | Unknown | 0.0 | NaN | NaN | 0.0 | NaN | NaN | No | Unknown | NaN |
2 | 197001000001 | 3.0 | 1 | 3 | 3.0 | 3.0 | NaN | NaN | 1 | 0 | ... | Unknown | 1.0 | NaN | NaN | 0.0 | NaN | NaN | No | Unknown | NaN |
3 | 197001000002 | 4.0 | 3 | 3 | 4.0 | NaN | 2.0 | NaN | 1 | 0 | ... | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Unknown | NaN |
4 | 197001000003 | 5.0 | 4 | 3 | 4.0 | NaN | NaN | NaN | 1 | 0 | ... | Unknown | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Unknown | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
181686 | 201712310022 | 40675.0 | 5 | 2700 | 123.0 | 87673.0 | 4.0 | 1.0 | 1 | 0 | ... | Yes | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | Unknown | Unknown | NaN |
181687 | 201712310029 | 28343.0 | 3 | 599 | 10.0 | 114865.0 | 17.0 | NaN | 1 | 0 | ... | No | 2.0 | 0.0 | 0.0 | 7.0 | 0.0 | 0.0 | Yes | Unknown | NaN |
181688 | 201712310030 | 40676.0 | 4 | 2977 | 18.0 | 103106.0 | 9.0 | NaN | 1 | 0 | ... | No | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Yes | Unknown | NaN |
181689 | 201712310031 | 1592.0 | 3 | 3 | 9.0 | 114866.0 | 24.0 | NaN | 0 | 0 | ... | No | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Unknown | Unknown | NaN |
181690 | 201712310032 | 1795.0 | 3 | 3 | NaN | 52139.0 | 2.0 | NaN | 0 | 0 | ... | No | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | No | Unknown | NaN |
181691 rows × 23 columns
dim_location.to_csv('dim_location.csv', index=False)
dim_attack_type.to_csv('dim_attack_type.csv', index=False)
dim_target_type.to_csv('dim_target_type.csv', index=False)
dim_target.to_csv('dim_target.csv', index=False)
dim_weapon.to_csv('dim_weapon.csv', index=False)
dim_group.to_csv('dim_group.csv', index=False)
dim_event_desc.to_csv('dim_event_desc.csv', index=False)
dim_claim_mode.to_csv('dim_claim_mode.csv', index=False)
fact_kidnapping.to_csv('fact_kidnapping.csv', index=False)
df.to_csv('fact_attack_event.csv', index=False)
Due to limitation of sharing Power BI dashboard, I have will share the screenshots of the dashboard and the link to the dashboard. GlobalTerrorismDashboard.pbix
Please refer to the
- GlobalTerrorismDashboard.pbix for more details.
- The dashboard is interactive, you can click on any of the charts to filter the data.
- The Codebook for more details about the data, its collection process and the data dictionary.
P.S: This Dashboard was developed as part of the Spark Foundation Internship Program, doesn't represent the views of the Analyst. and it is not intended to be used for any other purpose except as a show case study.