Skip to content

neelriyer/IAG-Insurance-Fraud-Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IAG Insurance Fraud Prediction

Predicting Insurance fraud using a simulated dataset from Insurance Australia Group.

Notebook availiable here.

Notebook

log = pd.read_csv('Dataset/data.csv')
log.head(20)
message timestamp
0 8f70c7577be8483 - mobile_browser - Quote Started for customer: 99ccf1 1483192800.00
1 1368d40a4f6e455 - mobile_browser - Quote Completed for customer: 99ccf1 with json payload {'name': 'Nicole Berry', 'email': 'Nicole [email protected]', 'gender': 'male', 'age': 29, 'home': {'type': 1, 'square_footage': 311.80361967382737, 'number_of_bedrooms': 2, 'number_of_floors': 1}, 'household': [{'name': 'Oscar Berry', 'age': 25, 'gender': 'female'}, {'name': 'Mark Berry', 'age': 10, 'gender': 'female'}, {'name': 'Jacqueline Berry', 'age': 14, 'gender': 'male'}], 'address': '66 Lake Jamieview,PSC '} 1483193676.51
2 90527688b31d445 - mobile_browser - Claim Started for customer: 99ccf1 1483193794.69
3 c4013f44ea6d40c - mobile_browser - Payment Completed for customer: 99ccf1 1483193794.69
4 8045614075e7466 - pc_browser - Quote Started for customer: 9bae09 1483196400.00
5 6859e40fdc3f40d - pc_browser - Quote Completed for customer: 9bae09 with json payload {'name': 'Brandi Harris', 'email': 'Brandi [email protected]', 'gender': 'male', 'age': 62, 'home': {'type': 1, 'square_footage': 523.4329572865342, 'number_of_bedrooms': 2, 'number_of_floors': 1}, 'household': [{'name': 'Michael Harris', 'age': 12, 'gender': 'male'}, {'name': 'Michael Harris', 'age': 7, 'gender': 'male'}], 'address': '60 West Lisaside, Jamie Port Suite '} 1483197036.47
6 4c9ab2942b484f2 - pc_browser - Claim Started for customer: 9bae09 1483197184.51
7 07ba7defa6444ba - pc_browser - Payment Completed for customer: 9bae09 1483197184.51
8 cb6d5db3f2ce478 - pc_browser - Quote Started for customer: 12fdce 1483199674.68
9 0ddb305e024d49f - pc_browser - Quote Started for customer: b7aab4 1483200000.00
10 be4398c940284fe - pc_browser - Quote Completed for customer: b7aab4 with json payload {'name': 'Christopher Moody', 'email': 'Christopher [email protected]', 'gender': 'male', 'age': 40, 'home': {'type': 1, 'square_footage': 221.63326677793447, 'number_of_bedrooms': 3, 'number_of_floors': 1}, 'household': [], 'address': '120 Danielmouth,Unit Box '} 1483200799.03
11 cf1d5d9af6d54ef - pc_browser - Claim Started for customer: b7aab4 1483200904.55
12 7a1836f6aacf4ec - pc_browser - Payment Completed for customer: b7aab4 1483200904.55
13 697990436eb5484 - pc_browser - Quote Incomplete for customer: 12fdce with json payload {'name': 'Latoya Johnston', 'email': 'Latoya [email protected]', 'gender': 'male', 'age': 28, 'home': {'type': 1}, 'household': [], 'address': '136 Jonesberg,Unit Box '} 1483203274.68
14 e6cd8ce31a1d4d6 - mobile_browser - Claim Denied for customer: 99ccf1 - reason : fraud 1483203294.05
15 f21cd7a34cbc412 - pc_browser - Quote Started for customer: 2bc68b 1483203600.00
16 84618ef8bc28479 - mobile_browser - Claim Started for customer: 983092 1483203691.28
17 e67b69c9b4554c0 - pc_browser - Claim Denied for customer: b7aab4 - reason : fraud 1483204320.48
18 0baaef67fe8a458 - pc_browser - Quote Completed for customer: 2bc68b with json payload {'name': 'Loretta Steele', 'email': 'Loretta [email protected]', 'gender': 'female', 'age': 48, 'home': {'type': 1, 'square_footage': 301.854949906065, 'number_of_bedrooms': 3, 'number_of_floors': 1}, 'household': [{'name': 'Zachary Steele', 'age': 48, 'gender': 'male'}, {'name': 'Nicholas Steele', 'age': 9, 'gender': 'male'}, {'name': 'Michael Steele', 'age': 12, 'gender': 'female'}], 'address': '73 Edwardfurt, Michelle Crossing Suite '} 1483204668.62
19 a79b052cd8f2480 - pc_browser - Payment Completed for customer: 2bc68b 1483204804.68
# the log is sorted in icreasing order 
log['timestamp'].is_monotonic 
True

split the message column into multiple columns, transaction_id, action, customer_id, action, message

#%%timeit
start_time = time.time()

# create msgLog for manipulating message data
msgLog = pd.DataFrame()

#msgLog['message'] = log['message']
msgLog['transaction_id'], msgLog['device'], msgLog['message'] = log['message'].str.split('-',2).str
msgLog['action'], msgLog['message'] = msgLog['message'].str.split(':',1).str
msgLog['customer_id'], msgLog['message'] = msgLog['message'].str.split(n=1).str
msgLog['action'] = msgLog['action'].str.strip()

#msgLog['message'] = msgLog['message'].str.split('json payload',1).str[0]
msgLog['message'], msgLog['json_payload'] = msgLog['message'].str.split('json payload',1).str

msgLog['timestamp'] = log['timestamp']
print("%s seconds" %(time.time() - start_time))
65.64751410484314 seconds
msgLog.head()
transaction_id device message action customer_id json_payload timestamp
0 8f70c7577be8483 mobile_browser NaN Quote Started for customer 99ccf1 NaN 1483192800.00
1 1368d40a4f6e455 mobile_browser with Quote Completed for customer 99ccf1 {'name': 'Nicole Berry', 'email': 'Nicole [email protected]', 'gender': 'male', 'age': 29, 'home': {'type': 1, 'square_footage': 311.80361967382737, 'number_of_bedrooms': 2, 'number_of_floors': 1}, 'household': [{'name': 'Oscar Berry', 'age': 25, 'gender': 'female'}, {'name': 'Mark Berry', 'age': 10, 'gender': 'female'}, {'name': 'Jacqueline Berry', 'age': 14, 'gender': 'male'}], 'address': '66 Lake Jamieview,PSC '} 1483193676.51
2 90527688b31d445 mobile_browser NaN Claim Started for customer 99ccf1 NaN 1483193794.69
3 c4013f44ea6d40c mobile_browser NaN Payment Completed for customer 99ccf1 NaN 1483193794.69
4 8045614075e7466 pc_browser NaN Quote Started for customer 9bae09 NaN 1483196400.00

Drop the transaction column cos its fucking useless, can count number of transactions by actions

# all transaction id's are unique, not very useful
# msgLog.transaction_id.value_counts()
msgLog = msgLog.drop(columns=['transaction_id'])
# 621123 unique ids, note that some customer ids are repeated
len(msgLog.customer_id.value_counts())
621123

Sort values by customer id (thereby grouping by customer id, but not reducing rows)

msgLog = msgLog.sort_values(by=['customer_id', 'timestamp'], kind = 'mergesort')
msgLog = msgLog.reset_index(drop=True)
# add paid dataframe to msgLog
paid = pd.DataFrame()
msgLog['message'], msgLog['paid'] = msgLog['message'].str.split('$').str
msgLog['paid'] = pd.to_numeric(msgLog['paid'])
msgLog.message.value_counts()
- paid              816301
with                632998
- reason : fraud     10494
Name: message, dtype: int64
print(np.nanmin(msgLog['paid']), np.nanmax(msgLog['paid']), np.nanmean(msgLog['paid']))
2642.61 27097.44 8710.884242638436
# backup good msgLog
#1*
backupMsgLog = msgLog.copy()
msgLog.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
device message action customer_id json_payload timestamp paid
0 mobile_app NaN Quote Started for customer 000012 NaN 1520515553.72 nan
1 mobile_app with Quote Completed for customer 000012 {'name': 'Kimberly Mathews', 'email': 'Kimberly [email protected]', 'gender': 'male', 'age': 53, 'home': {'type': 1, 'square_footage': 346.20595281589794, 'number_of_bedrooms': 5, 'number_of_floors': 1}, 'household': [{'name': 'Kristen Mathews', 'age': 14, 'gender': 'female'}, {'name': 'Randy Mathews', 'age': 9, 'gender': 'male'}], 'address': '16 Reidville, Richard Pine Suite '} 1520516303.52 nan
2 mobile_app NaN Quote Started for customer 000023 NaN 1496195647.93 nan
3 mobile_app with Quote Completed for customer 000023 {'name': 'James Terrell', 'email': 'James [email protected]', 'gender': 'male', 'age': 30, 'home': {'type': 1, 'square_footage': 404.2134657047327, 'number_of_bedrooms': 2, 'number_of_floors': 1}, 'household': [], 'address': '114 Lake Matthewmouth,PSC '} 1496196488.27 nan
4 phone_call NaN Quote Started for customer 00003a NaN 1496494950.52 nan

See full code here.

About

Predicting Insurance fraud using a simulated dataset for IAG

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published