Input data

Sample data

Sample data overview

  • Input data for the sample code from chapter 2 onward
  • Modeled on the data from the Kaggle competition Prudential Life Insurance Assessment. The data was generated artificially to simulate insurance underwriting data. Because the generation process was simple, its structure is simpler than that of real-world data.
  • The training and test data together total 10,000 rows

Sample data items

| Column name | Notes |
| --- | --- |
| age | |
| gender | |
| height | |
| weight | |
| product | product type |
| amount | insurance premium |
| date | application date |
| medical_info_a1/a2/a3 | medical information - continuous variables |
| medical_info_b1/b2/b3 | medical information - continuous and categorical variables |
| medical_info_c1/c2 | medical information - continuous and categorical variables |
| medical_keyword_1-10 | medical information - binary variables |
| target | target values (binary) |
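
The shorthand in the column list (for example `medical_info_a1/a2/a3` and `medical_keyword_1-10`) can be expanded into individual column names and checked against a file header. The following is a minimal sketch, not part of the original sample code. It assumes the shorthand expands to `medical_info_a1`, `medical_info_a2`, `medical_info_a3`, and so on, and that the data is stored as CSV with a header row. The helper names `expected_columns` and `missing_columns` are hypothetical.

```python
import csv
import io


def expected_columns(include_target=True):
    """Expand the shorthand in the column list into individual names.

    Assumption: "a1/a2/a3" means three columns and "1-10" means ten columns.
    """
    cols = ["age", "gender", "height", "weight", "product", "amount", "date"]
    cols += [f"medical_info_a{i}" for i in range(1, 4)]
    cols += [f"medical_info_b{i}" for i in range(1, 4)]
    cols += [f"medical_info_c{i}" for i in range(1, 3)]
    cols += [f"medical_keyword_{i}" for i in range(1, 11)]
    if include_target:
        cols.append("target")
    return cols


def missing_columns(csv_file, include_target=True):
    """Return the expected columns absent from the header row of an open CSV file."""
    header = next(csv.reader(csv_file))
    return [c for c in expected_columns(include_target) if c not in header]


# In-memory stand-in for a real data file (the actual file names are not listed here).
sample = io.StringIO("age,gender,height\n50,M,170\n")
print(missing_columns(sample)[:3])  # ['weight', 'product', 'amount']
```

Passing `include_target=False` may be useful for a file that does not carry the `target` column, such as test data; whether that applies here is an assumption.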

Input data used in chapter 1 (ch01-titanic)

Input data used in chapter 3 (ch03)

  • Data used to explain how to combine multiple tables
  • Data used to explain how to process time series data