A deep learning project that predicts whether a customer will subscribe to a bank term deposit, built with TensorFlow for Principles and Practice in Data Mining (Fall 2017)
-
Project Objective
- Predict the target by classification with logistic regression
- Target Feature
- y: has the client subscribed to a term deposit?
- value: binary (yes or no)
-
About the Dataset
- Dataset from the UCI Machine Learning Repository.
- Customer data from May 2008 to November 2010.
- 41188 rows with 20 columns.
-
About the features
- Please refer to the attribute information here
- The target feature was encoded as {yes: 1, no: 0}.
- Categorical variables were all transformed into dummy variables.
- Features found insignificant in the model summary of a logistic regression fitted in R were removed.
- Further insignificant features were removed based on a correlation plot produced in R.
- All numerical features were standardized with StandardScaler from 'sklearn'.
- The data was split into a training set and a test set at an 8:2 ratio.
- 9 input variables
- K inputs between hidden layers
- 1 output variable
- Xavier Initializer
- Leaky ReLU/ReLU & Sigmoid
- AdamOptimizer
- Cost function for logistic regression (binary cross-entropy)
-
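The preprocessing steps listed above (target encoding, dummy variables, standardization, 8:2 split) could be sketched roughly as follows. This is a NumPy-only illustration on a made-up toy table, not the project's actual code, which relied on R and sklearn. The toy values, the helper names (`encode_target`, `one_hot`, `split_82`, `standardize`) and the random seed are all assumptions.

```python
import numpy as np

def encode_target(labels):
    """Map 'yes' -> 1 and 'no' -> 0, as described for the target feature."""
    return np.array([1 if v == "yes" else 0 for v in labels], dtype=np.int64)

def one_hot(values):
    """Turn one categorical column into dummy columns (one per category)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)), dtype=np.float64)
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out, categories

def split_82(X, y, seed=0):
    """Shuffle the rows and split them 8:2 into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = len(X) * 8 // 10
    train, test = order[:cut], order[cut:]
    return X[train], X[test], y[train], y[test]

def standardize(train_cols, test_cols):
    """Scale to zero mean and unit variance using training statistics only."""
    mean = train_cols.mean(axis=0)
    std = train_cols.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (train_cols - mean) / std, (test_cols - mean) / std

# Toy data (hypothetical values, not taken from the real dataset).
age = np.array([30, 45, 52, 23, 38, 61, 29, 47, 35, 56], dtype=np.float64)
marital = ["married", "single", "married", "single", "divorced",
           "married", "single", "married", "divorced", "married"]
y_raw = ["no", "yes", "no", "no", "yes", "yes", "no", "no", "yes", "no"]

y = encode_target(y_raw)
dummies, categories = one_hot(marital)
X = np.column_stack([age, dummies])

X_train, X_test, y_train, y_test = split_82(X, y)
# Only the numerical column (index 0) is standardized; dummies stay 0/1.
X_train[:, :1], X_test[:, :1] = standardize(X_train[:, :1], X_test[:, :1])
```

Fitting the scaling statistics on the training rows only is a design choice made in this sketch to keep test data out of the fit; the README does not say whether the project scaled before or after splitting.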
Hyper Parameters
| Hyper Parameters | Value |
| --- | --- |
| Learning Rate | 0.003 |
| Dropout Rate | 0.5 ~ 0.7 |
| Threshold | 0.65 |
| Number of Layers | 5 ~ 7 |
| Number of Inputs | 27 ~ 45 |
| Iterations | 1000 |

-
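To make the architecture and the hyperparameters above more concrete, here is a rough NumPy sketch of one forward pass and the cost. It is not the project's TensorFlow code. The layer widths, the Leaky ReLU slope of 0.01, the reading of the dropout value as a keep probability, and all function names are assumptions made for illustration.

```python
import numpy as np

def xavier_init(rng, n_in, n_out):
    """Xavier (Glorot) uniform initialization for a weight matrix."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU; alpha is an assumed slope for negative inputs."""
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, layer_sizes):
    """One (W, b) pair per layer, e.g. layer_sizes = [9, 27, 27, 1]."""
    return [(xavier_init(rng, n_in, n_out), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, X, rng=None, keep_prob=1.0):
    """Hidden layers use Leaky ReLU (plus optional dropout); output uses sigmoid."""
    a = X
    for W, b in params[:-1]:
        a = leaky_relu(a @ W + b)
        if rng is not None and keep_prob < 1.0:
            # Inverted dropout: keep each unit with probability keep_prob.
            mask = rng.random(a.shape) < keep_prob
            a = a * mask / keep_prob
    W, b = params[-1]
    return sigmoid(a @ W + b).ravel()

def logistic_cost(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy, the cost function of logistic regression."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

rng = np.random.default_rng(0)
# 9 inputs, five hidden layers of 27 units (assumed widths), 1 output.
params = init_params(rng, [9, 27, 27, 27, 27, 27, 1])
X = rng.normal(size=(4, 9))      # four made-up standardized rows
y = np.array([0, 1, 0, 1])

probs = forward(params, X)               # evaluation pass: no dropout
preds = (probs >= 0.65).astype(int)      # threshold value from the table
cost = logistic_cost(y, probs)
```

Training would then minimize this cost with Adam at the listed learning rate of 0.003 for 1000 iterations; the optimizer step is left out of the sketch.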
Finding the Optimal Cutoff Value (ROC Curve)
- Optimal cutoff value of 0.65
-
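The README does not state which criterion was used to read the cutoff off the ROC curve. One common choice is Youden's J statistic (true positive rate minus false positive rate). The sketch below assumes that criterion and uses made-up labels and scores, so it is not expected to reproduce the 0.65 value; the function names are hypothetical.

```python
import numpy as np

def roc_points(y_true, y_score):
    """Return (thresholds, tpr, fpr), one entry per distinct score."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    thresholds = np.unique(y_score)[::-1]   # from high to low
    positives = (y_true == 1).sum()
    negatives = (y_true == 0).sum()
    tpr, fpr = [], []
    for t in thresholds:
        predicted = y_score >= t
        tpr.append((predicted & (y_true == 1)).sum() / positives)
        fpr.append((predicted & (y_true == 0)).sum() / negatives)
    return thresholds, np.array(tpr), np.array(fpr)

def optimal_cutoff(y_true, y_score):
    """Threshold maximizing Youden's J = TPR - FPR (an assumed criterion)."""
    thresholds, tpr, fpr = roc_points(y_true, y_score)
    return float(thresholds[np.argmax(tpr - fpr)])

# Made-up labels and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.70, 0.80, 0.60, 0.90]
cutoff = optimal_cutoff(y_true, y_score)
```

In practice the cutoff would be computed from the model's predicted probabilities on held-out data and then used as the classification threshold.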
Hyperparameter Tuning
-
The Significance of the Project
- Bank profit = loan interest - deposit interest
- Optimize the profit by offering customized financial products
-
Further Improvements
- Batch training would allow us to reduce the cost value more efficiently and improve the accuracy
- Visualization of the training process with TensorBoard
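As a hint of what batch training might look like, the following sketch walks through a training set in shuffled mini-batches each epoch. The batch size of 128, the made-up data, and the function name `iterate_minibatches` are assumptions; in the TensorFlow code each batch would be fed to one optimizer step.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs covering the data once, in random order."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9))        # made-up standardized features
y = rng.integers(0, 2, size=1000)     # made-up binary labels

batch_sizes = []
for epoch in range(2):
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size=128, rng=rng):
        # A real training loop would run one optimizer step here.
        batch_sizes.append(len(X_batch))
```

Reshuffling at the start of every epoch means each pass sees the rows in a different order, and the last batch of an epoch is simply smaller when the data size is not a multiple of the batch size.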