
Kaggle competition - Santander customer transaction prediction. Programs written both with and without parallelization (Spark) to compare their performances


Tejas1415/Santander-Customer-Transaction-Prediction-Pyspark


Santander-Customer-Transaction-Prediction-Pyspark

Authors - Tejas Krishna Reddy, Issac and Bruno.

The data can be downloaded from the "Tejas Random Forest and Data" folder at the following link:

https://drive.google.com/file/d/1RH4b6dCuWl4ulPL7Yq-dwsL86rbGZpNT/view?usp=sharing

Linear and Logistic Regression

To split the data into five folds, import splitData.py and run the following command:

splitData.makeFolds('filename', sparkContext)
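As an illustration of what a five-fold split involves, here is a minimal sketch in plain Python. It is not the code in splitData.py, whose internals are not shown here; the names make_folds and train_test_pairs are hypothetical, and the sketch assumes the rows fit in memory rather than in a Spark RDD.

```python
import random


def make_folds(rows, k=5, seed=0):
    """Shuffle rows and deal them into k folds of near-equal size (hypothetical helper)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Row i goes to fold i mod k, so fold sizes differ by at most one.
    return [shuffled[i::k] for i in range(k)]


def train_test_pairs(folds):
    """Yield (train, test) pairs in which each fold is the test set exactly once."""
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


folds = make_folds(range(23), k=5)
for train, test in train_test_pairs(folds):
    print(len(train), len(test))
```

Fixing the seed keeps the folds reproducible between runs, which matters when comparing models across the same splits.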

To run the regression, use the following command, where $1 is 1 for linear regression and 0 for logistic regression:

spark-submit --executor-memory 100G --driver-memory 100G ParallelRegression_kfoldCV.py dataFolder 5 $1 --N 40 --maxiter numIterations --beta betaOutputFolder --perfOut performanceOutputFolder --lam LambdaValue
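To give a sense of how the iteration limit and the lambda value interact, the following is a minimal single-machine sketch of regularized linear regression fitted by gradient descent. It is an assumption-laden illustration, not the code in ParallelRegression_kfoldCV.py: the objective (mean squared error plus lam times the squared norm of beta), the fixed step size, and the names predict, ridge_gradient and fit are all hypothetical.

```python
def predict(beta, x):
    """Inner product of the coefficient vector and one feature vector."""
    return sum(b * xi for b, xi in zip(beta, x))


def ridge_gradient(data, beta, lam):
    """Gradient of mean squared error plus lam * ||beta||^2 (assumed objective)."""
    n = len(data)
    grad = [2.0 * lam * b for b in beta]
    for x, y in data:
        err = predict(beta, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / n
    return grad


def fit(data, lam=0.0, max_iter=1000, step=0.1):
    """Run max_iter fixed-step gradient descent updates starting from zero."""
    beta = [0.0] * len(data[0][0])
    for _ in range(max_iter):
        grad = ridge_gradient(data, beta, lam)
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta


# Toy data from y = 2*x + 1; the second feature is a constant bias term.
data = [((x, 1.0), 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
beta = fit(data, lam=0.0, max_iter=5000, step=0.1)
beta_reg = fit(data, lam=1.0, max_iter=5000, step=0.1)
print(beta, beta_reg)
```

With lam set to 0 the fit recovers the generating coefficients; a positive lam shrinks the coefficients toward zero, which is why sweeping over several lambda values is useful.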

Alternatively, run sbatch regression.bash $1 $2 $3 where $1 is 1 for linear regression and 0 for logistic regression, $2 is the max number of iterations, and $3 is the lambda value.

To loop over multiple values of lambda, run sbatch loopLam.bash. To change the range of lambda values, edit that file.
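For readers without access to a Slurm cluster, a lambda sweep can also be sketched in Python by generating one spark-submit command per value. This is not the content of loopLam.bash, which is not reproduced here. The flags are the ones listed above; the function name regression_command, the default iteration count, the lambda grid, and the output folder names are hypothetical.

```python
import shlex


def regression_command(lam, is_linear=1, max_iter=100, data_folder="dataFolder"):
    """Build the spark-submit argument list for one lambda value (hypothetical helper)."""
    return [
        "spark-submit", "--executor-memory", "100G", "--driver-memory", "100G",
        "ParallelRegression_kfoldCV.py", data_folder, "5", str(is_linear),
        "--N", "40", "--maxiter", str(max_iter),
        # Output folder names below are illustrative, one pair per lambda.
        "--beta", f"beta_lam_{lam}", "--perfOut", f"perf_lam_{lam}",
        "--lam", str(lam),
    ]


# Assumed grid: five powers of ten from 1e-3 to 10.
lambdas = [10.0 ** p for p in range(-3, 2)]
for lam in lambdas:
    print(shlex.join(regression_command(lam)))
```

The sketch only prints the commands; passing each argument list to subprocess.run would execute them one after another.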

Random Forests

Go into the random_forest directory. There you will find two files: RandomForestWithSpark.py and RandomForestWithoutSpark.py. You can run these files with:

python RandomForestWithSpark.py
python RandomForestWithoutSpark.py
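Since the point of having both scripts is to compare their performance, a simple wall-clock timer is one way to do that. The sketch below is an assumption about how one might measure it, not part of the repository; time_command is a hypothetical helper, and the demonstration times a trivial Python command rather than the scripts themselves.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)
    return time.perf_counter() - start


# Demonstration with a no-op command so the sketch runs anywhere.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s")
```

To compare the two implementations, call time_command with [sys.executable, "RandomForestWithSpark.py"] and then with [sys.executable, "RandomForestWithoutSpark.py"] from inside the random_forest directory.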

Neural Networks

Change into the neural networks directory. Start by processing the data with:

python read_equal_data.py $PATH_TO_CSV

Once you have created the x_train_eq.pkl and y_train_eq.pkl files, you can run the model:

python main_file.py
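If the model fails at startup, it can help to confirm that the two pickle files load and line up. The following is a minimal sketch for that check; load_training_data is a hypothetical helper, and it assumes only that both pickled objects support len(). The actual object types written by read_equal_data.py are not stated here.

```python
import pickle
from pathlib import Path


def load_training_data(folder="."):
    """Load x_train_eq.pkl and y_train_eq.pkl and check that their lengths match."""
    folder = Path(folder)
    with open(folder / "x_train_eq.pkl", "rb") as f:
        x = pickle.load(f)
    with open(folder / "y_train_eq.pkl", "rb") as f:
        y = pickle.load(f)
    if len(x) != len(y):
        raise ValueError(f"feature/label length mismatch: {len(x)} vs {len(y)}")
    return x, y
```

Calling load_training_data() from the directory that holds the files returns the features and labels. Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.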

For further questions, contact me at [email protected]
