GitHub - Tejas1415/Santander-Customer-Transaction-Prediction-Pyspark: Kaggle competition - Santander customer transaction prediction. Programs written both with and without parallelization (Spark) to compare their performances

Santander-Customer-Transaction-Prediction-Pyspark

Authors - Tejas Krishna Reddy, Issac and Bruno.

The data can be downloaded from the "Tejas Random Forest and Data" folder at the following link:

https://drive.google.com/file/d/1RH4b6dCuWl4ulPL7Yq-dwsL86rbGZpNT/view?usp=sharing

Linear Regression

To split the data into five folds, import splitData.py and call the following function:

splitData.makeFolds('filename', sparkContext)
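As a rough illustration of what a five-fold split involves, the sketch below assigns shuffled row indices to folds in plain Python. It is not the code in splitData.py, whose internals are not shown in this README; the function names make_fold_indices and train_test_split_for_fold, and their parameters, are hypothetical.

```python
import random


def make_fold_indices(n_rows, k=5, seed=0):
    """Assign each row index to one of k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def train_test_split_for_fold(folds, held_out):
    """Return (train, test) index lists with one fold held out for testing."""
    test = folds[held_out]
    train = [idx for i, fold in enumerate(folds) if i != held_out for idx in fold]
    return train, test


folds = make_fold_indices(n_rows=23, k=5)
train, test = train_test_split_for_fold(folds, held_out=0)
print(len(train), len(test))
```

In k-fold cross-validation each fold is held out once, so the loop over held_out runs from 0 to k - 1 and the per-fold scores are averaged.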

To run the regression, use the following command, where $1 is 1 for linear regression and 0 for logistic regression:

spark-submit --executor-memory 100G --driver-memory 100G ParallelRegression_kfoldCV.py dataFolder 5 $1 --N 40 --maxiter numIterations --beta betaOutputFolder --perfOut performanceOutputFolder --lam LambdaValue

Alternatively, run sbatch regression.bash $1 $2 $3, where $1 is 1 for linear regression and 0 for logistic regression, $2 is the maximum number of iterations, and $3 is the lambda value.
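To make the roles of the model flag, the iteration limit, and lambda concrete, here is a minimal single-machine sketch of gradient descent that switches between linear and logistic regression. It assumes an L2 (ridge) penalty and a fixed step size, neither of which is confirmed by this README, and it is not the code in ParallelRegression_kfoldCV.py; the names predict, gradient, fit, and step are hypothetical.

```python
import math


def predict(beta, x, linear):
    """Linear model returns the raw score; logistic applies the sigmoid."""
    score = sum(b * xi for b, xi in zip(beta, x))
    return score if linear else 1.0 / (1.0 + math.exp(-score))


def gradient(beta, data, lam, linear):
    """Gradient of the mean loss plus an assumed L2 penalty lam * ||beta||^2."""
    grad = [2.0 * lam * b for b in beta]
    for x, y in data:
        err = predict(beta, x, linear) - y
        # Squared error contributes 2 * err * x; log loss contributes err * x.
        scale = (2.0 if linear else 1.0) * err / len(data)
        for j, xj in enumerate(x):
            grad[j] += scale * xj
    return grad


def fit(data, linear, lam=0.0, max_iter=2000, step=0.1):
    """Plain gradient descent from a zero start for max_iter iterations."""
    beta = [0.0] * len(data[0][0])
    for _ in range(max_iter):
        g = gradient(beta, data, lam, linear)
        beta = [b - step * gj for b, gj in zip(beta, g)]
    return beta


# Toy data: each x is [bias, feature]; targets follow y = 1 + 2 * feature.
linear_data = [([1.0, t], 1.0 + 2.0 * t) for t in (0.0, 0.5, 1.0)]
flag = 1  # 1 selects linear regression, 0 selects logistic regression
beta = fit(linear_data, linear=(flag == 1), lam=0.0)
print(beta)
```

A larger lambda pulls the coefficients toward zero, which is why sweeping lambda under cross-validation is a common way to pick the amount of regularization.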

To loop over multiple values of lambda, run sbatch loopLam.bash. To change the range of lambda values, edit that file.
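A lambda sweep can also be driven from Python. The sketch below only builds the sbatch regression.bash argument lists described above, one per lambda value, and prints them. The lambda grid is an arbitrary example, the helper name lambda_sweep_commands is hypothetical, and the contents of loopLam.bash are not assumed.

```python
def lambda_sweep_commands(model_flag, max_iter, lambdas):
    """Build one `sbatch regression.bash $1 $2 $3` argument list per lambda."""
    return [
        ["sbatch", "regression.bash", str(model_flag), str(max_iter), str(lam)]
        for lam in lambdas
    ]


# Hypothetical grid: powers of ten from 1e-3 to 1e1.
lambdas = [10.0 ** e for e in range(-3, 2)]
commands = lambda_sweep_commands(model_flag=0, max_iter=100, lambdas=lambdas)
for cmd in commands:
    # To actually submit a job, pass cmd to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```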

Random Forests

Go into the random_forest directory. There you will find two files: RandomForestWithSpark.py and RandomForestWithoutSpark.py. You can run these files with:

python RandomForestWithSpark.py
python RandomForestWithoutSpark.py
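Since the project compares runs with and without Spark, a simple wall-clock timer around each script is one way to collect the numbers. This is a sketch only: the helper names time_command and compare are hypothetical, and wall-clock time includes interpreter and Spark startup overhead.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def compare(scripts):
    """Time each script using the current Python interpreter."""
    return {script: time_command([sys.executable, script]) for script in scripts}


# Example, to be run from inside the random_forest directory:
# timings = compare(["RandomForestWithSpark.py", "RandomForestWithoutSpark.py"])
# for script, seconds in timings.items():
#     print(f"{script}: {seconds:.1f} s")
```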

Neural Networks

Change into the neural network directory. Start by processing the data with:

python read_equal_data.py $PATH_TO_CSV

Once you have created the x_train_eq.pkl and y_train_eq.pkl files, you can run the model:

python main_file.py
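If main_file.py fails on startup, a quick check that both pickle files exist and line up can narrow down the cause. The sketch below assumes each file holds a sequence-like object that supports len(), which this README does not confirm; load_training_pair is a hypothetical helper. Only unpickle files you created yourself, since pickle can execute arbitrary code on load.

```python
import pickle
from pathlib import Path


def load_training_pair(directory="."):
    """Load x_train_eq.pkl and y_train_eq.pkl and check that their lengths match."""
    directory = Path(directory)
    with open(directory / "x_train_eq.pkl", "rb") as f:
        x = pickle.load(f)
    with open(directory / "y_train_eq.pkl", "rb") as f:
        y = pickle.load(f)
    if len(x) != len(y):
        raise ValueError(f"length mismatch: {len(x)} samples vs {len(y)} labels")
    return x, y


# Example, to be run from the directory containing the two files:
# x, y = load_training_pair(".")
# print(len(x), "training examples")
```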

For further questions, contact me at [email protected]

Folders and files

Data Analytics Project - Report.pdf
LinearRegression.zip
ParallelProcessingFinalPresentation.pptx
README.md
neural_network.zip
random_forest.zip