Bitcoin price prediction using a Spark global model and self-designed local models, with a big data solution for preprocessing and data manipulation.


  • Global Model: Uses Spark's built-in MLlib; the model can benefit from all of the data.
  • Local Model: Uses ML algorithms from third-party libraries (e.g. scikit-learn); each model can benefit from only a subset of the data, but may train faster.
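
The difference between the two approaches can be illustrated without Spark. The following is only a minimal sketch, not the project's code: it fits one line on all rows (global) and one line per chunk (local). The helper names `fit_line` and `split_partitions`, the contiguous chunking that stands in for Spark partitions, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept on one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def split_partitions(rows, n_parts):
    """Contiguous chunks standing in for Spark partitions (hypothetical helper)."""
    size = -(-len(rows) // n_parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


# Toy (time, value) series, not real Bitcoin prices.
rows = [(float(t), 2.0 * t + 1.0) for t in range(12)]

# Global model: a single fit that sees every row.
global_model = fit_line(*zip(*rows))

# Local models: one independent fit per partition, each seeing only its subset.
local_models = [fit_line(*zip(*part)) for part in split_partitions(rows, 3)]
```

Each local fit touches only a third of the rows, which is why it can be cheaper, but it also cannot learn from patterns that live in the other partitions.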


  • Packages:
    • python >= 3.8.8
    • pyspark >= 3.1.1
    • numpy >= 1.19.2
    • pandas >= 1.2.3
    • plotly >= 4.14.3
    • scikit-learn >= 0.24.1
    • statsmodels >= 0.12.2
    • pmdarima >= 1.8.2

Built With

  • Spark - Lightning-fast unified analytics engine
  • Python - Programming language



  • global_mode.ipynb : Global model prediction on Spark.
  • local_LR.ipynb : Local model design (Linear Regression) on Spark for making predictions.
  • local_autoReg.ipynb : Local model design (ARIMA/VectorARIMA) on Spark for making predictions.
  • preprocess_bitcoin_pyspark.ipynb : Data imputation and resampling with a big data solution (Spark).
  • blockChain_crawler.ipynb : A crawler that collects blockchain information.
  • feature_engineering.ipynb : Feature engineering, including data combination, label creation, and financial indicator generation.
  • : Shared code for time series cross-validation.
  • bitcoin_1m_1min.csv : A subset of the dataset for functionality testing; one month (03/2021) of Bitcoin data at 1-minute intervals.
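
Time series cross-validation differs from ordinary k-fold in that every training window must precede its test window. The sketch below shows one common variant (expanding window) under stated assumptions. It is not the shared code listed above: the function name `expanding_window_splits` and its parameters are hypothetical.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs where training always precedes testing.

    The training window grows with each split. The test windows are
    consecutive, non-overlapping blocks at the end of the series.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        start = first_test_start + k * test_size
        train_idx = list(range(0, start))
        test_idx = list(range(start, start + test_size))
        yield train_idx, test_idx


# Example: 10 time-ordered samples, 3 splits, 2 test samples per split.
splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Because no split ever trains on observations that come after its test window, the evaluation does not leak future prices into the model.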


  • conference_paper.pdf


  • Chi Wang
  • Luer Lyu
  • Joel Ligma
  • Junfeng Wang


This project is licensed under the MIT License - see the LICENSE file for details.
If you would like to collaborate on or use this project, please contact the author: [email protected]


