Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Open source platform for the machine learning lifecycle
CoolplaySpark (酷玩 Spark): Spark source-code walkthroughs, Spark libraries, and more
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Simple and Distributed Machine Learning
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Apache Spark docker image
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Interactive and Reactive Data Science using Scala and Spark.
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
lakeFS - Data version control for your data lake | Git for data
A curated list of awesome Apache Spark packages and resources.
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
R interface for Apache Spark
Fundamentals of Spark with Python (using PySpark), code examples
Feathr – A scalable, unified data and AI engineering platform for enterprise
Apache Spark was created by Matei Zaharia and released on May 26, 2014.