Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Open source platform for the machine learning lifecycle
CoolplaySpark (酷玩 Spark): Spark source-code walkthroughs, Spark libraries, and more
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Simple and Distributed Machine Learning
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Apache Spark docker image
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Interactive and Reactive Data Science using Scala and Spark.
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
lakeFS - Data version control for your data lake | Git for data
A curated list of awesome Apache Spark packages and resources.
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
R interface for Apache Spark
Fundamentals of Spark with Python (using PySpark), code examples
Feathr – A scalable, unified data and AI engineering platform for enterprise
Apache Spark was created by Matei Zaharia and released on May 26, 2014.