Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 8,266 public repositories matching this topic...
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated May 16, 2024 - C++
SageWorks: An easy-to-use Python API for creating and deploying AWS SageMaker Models
Updated May 16, 2024 - Python
💜🌈📊 A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Apache Superset, dbt 🌺
Updated May 16, 2024 - Jupyter Notebook
An open source, standard data file format for graph data storage and retrieval.
Updated May 16, 2024 - C++
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
Updated May 16, 2024 - Python
One ETL tool to rule them all
Updated May 16, 2024 - Python
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated May 16, 2024 - Scala
New generation decentralized data lake and a streaming data pipeline
Updated May 16, 2024 - Rust
Server for the ListenBrainz project, including the front-end (JavaScript/React) code that it serves and all of the data processing components that LB uses.
Updated May 16, 2024 - Python
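XL-LightHouse is a general-purpose streaming big-data statistics system that supports very large data volumes and very high concurrency. Common use cases include PV and UV statistics; e-commerce sales and ordering-user counts; log volume statistics; API call volume, error counts, and latency statistics; and server operations metric monitoring. The system supports multi-dimensional statistics and complex filter conditions and logical checks, deploys with one click, and integrates with one line of code, making real-time statistics over massive data easy. It helps enterprises build a data metrics system quickly and at lower cost, cutting costs and improving efficiency.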
Updated May 16, 2024 - Java
A large-scale entity and relation database supporting aggregation of properties
Updated May 16, 2024 - Java
Created by Matei Zaharia
Released May 26, 2014
- Followers: 414 followers
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia