SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.
Updated May 17, 2024 - Rust
🏆 Spark4You Design patterns
An open source framework for building data analytic applications.
Data Streaming with Debezium, Kafka, Spark Streaming, Delta Lake, and MinIO
Ophelia, a PySpark analytics wrapper.
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
This is a comprehensive solution for real-time football analytics. It uses Apache Spark on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, Next.js for data visualization, and a custom-built search engine.
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Using various data processing tools to build a real-time data pipeline with Kafka
Discover real-time weather analysis through stream and batch processing with Apache Kafka, Apache Spark, and MySQL. This project integrates both techniques to compute essential weather metrics, offering insights into weather patterns. Join us in exploring dynamic weather datasets and uncovering actionable insights.
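💶Kafka-SparkStreamNLP is a real-time financial text analysis platform managed with Docker containers. It pulls text from a news API, uses Kafka for data stream management, runs NLP with Spark Streaming combined with a fine-tuned pretrained model, and writes the results through an output stream to ClickHouse for later OLAP analysis on a visualization platform ⭐️⭐️⭐️⭐️⭐️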
A real-time data engineering project showcasing Kafka, Postgres, Spark, Python, and Docker in action, implementing a data pipeline for a voting system.
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Structured data streaming using Spark's Structured Streaming API, with Kafka for data ingestion and Cassandra for data storage
An application developed to give real-time insights into machine health using IoT sensors by tracking and monitoring parameters such as temperature, pressure, current, and humidity.
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Example pipeline to stream the data changes from RDBMS to Apache Iceberg tables
A sample real-time streaming analytics application with Spark Structured Streaming and Kafka.