spark-streaming

Here are 1,012 public repositories matching this topic...

risingwavelabs / risingwave

SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.

Updated May 17, 2024
Rust

AlexRogalskiy / spark-patterns

🏆 Spark4You Design patterns

patterns spark ebook spark-streaming spark-sql spark-structured-streaming patterns-design

Updated May 17, 2024
Shell

cdapio / cdap

An open source framework for building data analytic applications.

python java platform middleware spark integration dataset spark-streaming java-8 unified mapreduce cdap

Updated May 16, 2024
Java

trannhatnguyen2 / streaming_data_processing

Data Streaming with Debezium, Kafka, Spark Streaming, Delta Lake, and MinIO

airflow kafka minio spark-streaming debezium delta-lake

Updated May 15, 2024
Python

LuisFalva / ophelia

Ophelia is a PySpark analytics wrapper.

spark spark-streaming dask dataframe rdd spark-mllib spark-ml ophelia ophelia-spark

Updated May 14, 2024
Python

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Updated May 13, 2024
C#

Mahmoud-nfz / football-big-data

This is a comprehensive solution for real-time football analytics. It uses Apache Spark running on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, and Next.js for data visualization, as well as a custom-built search engine.

search-engine kafka spark hadoop nextjs rethinkdb spark-streaming hadoop-hdfs t3-stack

Updated May 14, 2024
TypeScript

databrickslabs / dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated or synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.

python spark faker pyspark spark-streaming data-generation databricks synthetic-data datagen datagenerator deltalake datageneration delta-live-tables

Updated May 16, 2024
Python

Azure / azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

microsoft streaming real-time scala kafka spark apache-spark stream connector azure bigdata apache spark-streaming eventhubs ingestion continuous event-hubs databricks structured-streaming

Updated May 9, 2024
Scala

hieuung / Streaming-Kafka

Using various data processing tools to build a real-time data pipeline with Kafka

kafka apache-spark spark-streaming kafka-consumer apache-beam apache-flink kafka-producer spark-streaming-kafka

Updated Apr 30, 2024
Python

Sakthe-Balan / WeatherAnalysis_Spark

Real-time weather analysis through stream and batch processing with Apache Kafka, Apache Spark, and MySQL. This project integrates both techniques to compute essential weather metrics and reveal weather patterns in dynamic weather datasets.

mysql python kafka end-to-end parallel-computing spark-streaming batch-processing

Updated Apr 24, 2024
Python

HadilHelali / BigDataVisualisation

This is part of a big data project that combines batch and stream processing for data visualization.

nodejs kafka big-data mongodb hadoop spark-streaming

Updated Apr 23, 2024
EJS

Michel-debug / Kafka-SparkStreamNLP-Finance-Sentiment-Anlaysis

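💶 Kafka-SparkStreamNLP is a real-time financial text analysis platform managed with Docker containers. It pulls articles through a news API, uses Kafka for data stream management, runs NLP processing in Spark Streaming with a fine-tuned pre-trained model, and writes the results through an output stream to ClickHouse for later OLAP analysis on a visualization platform. ⭐️⭐️⭐️⭐️⭐️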

api docker kafka docker-compose clickhouse vscode pyspark spark-streaming nlp-machine-learning distillbert-model

Updated Apr 22, 2024
Jupyter Notebook

IMRANDIL / RealTimeVotingDataEngineering

A real-time data engineering project showcasing Kafka, Postgres, Spark, Python, and Docker in action, where we implement a data pipeline for a voting system.

python docker streaming kafka postgresql spark-streaming

Updated Apr 21, 2024
Python

agile-lab-dev / wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.

elasticsearch scala kafka akka spark yarn hadoop solr jdbc hbase spark-streaming hdfs parquet

Updated Apr 19, 2024
Scala

00VALAK00 / Structured_data_streaming

Structured data streaming using Spark’s Structured Streaming API, Kafka for data ingestion, and Cassandra for data storage

kafka spark-streaming cassandra-database

Updated Apr 17, 2024
Python

sakethmukkanti / Machinery-Moniter-Iot-Streaming-With-Azure

An application that gives real-time insights into machine health using IoT sensors, tracking and monitoring parameters such as temperature, pressure, current, and humidity.

spark-streaming tableau azure-iothub spark-sql azure-data-lake azure-databricks azure-synapse-analytics