# pyspark

Here are 3,360 public repositories matching this topic...

longNguyen010203 / Youtube-ETLT-Pipeline

💜🌈📊 A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Apache Superset, Dbt 🌺

mysql processing docker dockerfile machine-learning spark docker-compose postgresql pyspark data-engineering minio dbt data-engineer etl-pipeline data-engineering-pipeline cleaning-data dagster

Updated May 17, 2024
Jupyter Notebook

logicalclocks / hopsworks

Hopsworks - Data-Intensive AI platform with a Feature Store

python aws data-science machine-learning serverless azure gcp ml pyspark feature-engineering governance model-serving mlops feature-store feature-management hopsworks kserve

Updated May 17, 2024
Java

MauricioVazquezM / Spark_BigData_Architecture_Project

Final project for the course 'Architecture for Large Data Volumes', taught in the Bachelor's program in Data Science at ITAM

python spark time-series pyspark data-streaming data-stream-processing

Updated May 17, 2024
Python

mitchelllisle / sparkdantic

✨ A Pydantic to PySpark schema library

schema pyspark pydantic

Updated May 17, 2024
Python

ibis-project / ibis

the portable Python dataframe library

mysql python bigquery sqlalchemy sql database clickhouse sqlite impala postgresql snowflake pandas pyspark mssql dask trino pyarrow datafusion duckdb polars

Updated May 17, 2024
Python

ev2900 / Glue_Aggregate_Small_Files

PySpark script to aggregate small parquet files in a prefix into larger files. Designed to be run on AWS Glue

aws s3 glue pyspark small-files

Updated May 16, 2024
Python

ev2900 / Glue_Examples

PySpark code samples designed for AWS Glue

aws glue pyspark aws-glue

Updated May 16, 2024
Python

canimus / cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

unit-testing bigdata pandas python3 performance-metrics pyspark data-quality-checks data-quality dataquality snowpark pydeequ

Updated May 16, 2024
Python

FranzDiebold / docker-datascience-ultimate

Customized Jupyter Spark Docker images with everything you need

python docker spark jupyter pyspark jupyterlab polars

Updated May 16, 2024
Dockerfile

microsoft / SynapseML

Simple and Distributed Machine Learning

Updated May 16, 2024
Scala

big-data-team / big-data-course

Practice course on Big Data

big-data spark cassandra yarn hive nosql pyspark hdfs mapreduce

Updated May 16, 2024
Jupyter Notebook

YeonwooSung / DevOpsMisc

Miscellaneous codes and writings for DevOps

nginx aws devops sql spark serverless gcp pyspark infra devops-pipeline devop

Updated May 16, 2024
Jupyter Notebook

apache / incubator-graphar

An open source, standard data file format for graph data storage and retrieval.

big-data spark etl graph pyspark graph-analysis data-orchestration graph-storage

Updated May 16, 2024
C++

rickyschools / dltflow

A library for authoring DLT pipelines via meta-programming patterns and deploying to Databricks workspaces.

python pyspark data-engineering cicd databricks data-quality-checks meta-programming dlt ml-engineering delta-live-tables

Updated May 16, 2024
Python

KevinShindel / MachineLearning

Pandas, Sci-kit, SparkML

scikit-learn pandas pyspark

Updated May 16, 2024
Jupyter Notebook

Saquibtechlotraining / Electric_Scooters_Project

This project analyzes data from 91wheels website (as of Nov 10, 2023) on electric scooters in India, reflecting the rising popularity of EVs. With 85 companies offering 288 models across 436 variants, it explores the evolving landscape, consumer preferences, and scooter specifications amidst the transition to electric mobility.

python sqlalchemy pyspark powerbi webscraping pyodbc etl-pipeline azuresql

Updated May 16, 2024
HTML

karim-sharkawy / Data-Mine

Work I did during the data mine :)

python bash xml tensorflow sklearn pandas pyspark cleaning-data

Updated May 16, 2024
Jupyter Notebook

rhejos / ipl_data_analysis

This project explores data analysis of the Indian Premier League utilizing Apache Spark, python, and SQL.

sql apache-spark aws-s3 pyspark databricks-notebooks

Updated May 16, 2024
Python

JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing

Updated May 17, 2024
Scala

Dimitrov-S-Dev / resume

Dimitrov-S-Dev Resume/ Portfolio

javascript css python html airflow sql postgresql snowflake pandas python3 pyspark mssql tableau powerbi azure-databricks azure-synapse-analytics snowpark

Updated May 15, 2024
CSS
