💜🌈📊 A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Apache Superset, and dbt 🌺
Updated May 17, 2024 - Jupyter Notebook
Hopsworks - Data-Intensive AI platform with a Feature Store
Final project for the course 'Architecture for Large Data Volumes', taught in the Bachelor's program in Data Science at ITAM
The portable Python dataframe library
PySpark script to aggregate small Parquet files in a prefix into larger files. Designed to be run on AWS Glue.
Possibly the fastest DataFrame-agnostic quality check library in town.
Simple and Distributed Machine Learning
An open source, standard data file format for graph data storage and retrieval.
A library for authoring DLT pipelines via meta-programming patterns and deploying to Databricks workspaces.
This project analyzes data from the 91wheels website (as of Nov 10, 2023) on electric scooters in India, reflecting the rising popularity of EVs. With 85 companies offering 288 models across 436 variants, it explores the evolving landscape, consumer preferences, and scooter specifications amid the transition to electric mobility.
Work I did during the data mine :)
This project explores data analysis of the Indian Premier League using Apache Spark, Python, and SQL.
State of the Art Natural Language Processing
Dimitrov-S-Dev Resume/Portfolio