dataproc

EtlFlow is an ecosystem of functional Scala libraries, built on ZIO, for running complex, auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems, and more.

redis bigquery aws scala spark etl s3 gcp gcs zio etl-framework dataproc etl-pipeline

Updated Aug 26, 2024
Scala

jehiah / gomrjob

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

go hadoop mapreduce mrjob dataproc

Updated Aug 5, 2024
Go

debussy-labs / debussy_concert

Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.

mysql bigquery workflow airflow sql spark postgresql gcp google-cloud data-engineering mssql dbt data-pipeline airflow-plugin big-data-platform dataproc dataform data-architecture airflow-operators

Updated Mar 20, 2023
Python

MarcosMJD / ghcn-d

Data Pipeline from the Global Historical Climatology Network DataSet

docker bigquery airflow spark etl terraform iac dbt elt datalake dataengineering dataproc googlecloudstorage googledatastudio datawharehouse

Updated Dec 20, 2022
Jupyter Notebook

Wittline / pyDag

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

bigquery cloud big-data workflow-engine google-cloud data-engineering task-scheduler google-cloud-platform dataproc-cluster dag parallel-processing data-pipeline dataengineering dataproc directed-acyclic-graph task-scheduling

Updated Sep 19, 2022
Python

googleapis / nodejs-dataproc

This repository is deprecated. All of its content and history have been moved to googleapis/google-cloud-node.

google cloud spark hadoop apache dataproc

Updated Jul 13, 2023

imehrdadmahdavi / map-reduce-inverted-index

Creating an Inverted Index of words occurring in a large set of documents extracted from web pages using Hadoop MapReduce and Google Dataproc

search-engine information-retrieval big-data hadoop clustering bigdata gcp map-reduce inverted-index mapreduce googlecloud dataprocessing dataproc

Updated Oct 28, 2019
Java

GoogleCloudPlatform / dataproc-scala-examples

Dataproc Scala Examples is an effort to assist in the creation of Spark jobs written in Scala to run on Dataproc.

airflow composer scala spark gcp dataproc

Updated May 3, 2024
Scala

prakashdontaraju / google-cloud-ecommerce

E-commerce pipelines on GCP. Streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau. Batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau.

Updated Mar 9, 2022
Python

dsc-umass / social-insights

A search engine to query social media insights with a political theme

react python search-engine visualizations dataproc social-insights

Updated Sep 22, 2021
Jupyter Notebook

MarieeCzy / METAR-Data-Engineering-and-Machine-Learning-Project

An educational project to build an end-to-end pipeline for near-real-time and batch processing of data, which is then used for visualisation and a machine learning model.

python docker bigquery machine-learning looker big-data spark terraform pyspark dataproc-cluster googlecloudplatform dataproc prefect streamlit

Updated May 19, 2023
Python

anjijava16 / GCP_Data_Enginner_Utils

GCP_Data_Enginner

python bigquery scala notebook gcp pubsub pyspark dataflow shell-script dataproc-cluster dataproc gcp-storage big-data-processing

Updated Sep 4, 2021
Shell

spotify / dataprocxy

Opens a Chrome browser to a Dataproc cluster

dataproc

Updated Jan 23, 2018
Python

garystafford / dataproc-workflow-templates

Demonstration of Google Cloud Dataproc Workflow Templates

spark hadoop gcp pyspark google-cloud-platform dataproc

Updated Dec 17, 2018

dataproc

Here are 92 public repositories matching this topic...

dataflint / spark

GoogleCloudPlatform / data-analytics-golden-demo

lynnlangit / learning-hadoop-and-spark

spotify / spydra

allegro / bigflow

GoogleCloudPlatform / serverless-spark-workshop

tharwaninitin / etlflow
