PySpark is the Python interface to Apache Spark, enabling the execution of Python and SQL-like instructions to manipulate and analyze data within a distributed processing framework.
Updated Dec 12, 2023 - Jupyter Notebook
This repo contains my learning and practice Zeppelin notebooks on Spark using Scala. All the notebooks in the repo can be used as template code for most ML algorithms and can be built upon for more complex problems.
Assignments in R (data analysis, clustering) and Spark from the Big Data Programming course in my master's program.
Explains the implementation of Spark concepts using the PySpark API from a Jupyter notebook.
This is our final project for SFU's CMPT 353, taught by Greg Baker during Summer 2023.
Treat Spark like pandas.
Big Data - Split a large CSV file into N smaller ones and save them to local disk.
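The splitting idea can be sketched in plain Python with only the stdlib `csv` module (function and file names here are assumptions, not the project's actual code): read the rows once, then write N parts, repeating the header in each.

```python
# Hedged sketch: split one CSV into N roughly equal parts on local disk,
# repeating the header row in every part. Names are illustrative.
import csv
import os


def split_csv(src_path, n_parts, out_dir):
    with open(src_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    chunk = -(-len(rows) // n_parts)  # ceiling division
    paths = []
    for i in range(n_parts):
        part_rows = rows[i * chunk:(i + 1) * chunk]
        path = os.path.join(out_dir, f"part_{i}.csv")
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)   # header repeated in every part
            writer.writerows(part_rows)
        paths.append(path)
    return paths


# Usage: build a small input file, then split it into 3 parts.
import tempfile

out_dir = tempfile.mkdtemp()
src = os.path.join(out_dir, "big.csv")
with open(src, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "value"])
    w.writerows([[i, i * i] for i in range(10)])
parts = split_csv(src, 3, out_dir)
```

For truly large files this would stream rows instead of loading them all into memory, but the partitioning logic stays the same.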
Implementation of Hadoop and Spark
This series explores the basics of Apache Spark through practical applications of Spark, PySpark, and Spark SQL.
Predict the success of Kickstarter campaigns using machine learning. Analyze project data including financial goals, pledge amounts, categories, and outcomes. Perform data cleaning, queries, and visualizations, and build models to forecast campaign success, helping entrepreneurs optimize their funding strategies.
BCG GAMMA CASE STUDY
This repository contains a wide variety of big data projects spanning NoSQL databases, Spark, data pipelines, and MapReduce. They include university coursework as well as projects built out of personal interest in big data.
This repo contains analysis of large datasets using Spark.
Spark BigQuery Parallel
A collection of small projects exploring PySpark features and functionality including packages and modules, algorithms, and general data science techniques.
Use this project to join data from multiple CSV files. Currently it supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.
UMSI-Bosch Manufacturing Line Failure Analysis