company to make appropriate business strategies to enhance its revenue by analyzing customers' behavior and sending offers and royalties to customers accordingly
Updated May 23, 2023 - Jupyter Notebook
A machine learning model is built using PySpark's MLlib library to automatically flag suspicious job postings on Indeed.com. The dataset includes 18,000 job descriptions, of which about 800 are fake.
Writing dummy snippets of code to read, manipulate, and build a simple ML model with PySpark.
Given a set of documents and a minimum required similarity threshold, find the number of document pairs whose similarity exceeds the threshold
This notebook contains detailed code for Spark, machine learning, and Databricks
A laboratory to carry out experiments with PySpark
An ETL pipeline for the I94 immigration, global land temperatures, and US demographics datasets builds an analytics database on immigration events. A data model built with pandas and PySpark is used to find patterns of immigration to the United States.
Trying out a best-case Apache Spark working environment for robust data pipelines
An academic project carried out for the Distributed Data Analysis and Mining course (a.y. 2022/2023)
MapReduce job development, RDD programming, medical data management, sales analysis, and efficient data integration for big data analysis. Spark: big data processing, Sqoop integration, and Spark Structured Streaming for real-time data.
Assignments from the CSE545 course
This project analyzes the questions on askubuntu.com to find the most common topics, in order to better understand which areas of Ubuntu may need more attention for bug fixing and which features might be good to add in future releases. To do this, I analyzed public data from askubuntu.com using Azure HDInsight…
1061 Data Mining Research and Practice Homeworks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks