Data-Science-Salary-Prediction-Project : Overview

Collected 1,000 data science job descriptions from Glassdoor, scraped using Selenium.
Engineered features from the text of each job description to quantify the value companies place on a bachelor's degree, Python, Excel, AWS, statistics, and Spark.
Optimized Linear, Lasso, and Random Forest regressors using GridSearchCV to select the best model.
Created a tool that predicts data science salaries worldwide.

Data Cleaning

Created new columns for the minimum, maximum, and average salary of each job.
Created new columns for employer-provided salaries and hourly wages.
Removed rows without salary information.
Simplified company names.
Parsed the job's state and created a new column containing the state abbreviation.
Derived each company's age from its founding date.
Created a column for each of the following skills, flagging whether the job description requires it:
- Python
- R
- Excel
- AWS
- Spark
- Bachelor's Degree
- Statistics
- SQL
Created a new column for description length.
Removed unnecessary columns.
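The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not the project's code: the column names (`Salary Estimate`, `Company Name`, `Location`, `Founded`, `Job Description`), the salary-string formats, the `-1` missing-value marker, the 2020 reference year, the 2,080-hour work year, and the skill regexes are all assumptions.

```python
import pandas as pd

# Hypothetical toy rows. Column names and string formats are assumptions modeled
# on typical Glassdoor scraper output, not the project's actual data.
raw = pd.DataFrame({
    "Salary Estimate": [
        "$53K-$91K (Glassdoor est.)",
        "Employer Provided Salary:$120K-$160K",
        "$17-$24 Per Hour(Glassdoor est.)",
        "-1",  # assumed placeholder for a missing salary
    ],
    "Company Name": ["Acme Corp\n3.8", "Globex\n4.1", "Initech\n3.2", "Umbrella\n2.9"],
    "Location": ["Albuquerque, NM", "New York, NY", "Austin, TX", "Boston, MA"],
    "Founded": [1973, 2010, -1, 1988],
    "Job Description": [
        "Python and SQL required. Bachelor's degree in statistics.",
        "Experience with Spark, AWS and Python.",
        "Excel reporting; R a plus.",
        "No salary listed.",
    ],
})

CURRENT_YEAR = 2020        # assumption: reference year for company age
HOURS_PER_YEAR = 40 * 52   # assumption: full-time hours used to annualize wages

# Assumed regex patterns (matched against lowercased text) for each skill flag
SKILLS = {
    "python": r"\bpython\b", "r": r"\br\b", "excel": r"\bexcel\b",
    "aws": r"\baws\b", "spark": r"\bspark\b", "bachelors_degree": r"bachelor",
    "statistics": r"statistic", "sql": r"\bsql\b",
}

# Remove rows without salary information
df = raw[raw["Salary Estimate"] != "-1"].copy()

# Flag hourly wages and employer-provided salaries
df["hourly"] = df["Salary Estimate"].str.contains("per hour", case=False).astype(int)
df["employer_provided"] = df["Salary Estimate"].str.contains(
    "employer provided salary", case=False).astype(int)

# Extract the lower and upper bounds, e.g. "$53K-$91K" -> 53, 91
bounds = df["Salary Estimate"].str.extract(r"\$(\d+)K?-\$(\d+)K?").astype(float)
# Convert hourly wages to annual salaries in $K; annual figures are already in $K
scale = df["hourly"].map({1: HOURS_PER_YEAR / 1000, 0: 1.0})
df["min_salary"] = bounds[0] * scale
df["max_salary"] = bounds[1] * scale
df["avg_salary"] = (df["min_salary"] + df["max_salary"]) / 2

# Company name without the trailing rating, state abbreviation, company age
df["company_txt"] = df["Company Name"].str.split("\n").str[0]
df["job_state"] = df["Location"].str.split(",").str[-1].str.strip()
df["age"] = df["Founded"].apply(lambda year: -1 if year < 1 else CURRENT_YEAR - year)

# One 0/1 column per skill mentioned in the description, plus description length
desc = df["Job Description"].str.lower()
for name, pattern in SKILLS.items():
    df[name] = desc.str.contains(pattern).astype(int)
df["desc_len"] = df["Job Description"].str.len()

# Drop the raw text columns that have been replaced by engineered features
df = df.drop(columns=["Salary Estimate", "Company Name", "Location", "Job Description"])
print(df)
```

Matching "R" as a whole word is crude (it would also hit text such as "R&D"), so a real pipeline would probably need a more specific pattern.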

Exploratory Data Analysis

Analyzed the distributions of qualitative and quantitative variables and examined the correlation between salary and the other variables.
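A minimal pandas sketch of this kind of analysis, assuming a cleaned DataFrame with hypothetical columns `avg_salary`, `age`, `desc_len`, and `job_state` (toy values, not the project's data): summary statistics and Pearson correlations for the quantitative variables, and counts and group means for a qualitative one.

```python
import pandas as pd

# Hypothetical cleaned data; column names and values are assumptions
df = pd.DataFrame({
    "avg_salary": [72.0, 140.0, 42.6, 95.5, 110.0],
    "age": [47, 10, 5, 30, 22],
    "desc_len": [3200, 5100, 1800, 4000, 4600],
    "job_state": ["NM", "NY", "TX", "NY", "CA"],
})

# Quantitative variables: summary statistics and pairwise Pearson correlation
numeric = df[["avg_salary", "age", "desc_len"]]
print(numeric.describe())
corr = numeric.corr()
print(corr["avg_salary"].sort_values(ascending=False))

# Qualitative variable: frequency of each category
state_counts = df["job_state"].value_counts()
print(state_counts)

# Average salary per category
print(df.groupby("job_state")["avg_salary"].mean().sort_values(ascending=False))
```

The correlation matrix can also be visualized with seaborn's `heatmap`, since seaborn is among the listed packages.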

Model Building

First, transformed the qualitative variables into dummy variables. Then applied three different models:

Multiple Linear Regression
Lasso Regression
Random Forest
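These steps might look roughly like this with scikit-learn. The dataset is synthetic and its columns are hypothetical; `pd.get_dummies` creates the dummy variables, and each model is scored with 3-fold cross-validated mean absolute error. The `Lasso` `alpha` and the forest size are arbitrary choices for the sketch, not the project's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# Hypothetical cleaned dataset; the features and salary formula are invented
df = pd.DataFrame({
    "job_state": rng.choice(["CA", "NY", "TX", "MA"], size=n),
    "python": rng.integers(0, 2, size=n),
    "age": rng.integers(1, 80, size=n),
})
df["avg_salary"] = (80 + 20 * df["python"] + 15 * (df["job_state"] == "CA")
                    + rng.normal(0, 5, size=n))

# One-hot encode the qualitative column into dummy variables
X = pd.get_dummies(df.drop(columns="avg_salary"), columns=["job_state"], drop_first=True)
y = df["avg_salary"]

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=0.1),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# scikit-learn maximizes scores, so MAE is reported as its negative; flip the sign
cv_mae = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=3)
    cv_mae[name] = -scores.mean()
    print(f"{name}: CV MAE {cv_mae[name]:.2f}")
```

`drop_first=True` drops one dummy per categorical column so the linear model does not receive perfectly collinear columns.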

Model Performance

Random Forest: MAE 18.69
Lasso Regression: MAE 24.20
Multiple Linear Regression: MAE 122.02
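A sketch of how GridSearchCV tuning and a held-out MAE comparison could be done, assuming a synthetic regression dataset and hypothetical parameter grids (the project's actual grids are not documented in this README). Because `make_regression` produces linear data, the linear models will likely beat the random forest here; the ranking above comes from the project's own data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cleaned feature matrix (assumption: not the project's data)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical search spaces for the two models with tunable hyperparameters
searches = {
    "Random Forest": GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
        scoring="neg_mean_absolute_error", cv=3),
    "Lasso Regression": GridSearchCV(
        Lasso(max_iter=10_000),
        param_grid={"alpha": [0.01, 0.1, 1.0]},
        scoring="neg_mean_absolute_error", cv=3),
}

# Plain linear regression has no hyperparameters to search
lm = LinearRegression().fit(X_train, y_train)
test_mae = {"Multiple Linear Regression": mean_absolute_error(y_test, lm.predict(X_test))}

for name, search in searches.items():
    search.fit(X_train, y_train)  # refit=True (default) retrains the best params on all training data
    test_mae[name] = mean_absolute_error(y_test, search.predict(X_test))
    print(name, search.best_params_)

# Lower MAE is better
for name, mae in sorted(test_mae.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE {mae:.2f}")
```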

Resources

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, pickle
Scraper Github: https://github.com/arapfaik/scraping-glassdoor-selenium
Article:

Guide: https://www.youtube.com/playlist?list=PL2zq7klxX5ASFejJj80ob9ZAnBHdz5O1t

shuchita-rahman/Data-Science-Salary-Prediction-Project

Folders and files

Latest commit

History

Repository files navigation

Data-Science-Salary-Prediction-Project : Overview

Data Cleaning

Exploratory Data Analysis

Model Building

Model Performance

Resources

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages