shuchita-rahman/Data-Science-Salary-Prediction-Project

Data-Science-Salary-Prediction-Project: Overview

  • Collected 1,000 data science job descriptions from Glassdoor, scraped using Selenium.
  • Engineered features from the text of each job description to quantify the value companies place on a bachelor's degree, Python, Excel, AWS, statistics, and Spark.
  • Optimized Linear, Lasso, and Random Forest regressors using GridSearchCV to find the best model.
  • Created a tool that predicts data science salaries worldwide.

Data Cleaning

  • Created new columns for the minimum, maximum, and average salary of each job posting.
  • Created new columns for employer-provided salaries and hourly wages.
  • Removed rows without salary information.
  • Simplified company names.
  • Parsed the state from each job's location and stored its abbreviation in a new column.
  • Calculated the age of each company from its founding date.
  • Created a column for each of the following skills, indicating whether the job description requires it:
    • Python
    • R
    • Excel
    • AWS
    • Spark
    • Bachelor's Degree
    • Statistics
    • SQL
  • Created a new column for description length.
  • Removed unnecessary columns.
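The cleaning steps listed above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names ("Salary Estimate", "Company Name", "Location", "Founded", "Job Description"), the salary string formats, the "-1" placeholder for missing values, the reference year, and the skill-matching regexes are all assumptions, and the three rows are made-up sample data.

```python
import pandas as pd

# Hypothetical sample rows; column names and formats are assumptions modeled on a Glassdoor scrape.
df = pd.DataFrame({
    "Company Name": ["Acme Analytics\n3.8", "DataCorp\n4.1", "NoPay Inc\n3.0"],
    "Salary Estimate": [
        "$53K-$91K (Glassdoor est.)",
        "Employer Provided Salary:$150K-$160K",
        "-1",
    ],
    "Location": ["Boston, MA", "Austin, TX", "Denver, CO"],
    "Founded": [1995, -1, 2010],
    "Job Description": [
        "We need Python, SQL and AWS experience. Bachelor's degree required.",
        "Spark and Excel skills; strong statistics background; R preferred.",
        "Python developer",
    ],
})

CURRENT_YEAR = 2020  # assumption: reference year used to compute company age

# Remove rows without salary information (missing values assumed to be "-1").
df = df[df["Salary Estimate"] != "-1"].copy()

# Indicator columns for hourly wages and employer-provided salaries.
salary_text = df["Salary Estimate"].str.lower()
df["hourly"] = salary_text.str.contains("per hour").astype(int)
df["employer_provided"] = salary_text.str.contains("employer provided").astype(int)

# Extract the range bounds from strings like "$53K-$91K" and average them.
bounds = df["Salary Estimate"].str.extract(r"\$(\d+)K?-\$(\d+)K?").astype(int)
df["min_salary"] = bounds[0]
df["max_salary"] = bounds[1]
df["avg_salary"] = (df["min_salary"] + df["max_salary"]) / 2

# Simplify the company name by dropping the rating appended after a newline.
df["company_txt"] = df["Company Name"].str.split("\n").str[0]

# Keep the state abbreviation after the last comma in the location.
df["job_state"] = df["Location"].str.split(",").str[-1].str.strip()

# Company age from the founding year; -1 marks an unknown founding year.
df["age"] = df["Founded"].apply(lambda y: -1 if y < 1 else CURRENT_YEAR - y)

# One 0/1 column per skill, matched case-insensitively in the description.
skill_patterns = {
    "python_yn": r"python",
    "r_yn": r"\br\b",  # word boundaries so "r" inside other words does not count
    "excel_yn": r"excel",
    "aws_yn": r"\baws\b",
    "spark_yn": r"spark",
    "bachelors_yn": r"bachelor",
    "statistics_yn": r"statistic",
    "sql_yn": r"\bsql\b",
}
desc = df["Job Description"]
for col, pattern in skill_patterns.items():
    df[col] = desc.str.contains(pattern, case=False, regex=True).astype(int)

# Description length in characters.
df["desc_len"] = desc.str.len()

# Drop the raw columns that have been replaced by engineered ones.
df = df.drop(columns=["Salary Estimate", "Company Name", "Location"])
print(df)
```

Keyword matching like this is deliberately simple; a word-boundary regex such as `\br\b` keeps the single-letter "R" skill from matching every word that contains the letter r.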

EDA

Analyzed the distributions of the qualitative and quantitative variables and looked for correlations between salary and the other variables.
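A minimal pandas sketch of this kind of exploration is shown below. It assumes a cleaned DataFrame with the hypothetical columns `avg_salary`, `age`, `Rating`, `desc_len`, `job_state`, and `python_yn`; the values are made up for illustration, and plotting is left out.

```python
import pandas as pd

# Hypothetical cleaned data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "avg_salary": [72.0, 155.0, 98.5, 120.0, 61.0],
    "age": [25, 10, 40, 7, 60],
    "Rating": [3.8, 4.1, 3.5, 4.4, 3.2],
    "desc_len": [2100, 3400, 2800, 3900, 1800],
    "job_state": ["MA", "CA", "NY", "CA", "TX"],
    "python_yn": [1, 1, 0, 1, 0],
})

numeric = ["avg_salary", "age", "Rating", "desc_len"]
categorical = ["job_state", "python_yn"]

# Quantitative variables: summary statistics of each distribution.
print(df[numeric].describe())

# Qualitative variables: how often each category occurs.
for col in categorical:
    print(df[col].value_counts())

# Pearson correlation of every numeric column with the average salary.
salary_corr = df[numeric].corr()["avg_salary"].sort_values(ascending=False)
print(salary_corr)

# Average salary per category, here per state.
salary_by_state = df.groupby("job_state")["avg_salary"].mean()
print(salary_by_state)
```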

Model Building

First, converted the qualitative (categorical) variables into dummy variables. Then, trained and compared three models:

  • Multiple Linear Regression
  • Lasso Regression
  • Random Forest
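A minimal scikit-learn sketch of this workflow follows, run on synthetic data since the scraped dataset is not reproduced here. The feature names, the 80/20 train/test split, the hyperparameter grids, and the use of MAE as the GridSearchCV scoring metric are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical synthetic data standing in for the cleaned job postings.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "job_state": rng.choice(["CA", "NY", "MA", "TX"], size=n),
    "python_yn": rng.integers(0, 2, size=n),
    "age": rng.integers(1, 80, size=n),
})
df["avg_salary"] = (
    80 + 20 * df["python_yn"] + 15 * (df["job_state"] == "CA") + rng.normal(0, 10, size=n)
)

# Step 1: turn the categorical column into 0/1 dummy columns.
X = pd.get_dummies(df.drop(columns="avg_salary"), columns=["job_state"], dtype=int)
y = df["avg_salary"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 2: the three models, each with an illustrative (assumed) hyperparameter grid.
candidates = {
    "Multiple Linear Regression": (LinearRegression(), {"fit_intercept": [True, False]}),
    "Lasso Regression": (Lasso(max_iter=10_000), {"alpha": [0.01, 0.1, 1.0, 10.0]}),
    "Random Forest": (
        RandomForestRegressor(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Cross-validated grid search; sklearn maximizes scores, so MAE is negated.
    search = GridSearchCV(model, grid, scoring="neg_mean_absolute_error", cv=5)
    search.fit(X_train, y_train)
    # Step 3: compare the tuned models by MAE on the held-out test set.
    preds = search.best_estimator_.predict(X_test)
    results[name] = mean_absolute_error(y_test, preds)
    print(f"{name}: MAE = {results[name]:.2f}, best params = {search.best_params_}")
```

Because the data here is synthetic, the printed MAE values will not match the project's reported results; the sketch only shows how dummy encoding, GridSearchCV tuning, and MAE comparison fit together.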

Model Performance

  • Random Forest: MAE = 18.69
  • Lasso Regression: MAE = 24.20
  • Multiple Linear Regression: MAE = 122.02
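For reference, the mean absolute error (MAE) averages the absolute gap between each actual salary $y_i$ and the corresponding prediction $\hat{y}_i$ over the $n$ evaluated postings. It is expressed in the same units as the salary column, and lower is better. A small worked example with two made-up postings:

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|

% Example: y = (100, 80), \hat{y} = (90, 95)
\mathrm{MAE} = \frac{|100 - 90| + |80 - 95|}{2} = \frac{10 + 15}{2} = 12.5
```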

Resources

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, pickle
Scraper GitHub: https://github.com/arapfaik/scraping-glassdoor-selenium
Article:

Guide: https://www.youtube.com/playlist?list=PL2zq7klxX5ASFejJj80ob9ZAnBHdz5O1t
