Merge pull request #690 from adwityac/adwityac

Added Job Recommmendation System
UppuluriKalyani · Oct 31, 2024 · d52427f · d52427f
2 parents 97feae5 + ff56035
commit d52427f
Showing 86 changed files with 92,729 additions and 0 deletions.
diff --git a/Recommendation Systems/Job-Recommendation-System/README.md b/Recommendation Systems/Job-Recommendation-System/README.md
@@ -0,0 +1,58 @@
+# Job Recommendation System using Machine Learning
+This system provides personalized job recommendations based on user preferences and historical job data. The data for this project is scraped from Glassdoor, and the system is deployed on the Azure cloud platform.
+
+## Business Understanding
+The goal of this project is to develop a job recommendation system that helps users find relevant job opportunities based on their preferences and historical data. By leveraging machine learning techniques, we aim to provide personalized recommendations that align with the user's skills, interests, and career goals. The system will take into account various factors such as job title, salary estimate, company rating, location, industry, and more to generate accurate recommendations.
+
+## Data Scraping
+To collect the necessary data for training our recommendation system, we will scrape job-related information from Glassdoor. The following columns will be extracted:
+
+* Job Title
+* Salary Estimate
+* Job Description
+* Rating
+* Company Name
+* Location
+* Headquarters
+* Size
+* Founded
+* Type of Ownership
+* Industry
+* Sector
+* Revenue
+* Competitors
+
+## Feature Engineering
+Once the data is collected, we will perform feature engineering to preprocess and transform the raw data into a suitable format for training our recommendation model. This step includes:
+
+* Handling Missing Data: Deal with missing values in the dataset by either imputing them or removing the corresponding rows/columns.
+* Encoding Categorical Variables: Convert categorical variables such as job title, location, industry, and sector into numerical representations using techniques like one-hot encoding or label encoding.
+* Feature Scaling: Normalize numerical features, such as salary estimate and company rating, so that they share a similar scale and no single feature dominates the model.
+
+## Machine Learning Techniques
+To provide personalized job recommendations, we employ the TF-IDF (Term Frequency-Inverse Document Frequency) vectorization technique. The "job_recommender.py" component plays a crucial role in this process. It utilizes the TF-IDF vectorizer from the scikit-learn library to transform job descriptions and user preferences into numerical feature vectors. These vectors capture the importance of each word in the documents, enabling the system to find similar job opportunities based on user preferences. The Nearest Neighbors algorithm is then used to identify the most relevant job recommendations.
+
+The skill extractor module provides functions and utilities that extract skills from a PDF file using the spaCy library, along with text processing and matching operations. The extracted skills are then available for further analysis and processing in the job recommendation system.
+
+## Streamlit Application
+To make the job recommendation system easily accessible and user-friendly, we have developed a Streamlit application. Streamlit provides an intuitive web interface where users can upload their resumes. The application processes the user input, applies the machine learning models, and displays the top-recommended jobs based on the user's preferences and historical data.
+
+## Model Deployment using Azure Cloud
+To make the job recommendation system accessible to users, we will deploy the model on the Azure cloud platform. The deployment process involves the following steps:
+
+* Model Serialization: Serialize the trained model to a format compatible with the Azure cloud deployment.
+* Model Containerization: Package the serialized model along with the necessary dependencies and environment specifications into a container using tools like Docker.
+* Azure Container Registry: Create a container registry on Azure to store the model container and related artifacts securely.
+* Azure Kubernetes Service (AKS): Deploy the model container as a scalable microservice using AKS, which provides orchestration and management capabilities.
+* API Development: Develop an API that allows users to interact with the deployed model and request personalized job recommendations.
+* Integration and Testing: Integrate the API with other components of the job recommendation system, and perform thorough testing to ensure its functionality and performance.
+* Deployment Monitoring: Monitor the deployed model and API to track usage and performance metrics, and to address any issues or errors.
+
+## Usage
+To use the job recommendation system, follow the instructions below:
+
+* Install the required dependencies: `pip install -r requirements.txt`
+* Run the app on a local server: `streamlit run __init__.py`
+* Access the deployed job recommendation API and make requests to receive personalized recommendations.
+
+
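Review note: the README describes ranking jobs with TF-IDF vectors and a nearest-neighbours search. The idea can be sketched as below. This is a minimal illustration under stated assumptions, not the PR's implementation: it uses scikit-learn's default word tokenizer and cosine distance rather than the `ngrams` analyzer and `getNearestN` helper from `job_recommender.py` (not shown in this diff), and the job descriptions and the `recommend` function are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical job descriptions; the real system uses scraped Glassdoor data.
job_descriptions = [
    "python machine learning pandas scikit-learn",
    "java spring microservices kubernetes",
    "sql tableau data visualization reporting",
]


def recommend(resume_text, jobs, top_n=2):
    """Return (job_index, cosine_distance) pairs, closest job first."""
    vectorizer = TfidfVectorizer()
    # Learn the vocabulary and IDF weights from the job descriptions.
    job_vectors = vectorizer.fit_transform(jobs)
    # Project the resume into the same vector space (no refitting).
    resume_vector = vectorizer.transform([resume_text])

    # Brute-force cosine search over the sparse TF-IDF matrix.
    nbrs = NearestNeighbors(n_neighbors=top_n, metric="cosine").fit(job_vectors)
    distances, indices = nbrs.kneighbors(resume_vector)
    return [(int(i), float(d)) for i, d in zip(indices[0], distances[0])]


matches = recommend("python pandas machine learning", job_descriptions)
for job_index, distance in matches:
    print(job_index, round(distance, 2))
```

A smaller cosine distance means more shared, highly weighted terms, so the first pair returned is the strongest recommendation.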
diff --git a/Recommendation Systems/Job-Recommendation-System/__init__.py b/Recommendation Systems/Job-Recommendation-System/__init__.py
@@ -0,0 +1,64 @@
+import pandas as pd
+import streamlit as st
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.neighbors import NearestNeighbors
+
+# Project modules: n-gram analyzer, nearest-neighbour lookup, job data, and skill extraction
+import src.notebook.skills_extraction as skills_extraction
+from src.components.job_recommender import ngrams, getNearestN, jd_df
+
+
+# Function to process the resume and recommend jobs
+def process_resume(file_path):
+    # Extract the candidate's skills from the PDF resume
+    resume_skills = skills_extraction.skills_extractor(file_path)
+
+    # Perform job recommendation based on parsed resume data:
+    # join the extracted skills into a single document for vectorization
+    skills = [' '.join(resume_skills)]
+
+
+    # Feature Engineering:
+    vectorizer = TfidfVectorizer(min_df=1, analyzer=ngrams, lowercase=False)
+    tfidf = vectorizer.fit_transform(skills)
+
+
+    nbrs = NearestNeighbors(n_neighbors=1, n_jobs=-1).fit(tfidf)
+    jd_test = (jd_df['Processed_JD'].values.astype('U'))
+
+    distances, indices = getNearestN(jd_test)
+
+    # Keep the first distance reported for each job description, rounded
+    # to two decimal places. `indices` is not used because each row of
+    # `distances` corresponds to the same row of `jd_test`.
+    matches = pd.DataFrame(
+        [[round(dist[0], 2)] for dist in distances],
+        columns=['Match confidence'],
+    )
+
+
+    # Recommend the top 5 jobs for the candidate's resume: sort every job by
+    # its match score first, then keep the five best-ranked rows
+    jd_df['match'] = matches['Match confidence']
+    return jd_df.sort_values('match').head(5)
+
+# Streamlit app
+def main():
+    st.title("Job Recommendation App")
+    st.write("Upload your resume in PDF format")
+
+    # File uploader
+    uploaded_file = st.file_uploader("Choose a file", type=['pdf'])
+
+    if uploaded_file is not None:
+        # Save the in-memory upload to disk, then recommend jobs from it
+        file_path = uploaded_file.name
+        with open(file_path, 'wb') as resume_file:
+            resume_file.write(uploaded_file.getvalue())
+        df_jobs = process_resume(file_path)
+        st.write("Recommended Jobs:")
+        st.dataframe(df_jobs[['Job Title','Company Name','Location','Industry','Sector','Average Salary']])
+
+# Run the Streamlit app
+if __name__ == '__main__':
+    main()
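Review note: Streamlit's `file_uploader` returns an in-memory file object rather than a path on disk, while `process_resume` expects a file path. One way to bridge the two without writing into the working directory is a temporary file. The helper below is a hedged sketch using only the standard library; `save_upload_to_temp` is a hypothetical name, not part of this PR.

```python
import os
import tempfile


def save_upload_to_temp(data: bytes, suffix: str = ".pdf") -> str:
    """Write uploaded bytes to a temporary file and return its path."""
    # delete=False keeps the file on disk after closing, so a parser
    # can reopen it by path; the caller must remove it afterwards.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


# Example with placeholder bytes standing in for an uploaded resume.
path = save_upload_to_temp(b"%PDF-1.4 example bytes")
try:
    with open(path, "rb") as f:
        content = f.read()
finally:
    os.remove(path)  # clean up once the file has been processed
```

In the app this could be called as `save_upload_to_temp(uploaded_file.getvalue())`, with the returned path passed to `process_resume` and removed once the recommendations are computed.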