Nour-Sadek/HR-Data-Analyst

HR Data Analyst

About

You work as an analyst in a company. The company's HR boss has provided you with three datasets. The first two contain information about employees' performance in offices A and B: how much they work, their salaries, the number of their projects, their departments, and so on. The third is an extensive dataset with information on the employees' satisfaction with their jobs, their latest evaluation metrics, and their current status in the company. Your task is to analyze the data and answer some of HR's questions.

Learning Outcomes of the Project:

Conduct data analysis on a case that resembles the tasks a data analyst may encounter on the job. Master data merging, grouping, and aggregation functions, and build pivot tables with pandas.

Learning Outcomes of Each Stage of the Project:

Stage 1 : Learn how to load data from the XML format, explore, and reindex it properly.

Stage 2 : Practice how to merge several datasets into a big one.

Stage 3 : Master the pandas methods to extract insights from the data.

Stage 4 : Aggregate pandas DataFrames to quickly compute metrics, such as the mean or standard deviation of selected columns, grouped by the values of other columns.
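As an illustration of the aggregation described in Stage 4, the sketch below groups a small, made-up data frame by the `left` column and computes several metrics per group. The values are invented for the example and the choice of metrics is an assumption, not the exact set the project asks for.

```python
import pandas as pd

# Invented sample rows; column names follow the data set description.
df = pd.DataFrame({
    "left": [0, 0, 1, 1],
    "number_project": [3, 5, 2, 6],
    "last_evaluation": [0.5, 0.7, 0.4, 0.9],
})

# One dictionary entry per column, each with the metrics to compute.
summary = df.groupby("left").agg({
    "number_project": ["median", "max"],
    "last_evaluation": ["mean", "std"],
}).round(2)

print(summary)
# A nested dictionary form of the same table.
print(summary.to_dict())
```

The result has a two-level column index, so a single value is addressed with a tuple such as `summary.loc[0, ("number_project", "median")]`.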

Stage 5 : Explore how to generate pivot tables with pandas in order to summarize data.
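A pivot table of the kind mentioned in Stage 5 might look like the following sketch. The rows are invented, and the particular question (median monthly hours per department and salary rate) is only an assumed example of what HR could ask.

```python
import pandas as pd

# Invented sample rows; column names follow the data set description.
df = pd.DataFrame({
    "Department": ["sales", "sales", "sales", "IT", "IT", "IT"],
    "salary": ["low", "low", "high", "low", "high", "high"],
    "average_monthly_hours": [150, 170, 210, 180, 160, 200],
})

# Rows are departments, columns are salary rates,
# and each cell holds the median of average_monthly_hours.
pivot = df.pivot_table(
    index="Department",
    columns="salary",
    values="average_monthly_hours",
    aggfunc="median",
)

print(pivot)
```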

General Info

To learn more about this project, please visit the HyperSkill website: HR Data Analyst.

HyperSkill labels this project's difficulty as Hard. It describes its four difficulty levels as follows:

  • Easy Projects - if you're just starting
  • Medium Projects - to build upon the basics
  • Hard Projects - to practice all the basic concepts and learn new ones
  • Challenging Projects - to perfect your knowledge with challenging tasks

This repository contains one .py file and one folder:

code.py - Contains the code used to complete the data analysis requirements

Data repository - Contains the three .xml data files: A_office_data.xml, B_office_data.xml and hr_data.xml

The project was built using Python 3.11.3.

Description of Data Sets

For A_office_data.xml and B_office_data.xml:

  • number_project — number of projects an employee has worked on;
  • average_monthly_hours — typical workload per month in hours;
  • time_spend_company — how many years an employee has worked in the company;
  • Work_accident — whether an employee has had an injury at work;
  • promotion_last_5years — whether an employee has had any promotions during the last five years;
  • Department — employee's department;
  • salary — employee's salary rate;
  • employee_office_id — employee's ID (1, 2, 3, etc.).
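The sketch below shows one way an office file with these columns could be loaded and reindexed with pandas. The XML layout (`<employees>` and `<employee>` elements) and the helper name `load_office` are assumptions made for the example; the real files may be structured differently, and only a few of the columns are included.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the element names are assumed,
# not copied from A_office_data.xml.
SAMPLE_XML = """<employees>
  <employee>
    <employee_office_id>1</employee_office_id>
    <number_project>3</number_project>
    <Department>sales</Department>
    <salary>low</salary>
  </employee>
  <employee>
    <employee_office_id>2</employee_office_id>
    <number_project>5</number_project>
    <Department>IT</Department>
    <salary>high</salary>
  </employee>
</employees>"""


def load_office(xml_source, office_letter):
    """Read one office file and index it by office letter plus employee_office_id."""
    # parser="etree" uses the standard library parser, so lxml is not required.
    df = pd.read_xml(xml_source, parser="etree")
    # Build IDs such as "A1" so they line up with employee_id in hr_data.xml.
    df.index = office_letter + df["employee_office_id"].astype(str)
    return df


a_df = load_office(io.StringIO(SAMPLE_XML), "A")
print(a_df.index.tolist())
```

With the real data, a file path such as the location of A_office_data.xml would be passed in place of the `io.StringIO` object.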

For hr_data.xml:

  • satisfaction_level — how satisfied an employee is with their job;
  • last_evaluation — the last evaluation score of an employee;
  • left — whether an employee has left the company;
  • employee_id — employee's ID in the company (A125 — from the A office; 125 in this case, is employee_office_id).
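Because `employee_id` combines the office letter with `employee_office_id`, the office data and the HR data can be joined on that key. The following is a minimal sketch with invented rows, assuming the office frames have already been reindexed as `A1`, `B1`, and so on; keeping only employees present in both sources (an inner join) is also an assumption.

```python
import pandas as pd

# Invented office data, already indexed by office letter + employee_office_id.
a_df = pd.DataFrame(
    {"number_project": [3, 5], "salary": ["low", "high"]},
    index=["A1", "A2"],
)
b_df = pd.DataFrame(
    {"number_project": [4], "salary": ["medium"]},
    index=["B1"],
)

# Invented HR data; B7 has no matching office record, A2 has no HR record.
hr_df = pd.DataFrame({
    "employee_id": ["A1", "B1", "B7"],
    "satisfaction_level": [0.4, 0.8, 0.6],
    "left": [1, 0, 0],
}).set_index("employee_id")

# Stack both offices, then keep only employees found in both sources.
offices = pd.concat([a_df, b_df])
merged = offices.merge(
    hr_df, left_index=True, right_index=True, how="inner"
).sort_index()

print(merged)
```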

How to Run

Download the files to your local machine, open the project in your IDE of choice, and run code.py. The data frames and their dictionary forms are printed to the console according to the requirements stated in each stage's docstring. Please read each stage's docstring to learn the requirements.
