ChristianJavierMelo/EDA-visualization-with-Tableau


🔍 Exploratory Data Analysis of diamonds

The main objective of this project is to present a report based on a dataset describing the main features of diamonds and their prices. In this first approach, the data is analyzed to see how the variables relate to one another.

Table of Contents

  1. Requirements
  2. Analysis
  3. Dashboard
  4. Troubleshooting
  5. Licence

Requirements 📄

Clone this repository

git clone https://github.com/ChristianJavierMelo/ih_datamadpt0420_project_m2.git

The exploratory analysis is done in a Jupyter notebook named data_analysis_report.ipynb, based on the dataset diamonds_train.csv located in the data folder.
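
As a minimal sketch, the dataset could be loaded with pandas as shown here. The relative path data/diamonds_train.csv follows the description above and assumes the code runs from the repository root; the helper name load_diamonds is hypothetical and is not part of the repository.

```python
from pathlib import Path

import pandas as pd


def load_diamonds(csv_path="data/diamonds_train.csv"):
    """Read the diamonds CSV into a DataFrame (hypothetical helper)."""
    csv_path = Path(csv_path)
    # Fail early with a clear message if the file is not where we expect it.
    if not csv_path.exists():
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    return pd.read_csv(csv_path)
```

Calling load_diamonds() and then inspecting df.shape and df.head() is a quick way to confirm the file was read as expected.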

Technology stack

The following libraries must be installed to run the exploratory analysis:

  • Python
  • Pandas
  • Matplotlib
  • Seaborn
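
One way to check that the libraries above are available before opening the notebook is sketched here. It uses only the standard library; the function name missing_libraries is hypothetical, and the import names (pandas, matplotlib, seaborn) are assumed to match the packages listed.

```python
from importlib.util import find_spec

# Import names of the libraries listed above (Python itself is the interpreter).
REQUIRED = ["pandas", "matplotlib", "seaborn"]


def missing_libraries(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

Any name reported as missing can then be installed with the package manager of your choice.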

External source

See the external website for background on the features of diamonds. In summary:

  1. Carat: The term carat actually refers to the diamond's total weight and not its size.
  2. Cut: The most important of the 4Cs is cut because it has the greatest influence on a diamond's sparkle.
  3. Color: The second most important of the 4Cs is color, which refers to a diamond's lack of color. The less color, the higher the grade.
  4. Clarity: Often the least important of the 4Cs because the tiny imperfections are often microscopic.
  5. Depth: The height of a gemstone, measured from the culet to the table.
  6. Table: The largest facet of a gemstone.

Analysis 🔍

The analysis is a statistical study of the diamonds sample dataset from Kaggle. It is structured in two parts:

  • Basic analysis: A first approach to get to know the data we have.
  • Exploratory analysis: A second approach focused on the price variable compared with the other features, with a view to building a predictive model in the future. In this second approach, the values are represented graphically to better understand the data.

Basic analysis

As a first approach, I carried out a basic analysis to understand the type of data in the whole dataset. I classified the variables as categorical or numerical and ran a statistical analysis to find the most common values of each. I also studied the maximum and minimum values of each variable to better understand the extreme values of the dataset.
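
The steps described above could look roughly like this in pandas. This is only a sketch, not the notebook's actual code: the sample values are made up for illustration, and the column names (carat, cut, color, price) are assumed from the feature list in this README.

```python
import pandas as pd

# Tiny illustrative sample; the values are made up, not taken from the dataset.
df = pd.DataFrame(
    {
        "carat": [0.30, 0.70, 1.01, 0.30, 1.50],
        "cut": ["Ideal", "Premium", "Ideal", "Good", "Ideal"],
        "color": ["E", "G", "G", "E", "G"],
        "price": [500, 2500, 5200, 450, 9000],
    }
)

# Classify the columns as numerical or categorical by dtype.
numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude="number").columns.tolist()

# Most common value of each categorical column.
most_common = {col: df[col].mode().iloc[0] for col in categorical}

# Minimum and maximum of each numerical column.
extremes = df[numerical].agg(["min", "max"])

print("Numerical:", numerical)
print("Categorical:", categorical)
print("Most common:", most_common)
print(extremes)
```

On the real dataset, df.describe() and df[col].value_counts() give a more complete view of the same information.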

Exploratory analysis

In this second approach, I focused the analysis on the price variable compared with the other variables, to find out whether any correlations exist between them. To do so, I built a correlation matrix plot. After that, I compared the numerical and the categorical variables one by one to identify which have the strongest relationship with price. With these data I will predict the price of diamonds according to their main features.
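
A minimal sketch of this kind of comparison is given here, assuming a DataFrame with the columns named in this README. The sample values are invented for illustration and the notebook may compute these differently.

```python
import pandas as pd

# Illustrative sample; the values are made up, not taken from the dataset.
df = pd.DataFrame(
    {
        "carat": [0.3, 0.5, 0.7, 1.0, 1.5, 2.0],
        "depth": [61.5, 62.3, 60.8, 62.0, 61.0, 62.8],
        "table": [55.0, 57.0, 58.0, 56.0, 59.0, 57.0],
        "cut": ["Good", "Ideal", "Good", "Ideal", "Premium", "Premium"],
        "price": [400, 1200, 2400, 5000, 9500, 15000],
    }
)

# Pearson correlation matrix over the numerical columns only.
corr = df.select_dtypes(include="number").corr()

# Correlation of each numerical feature with price, strongest first.
price_corr = corr["price"].drop("price")
price_corr = price_corr.sort_values(key=lambda s: s.abs(), ascending=False)

# Categorical comparison: mean price for each cut grade.
mean_price_by_cut = df.groupby("cut")["price"].mean()

print(corr.round(2))
print(price_corr.round(2))
print(mean_price_by_cut)
```

The matrix can be drawn as a plot with seaborn.heatmap(corr, annot=True), and a seaborn box plot of price per category is one common way to visualize the categorical comparison.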

Conclusion

These approaches show that price has a strong correlation with the carat variable, and that relevant information can be extracted by combining the attributes classified as categorical.
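
To illustrate what combining categorical attributes can mean in practice, the sketch below builds a table of median price for each cut and color pair. The sample values are invented, and the choice of cut and color (rather than clarity) and of the median as the aggregate are assumptions made for the example.

```python
import pandas as pd

# Illustrative sample; the values are made up, not taken from the dataset.
df = pd.DataFrame(
    {
        "cut": ["Ideal", "Ideal", "Premium", "Premium", "Ideal", "Premium"],
        "color": ["E", "G", "E", "G", "E", "G"],
        "price": [1000, 1400, 900, 1600, 1200, 1800],
    }
)

# Median price for every combination of cut (rows) and color (columns).
price_by_cut_color = df.pivot_table(
    index="cut", columns="color", values="price", aggfunc="median"
)

print(price_by_cut_color)
```

The same table can be passed to seaborn.heatmap to see at a glance which combinations are associated with higher prices.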

Dashboard 📈

Dashboards are powerful tools for communicating important information at a glance. The goal of this challenge is to build a data dashboard using our diamonds dataset that will help me perform better during the Module 3 project.

The following link leads to a single interactive interface built around a specific objective: understanding a group of relationships between diamond attributes (features) and price.

Please visit my public Tableau profile and take a look at the project named diamonds_project2.

Troubleshooting

Browse issues: https://github.com/ChristianJavierMelo/ih_datamadpt0420_project_m2/issues

If you have any suggestions or questions, you can contact me via email: "[email protected]"

Licence

Licence: The Unlicense