In this project, we will combine and practice implementing what we have learned throughout this course, including:
-
SQL queries using pgadmin
-
Cleaning data using python
-
Creating an interactive dashboard with tableau
-
Apply your bootcamp learnings into a single end-to-end project: Data retrieval, EDA / cleaning, Statistical modeling (optional), Tableau
-
Main deliverable: Tableau dashboard(s) and presentation, Jupyter notebooks if used
├── data <- Data Folder
│ ├── cleaned_data_india_credit.csv <- Cleaned data source
│ ├── credit_card_india.csv <- Oringinal data source
│
├── output <- Images Used in the Project and output files
│ ├── Credit Dash.png <- dashboard snapshot
│ ├── sql_output.md <- output of sql queries
│ ├── ERD_diagram.png <- Entity relationship diagram
│
├── src/modules <- Source Code
│ ├── cleaning_data.ipynb <- code for data cleaning
│ ├── SQL_Credit_card.ipynb <- code for sql queries
│ └── EDA.ipynb <- EDA file
│ └── credit-card-spending-ml.ipynb <-machine learning files
│ └── Credit_viz.twbx <- tableau file with dashboard
│ ├── streamlit <- Folder For All The Streamlit App Code
│ ├── app.py <- viz app for different plots
│ ├── eda_app.py <- pandas profiling
│ ├── requirements.txt<- list of all dependencies
│
├── __init__.py <- Package Initializer
└── README.md <- Project Documentation
- Dataset - https://www.kaggle.com/datasets/thedevastator/analyzing-credit-card-spending-habits-in-india
-
Step 1: Aquire dataset and import into jupyter notebook, clean dataset and export file
-
Step 2: Use cleaned file and import into tableau and SQL
-
Step 3: Run sql queries to answer questions like top 5: Cities with fraud, which gender has most fraud, what credit card had most fraud
-
Step 4: Make an interactive dashboard based on city and date on tableau
- Please refer to the following:
- SQL Queries MD File
- ERD Diagram
- This gif showcases the pandas profiling module deployed on the streamlit that allows you to do EDA by uploading a dataset
- This allows you to do EDA with any dataset uploaded, it will create various plots for you to conduct analysis, also deployed on streamlit
- Test out the app here: EDA app deployment on streamlit cloud
- Note: the app does not use the uploaded dataset for some reason, it is supposed to do what is in the gif above, but you can use it with local host
To run the Streamlit App, run the following command:
streamlit run app.py
- Note for the above, you need to be in the correct folder
https://docs.google.com/presentation/d/1zzXzLE6kJSKPbSglUbs9xWFQATFGJms0aH-3kAz8Uek/edit#slide=id.p
https://public.tableau.com/app/profile/gurmol.sohi/vizzeshttps://public.tableau.com/app/profile/gurmol.sohi/viz/Credit_viz/Dashboard1
- If you click on the city, it will adjust all the other graphs, you can also select specific months to gain insight
- Mumbai and Bengaluru had greatest spending,
- Most spending was on bills, food, fuel
- Highest percentage of gold type was in Zira
- Lowest gold card type was in Achalpur
- The highest spend month was in August with a Platinum card
- Greater Mumbai had the highest expense type with Bills and lowest with entertainment
- Greater Mumbai had 14% of the total spending of the whole dataset
- 01/2015 had the highest amount of spending
- It was a challenge to create interactivty on tableau
- had lots of issues creating a db with sql-lite to begin with and running queries
- importing the file from a different folder into cleaning ipynb caused some issues
- streamlit was not working at first, issues with pandas profiling library
- adding a state column in excel online to make the tableau dashboard better
- develop a ML model and make predictions on credit fraud
- compare to other datasets and other spending habits in different countries
- customer segmentation, credit risk, credit fraud detection (anomalies), credit approval projects coming soon