Shop online for groceries and get your order delivered to your doorstep in minutes. Enjoy instant delivery with Blinkit. Blinkit primarily delivers groceries, fresh fruits, vegetables, meat, stationery, bakery items, personal care, baby care and pet care products, snacks, flowers, etc.
The data is available in 8 CSV files, including:
- customers.csv
- orderdetails.csv
- orders.csv
This project focused on addressing key business challenges for Blinkit, including revenue optimization, customer retention, and operational efficiency. The main goals were:
- Revenue Insights: Pinpoint top-performing products and cities generating the highest revenue.
- Customer Analysis: Understand customer behavior by ranking customers based on spending and identifying inactive customers.
- Order Trends: Uncover patterns in order statuses (delivered, canceled) and customer behavior, like placing multiple orders in a day.
- Operational Insights: Optimize service delivery and improve operational efficiency by analyzing location-specific purchase data.
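For instance, the customer-spending ranking above can be done with a window function. A minimal sketch, assuming the schema implied by the CSV files (a customers table with customer_id and customer_name, and an orders table with an order_total column; all names here are assumptions):

```sql
-- Rank customers by total spend (table and column names are assumed)
SELECT
    c.customer_id,
    c.customer_name,
    SUM(o.order_total) AS total_spent,
    RANK() OVER (ORDER BY SUM(o.order_total) DESC) AS spend_rank
FROM customers c
JOIN orders o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name;
```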
IMDb (an initialism for Internet Movie Database) is an online database of information related to films, television series, podcasts, home videos, video games, and streaming content online – including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews.
The dataset used in this analysis can be found in this repository file: imdb_movies.csv.
This project analyzes the IMDb Movie Dataset to uncover trends, popular genres, and factors influencing movie success. Using Python and libraries like Pandas, NumPy, Matplotlib and Seaborn, the analysis delivers actionable insights through comprehensive data exploration and visualization.
The task description for this analysis can be found in this repository file: Task Covered.
- Project Setup and Data Loading
- Data Overview and Cleaning
- Univariate and Bivariate Analysis
- Genre-Specific Analysis
- Yearly and Decadal Trends
- Insights and Summary
- Documentation
This project analyzed the highest-paying IT jobs in India, focusing on job positions, locations, salaries, education, and experience levels. It provides insights into salary trends, correlations, and distributions, helping to understand the IT sector's job market and the factors influencing compensation.
The dataset used in this analysis can be found in this repository file: position_salary.csv
This project provides key insights into the IT job market in India, analyzing factors like job positions, salary trends, and the impact of education and experience on compensation. It helps professionals understand which roles offer the highest pay and the essential skills required for those positions. The analysis focuses on job positions, locations, salaries, educational qualifications, and experience levels.
- Project Setup and Data Loading
- Exploratory Data Analysis
- Data Cleaning
- Solved Basic Level Questions and Interpretation
- Solved Intermediate Level Questions and Interpretation
Microsoft Excel is a spreadsheet editor developed by Microsoft.
This section covers my understanding of Excel and its capabilities. (PS: I used data I created myself to practice the different Excel formulas.)
- Excel Formulas: SUM, SUMIF, SUMIFS, COUNT, COUNTIF, COUNTIFS, MAX, MIN, CONCATENATE, RIGHT, LEFT, LEN, NETWORKDAYS
- Conditional Formulas: IF, IFS, Conditional Formatting
- Date Formatting: Date-to-Text, Text-to-Date
- XLOOKUP & VLOOKUP
- Pivot Table
This project is a comprehensive SQL-based analysis of the mining industry, ideal for beginners and intermediate learners.
It centers on analyzing the mining industry using a real-world dataset from the World Mining Commodities dataset repository. The work involves exploring mining companies, their operational data, and country-level mining statistics to generate insights and develop SQL proficiency.
To analyze global mining trends, understand commodity production levels, and evaluate company performance using SQL.
The datasets used in this analysis can be found in these repository files: i. 116_world_mining_companies_clean.csv ii. world_mining_commodities_clean.csv
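As a taste of the queries involved, here is a minimal sketch for commodity production levels (the column names commodity and production_tonnes are assumptions about the cleaned dataset):

```sql
-- Top 10 commodities by total reported production (assumed columns)
SELECT TOP 10
    commodity,
    SUM(production_tonnes) AS total_production
FROM world_mining_commodities_clean
GROUP BY commodity
ORDER BY total_production DESC;
```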
I analyzed 30,000+ records from Vrinda Store, an online clothing retailer, to develop a data-driven annual sales report. This project presents insights from the analysis of Vrinda Store's 2022 data, integrating sales performance, customer demographics, and operational metrics.
The dataset used in this analysis can be found in this repository file: Vrinda Store Data Analysis Excel.xlsx
The objective is to create a comprehensive annual sales report to guide strategic decisions and enhance sales in 2023.
- Data Cleaning and Preprocessing: Removed duplicates, corrected errors, and structured sales data for analysis
- Data Analysis: Identified sales trends, top-performing products, and customer preferences
- Data Visualization & Reports: Created an interactive dashboard using Excel (pivot tables, charts, & advanced formulas)
- SQL: CTEs, GROUP BY, aggregate functions, and rank functions (see the sketch below)
- Insights & Documentation: Provided actionable insights for data-driven decision-making
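A minimal sketch of that SQL step, assuming a vrinda_sales table with state and amount columns (both assumptions):

```sql
-- Rank states by total sales with a CTE and a rank function (assumed schema)
WITH state_sales AS (
    SELECT state, SUM(amount) AS total_sales
    FROM vrinda_sales
    GROUP BY state
)
SELECT
    state,
    total_sales,
    RANK() OVER (ORDER BY total_sales DESC) AS sales_rank
FROM state_sales;
```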
- Adults are the largest consumer group, followed by the young and the elderly.
- Sales decline month over month, signaling a need for proactive strategies.
- Amazon tops the charts as the best-selling partner.
- Maharashtra emerges as the top-performing state with sales of ₹2,990,221.
- March recorded the highest revenue of ₹1,928,066 with 2,819 orders, indicating seasonal trends.
- Order Status: 28,641 orders delivered successfully, while only 844 were cancelled and 517 were refunded, showcasing high delivery efficiency
This project involves leveraging SQL queries and building a Power BI dashboard to analyze pizza sales data, aiming to uncover valuable insights that drive strategic decision-making and optimize business operations.
The dataset used in this analysis can be found in this repository file: pizza_sales.csv
Objective:
The objective of this project is to analyze pizza 🍕 sales data to identify trends and uncover key metrics and patterns in the sales data, providing actionable insights that can help increase sales, by leveraging SQL queries and building a Power BI dashboard.
- Total revenue: $817,860.05
- Most expensive pizza: The Greek Pizza
- Top Category: Classic
- Busiest Hour: 12:00
- Average Order Value: $17
- Highest Revenue Contribution: Classic Pizza
- Large pizzas are the most commonly ordered size, accounting for 18,526 orders.
- The Classic Deluxe Pizza is the most ordered type, totaling 2,453 orders.
- Classic pizza dominates total revenue with a contribution of 26.91%.
- The Thai Chicken Pizza emerges as a top revenue generator.
- Average pizzas sold per day stand at 138.
- The Thai Chicken Pizza ($43,434.25), Barbecue Chicken Pizza ($42,768) and California Chicken Pizza ($41,409.50) generate the highest revenue.
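The category revenue-share figure, for example, can be computed with a window function. A sketch assuming pizza_sales has pizza_category and total_price columns (assumed names):

```sql
-- Revenue contribution of each pizza category (assumed column names)
SELECT
    pizza_category,
    SUM(total_price) AS revenue,
    ROUND(SUM(total_price) * 100.0 / SUM(SUM(total_price)) OVER (), 2) AS pct_of_total
FROM pizza_sales
GROUP BY pizza_category
ORDER BY revenue DESC;
```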
The analysis explores customer churn patterns, focusing on various factors such as payment methods, contract types, tenure, and demographic attributes. The goal is to identify which factors are most strongly associated with higher churn rates to guide customer retention strategies.
The dataset used in this analysis can be found in this repository file: Customer Churn.csv
● Contract Type and Churn:
- Customers on month-to-month contracts exhibit the highest churn rate, with 42% of such customers likely to churn.
- In contrast, customers on one-year and two-year contracts have churn rates of 11% and 3%, respectively.
- Implication: Longer contract periods serve as a strong retention tool, as customers with extended commitments are far less likely to leave.
● Payment Methods and Churn:
- Customers paying via electronic checks show the highest churn rate at 45%, while those using credit cards, bank transfers, or mailed checks have significantly lower churn rates, averaging around 15-18%.
- Implication: The convenience, security, and trust issues related to electronic payments might be contributing factors. Encouraging customers to switch to more stable payment methods could reduce churn.
● Churn by Tenure:
- Customers with less than one year of tenure are the most likely to churn, with a 50% churn rate. Those with 1-3 years of tenure show a decreasing churn trend at 35%, while customers who have been with the company for more than three years have a churn rate of just 15%.
- Implication: Engaging customers early in their journey, especially within the first year, is critical for retention.
● Churn by Internet Service Type:
- Customers using Fiber Optic services show a higher churn rate of 30%, compared to DSL customers with a churn rate of 20%.
- Implication: This could be due to increased competition or dissatisfaction with service quality. Understanding customer satisfaction with service speed and reliability may help retain fiber optic users.
● Senior Citizens and Churn:
- The analysis reveals that senior citizens (aged 65+) have a churn rate of 41%, compared to a 26% churn rate among non-senior citizens.
- Implication: Special retention programs and targeted customer service for senior customers may help reduce churn in this demographic.
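Each of these segment rates reduces to a grouped aggregation. A minimal sketch for the contract-type breakdown, assuming the dataset is loaded into a customer_churn table with Contract and Churn ('Yes'/'No') columns (assumed names and values):

```sql
-- Churn rate by contract type (table and column names are assumptions)
SELECT
    Contract,
    COUNT(*) AS customers,
    ROUND(100.0 * SUM(CASE WHEN Churn = 'Yes' THEN 1 ELSE 0 END) / COUNT(*), 1) AS churn_rate_pct
FROM customer_churn
GROUP BY Contract
ORDER BY churn_rate_pct DESC;
```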
- Promote Long-Term Contracts: Offer incentives for customers to commit to longer contracts to reduce churn.
- Address Payment Method Concerns: Implement campaigns encouraging customers to switch from electronic checks to more reliable payment methods.
- Customer Engagement in Early Tenure: Focus on improving the customer experience within the first year, as churn is highest in this period.
- Special Senior Citizen Retention Programs: Create personalized offers or assistance programs to retain the senior demographic.
Using Excel, I designed a dynamic and interactive dashboard providing a deep dive into Blinkit's Grocery sales data. Here's what the dashboard highlights:
Excel Worksheet: BlinkIT Grocery Data Excel
Read Documentation: Blinkit Grocery Sales Analysis Documentation
- Focus on High-Selling Categories: Increase marketing efforts for Fruits and Vegetables and Snack Foods, which together contribute nearly 30% of total sales.
- Expand Regular Fat Content Offerings: As this category constitutes 64.6% of total sales, diversifying products within this segment can further boost revenue.
- Tier 1 Locations and High-Size Outlets: Invest in these high-performing areas and outlet types to maximize returns.
- Stabilize Seasonal Trends: Identify reasons behind sales fluctuations and implement strategies like discounts or promotional campaigns to sustain consistent revenue across all years.
- Customer Retention: Maintain the high average customer rating of 4.0 by focusing on quality and timely delivery, ensuring continued customer satisfaction.
This is an in-depth analysis of electric vehicle (EV) adoption across different dimensions, including EV types, vehicle make, model, and state-wise distribution. With 149,771 total EVs (2011–2023), the dataset highlights significant trends and insights about the growing EV market.
Download Dataset from kaggle: Dataset
- Battery Electric Vehicles (BEVs) account for 116,474 units, making up 77.8% of the total EVs, whereas Plug-in Hybrid Electric Vehicles (PHEVs) contribute 33,297 units, or 22.2% of the total.
- EV adoption has grown significantly since 2011. Total vehicle registrations rose from 0.8K in 2011 to 37.1K in 2023, representing a 4,538% increase.
- Tesla dominates the market with 68,939 vehicles (57.6%) of the total EVs.
- 41.87% (62,711 vehicles) are CAFV eligible, and 11.77% (17,634 vehicles) are not CAFV eligible, indicating room for improvement in meeting CAFV standards.
- Diversify Model Offerings:
- Encourage other manufacturers to innovate and compete with Tesla to diversify market options.
- Promote more affordable models for increased accessibility.
- Range Optimization:
- Invest in R&D to improve EV range and address consumer concerns about distance limitations.
- Highlight vehicles with exceptional range to attract new customers.
I used data wrangling to convert raw data from its initial format to a format better suited for analysis and future model development.
- Handling missing values
- Correct data formatting
- Normalize data
- Identify missing values
- Deal with missing values
- Correct data format
- Data Normalization (centering/scaling)
- Binning
- Indicator Variable
Dataset: Download Dataset here
In this task, I developed several models that predict the price of a car from its variables or features. A prediction is only an estimate, but it gives an objective idea of how much a car should cost, and a model helps us understand the exact relationship between the different variables and how they are used to predict the result.
- Develop prediction models
- Linear Regression and Multiple Linear Regression
- Model Evaluation using Visualization
- Polynomial Regression
- Pipeline
- Measures for In-Sample Evaluation
- Prediction and Decision Making
I delved deep into the fascinating world of festival sales to uncover valuable insights, from analyzing gender and occupation preferences to exploring the impact of marital status and age group on product categories, building a complete picture of Diwali shopping trends.
The first part of this project is Excel: I used Excel to clean my dataset and make it ready for visualization. Through this visualization, one can easily see which products are the heroes, and which age groups, occupations, and customer locations drive purchases.
The second part of this project uses SQL: I revisited my SQL knowledge to clean the dataset and make it ready for analysis. Objective: to clean and prepare the raw Diwali sales dataset for analysis by addressing missing values, duplicates, and other inconsistencies.
- Target Marketing Campaigns:
- Focus on female customers and the 18–35 age group.
- Offer exclusive promotions for IT and healthcare professionals.
- Geographical Expansion:
- Strengthen presence in top-performing states while improving reach in underperforming regions.
- Product Strategy:
- Stock up on high-demand products like food and clothing during festive seasons.
- Promote low-performing categories like Tupperware and stationery through bundled offers.
- Loyalty Programs:
- Develop loyalty initiatives for top customers to ensure retention and increased lifetime value.
Using a variety of machine learning models such as Random Forest, K-Nearest Neighbors, Decision Tree and others, I analyzed the dataset and identified the Decision Tree model as the best-performing algorithm with an impressive accuracy of 88.9%.
Key Highlights:
- Data Analysis: Segmented the dataset into categorical and continuous variables for targeted analysis.
- Visualization Insights: Leveraged donut charts to analyze categorical data and highlight trends.
- Model Evaluation: Tested multiple models and compared their performance to select the most accurate predictor.
- Technical Stack: Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn
The goal of analyzing transaction data is to get important insights regarding customer behavior, product popularity, sales patterns, and operational efficiencies. The purpose is to optimize inventory management, improve decision-making processes, and find possible cross-selling opportunities.
- Data collection, cleaning, and preparation
- Analyze monthly, daily, and hourly sales patterns with PivotTables and Power Pivot (see the SQL sketch after the tools list).
- Determine high-performing days and times.
- Develop compelling reports and visualization
- Create an Interactive dashboard
- Excel
- Power BI
- MS SQL
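A minimal sketch of the hourly-pattern query in MS SQL, assuming a transactions table with a transaction_time column (table and column names are assumptions):

```sql
-- Transaction volume per hour of day, to surface the morning rush (assumed schema)
SELECT
    DATEPART(hour, transaction_time) AS hour_of_day,
    COUNT(*) AS transactions
FROM transactions
GROUP BY DATEPART(hour, transaction_time)
ORDER BY hour_of_day;
```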
- Top Sellers: Coffee leads with 25% of total sales, followed by Bakery and Tea.
- Store Performance: Our Astoria and Hell’s Kitchen locations are top performers, each contributing over 34% of total sales.
- Monthly Trends: Sales show steady month-over-month growth.
- Customer Habits: Peak sales times are between 8 AM and 11 AM on weekdays, aligning with the morning coffee rush.
- Product Trends: Coffee beans and branded products saw the highest month-on-month growth, while sustainably grown coffee remains our top-seller.
Peak transaction times, high-performing days, and areas for improvement are identified. By leveraging these findings, you can enhance customer experiences, refine inventory management, and boost overall sales efficiency for a thriving coffee shop venture.
This dashboard provides a comprehensive analysis of the bank’s loan performance, highlighting key metrics such as total loan applications, loan amounts, amount received, interest rates, and debt-to-income ratio (DTI).
- Total Loan Applications: 38.6K, with 4.3K applications received MTD (Month-to-Date), reflecting 6.9% growth.
- Total Loan Amount Funded: $435.8M, with $54.0M funded MTD, indicating a 13.0% increase.
- Total Amount Received: $473.1M, with $58.1M received MTD, showing a 15.8% increase.
- Average Interest Rate: 12.05%, slightly lower than the MTD rate of 12.36%.
- Average Debt-to-Income Ratio (DTI): 13.33%, with an MTD figure of 13.67% (-2.7% MoM).
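MTD figures like these are typically computed with a date filter on the issue date. A hedged sketch, assuming a bank_loan_data table with issue_date and loan_amount columns (names are assumptions) and December 2021 as the current month:

```sql
-- Month-to-date funded amount (assumed schema; month hard-coded for illustration)
SELECT SUM(loan_amount) AS mtd_funded_amount
FROM bank_loan_data
WHERE YEAR(issue_date) = 2021
  AND MONTH(issue_date) = 12;
```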
- Good Loan Issued: Strong Performance
- 85.9% of total loans are classified as good loans, demonstrating high-quality lending practices.
- 33.2K good loan applications resulted in a total funded amount of $370.2M, with $435.8M successfully received.
- Key Strengths:
- Strong repayment trends.
- Efficient fund disbursement and collection.
- High proportion of fully paid loans.
- Bad Loan Issued: Areas of Concern
- 14.1% of loans are bad loans, indicating some degree of credit risk.
- 5.3K applications resulted in a funded amount of $65.5M, but only $37.3M has been recovered, leading to potential losses.
- Key Concerns:
- Higher non-performing loan (NPL) ratio in certain segments.
- Potential need for stricter credit risk assessments.
- Reduce Bad Loan Issuance (14.1%)
- Implement stricter credit checks and borrower risk assessments.
- Strengthen loan approval processes to minimize high-risk lending.
- Offer financial literacy programs to borrowers.
- Improve Loan Recovery & Reduce Charge-Offs
- Enhance debt collection strategies.
- Introduce flexible repayment options for struggling borrowers.
- Use predictive analytics to detect potential defaults early.
- Optimize Interest Rates & DTI Ratio
- Adjust interest rates based on borrower credit scores and risk factors.
- Maintain DTI ratio below 13% to ensure borrower affordability.
- Enhance Loan Portfolio Diversification
- Expand lending to low-risk segments with stable income sources.
- Diversify loan products to cater to different borrower profiles.
- Increase Efficiency in Loan Disbursement & Collection
- Automate loan approval and repayment tracking for faster processing.
- Leverage AI-based fraud detection to prevent high-risk lending.
I've developed an interactive Tableau & Power BI dashboard that focuses on employee attrition trends. This dashboard provides a deep dive into attrition data, enabling HR teams to identify key factors contributing to turnover and take proactive steps to improve retention strategies.
Tableau:
This dashboard provides a clear and concise view for HR teams, aiding in decision-making processes related to promotions, retrenchments, and workforce distribution.
This project involves analyzing a Spotify dataset with various attributes about albums, tracks, streams, views, likes, artists, and other components using SQL. It covers the end-to-end process of normalizing a denormalized dataset, performing SQL queries of varying complexity (easy, medium, and advanced), and optimizing query performance. The primary goal of the project is to practice advanced SQL skills and generate valuable insights from the dataset.
- Retrieve the names of all tracks that have more than 1 billion streams.
- List all albums along with their respective artists.
- Get the total number of comments for tracks where licensed = TRUE.
- Find all tracks that belong to the album type single.
- Count the total number of tracks by each artist.
- Calculate the average danceability of tracks in each album.
- Find the top 5 tracks with the highest energy values.
- List all tracks along with their views and likes where official_video = TRUE.
- For each album, calculate the total views of all associated tracks.
- Retrieve the track names that have been streamed on Spotify more than YouTube.
- Find the top 3 most-viewed tracks for each artist using window functions.
- Write a query to find tracks where the liveness score is above the average.
- Use a WITH clause to calculate the difference between the highest and lowest energy values for tracks in each album.
- Find tracks where the energy-to-liveness ratio is greater than 1.2.
- Calculate the cumulative sum of likes for tracks ordered by the number of views, using window functions.
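As an illustration, the top-3-most-viewed-tracks question can be solved with DENSE_RANK(); the table name spotify and the columns artist, track, and views are assumptions about the normalized dataset:

```sql
-- Top 3 most-viewed tracks per artist using a window function (assumed schema)
WITH ranked_tracks AS (
    SELECT
        artist,
        track,
        SUM(views) AS total_views,
        DENSE_RANK() OVER (PARTITION BY artist ORDER BY SUM(views) DESC) AS view_rank
    FROM spotify
    GROUP BY artist, track
)
SELECT artist, track, total_views
FROM ranked_tracks
WHERE view_rank <= 3;
```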
- Data Extraction: Retrieved customer and transaction data from an SQL database.
- Data Preparation: Imported the cleaned data into Power BI for visualization.
- Data Visualization: Created interactive dashboards in Power BI to visualize key metrics and trends.
- Revenue increased by 28.8%
- Total Transaction Amount & Count rose by 35% & 12.8%
- Customer count grew by 12.8%
- Overall revenue: $56.5M
- Total interest: $8M
- Total transaction amount: $45.5M
- Male customers contributed $30.9M in revenue, while female customers contributed $25.6M
- Blue & Silver credit cards accounted for 93% of overall transactions
- TX, NY, & CA contributed 68% to the total transactions
- Overall Activation rate: 57.5%
- Overall Delinquent rate: 6.06%
- Quarterly Trends: Stable revenue across quarters, with a slight decrease in Q3 and Q4.
- Expenditure Types: Highest revenue from bills, entertainment, fuel, groceries, and travel.
- Demographics: Higher revenue from graduates, high-school-educated customers, businessmen, and white-collar workers.
- Card Types: Blue cardholders contribute the most revenue.
- Transaction Methods: Swipe transactions generate the most revenue.
- Customer Job: Businessmen contribute the most revenue, followed by white-collar workers.
- Developed an interactive dashboard using transaction and customer data from a SQL database, to provide real-time insights.
- Streamlined data processing & analysis to monitor key performance metrics and trends.
- Shared actionable insights with stakeholders based on dashboard findings to support decision-making process.
The Overview section should provide a snapshot of the overall HR metrics, including:
- Display the total number of hired employees, active employees, and terminated employees.
- Visualize the total number of hired and terminated employees over the years.
- Present a breakdown of total employees by department and job titles.
- Compare total employees between headquarters (HQ) and branches (New York is the HQ).
- Show the distribution of employees by city and state.
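A hedged sketch of the headcount metric, assuming an hr table where terminated employees carry a non-NULL termdate (table and column names are assumptions):

```sql
-- Hired vs. active vs. terminated employees (assumed schema)
SELECT
    COUNT(*) AS total_hired,
    SUM(CASE WHEN termdate IS NULL THEN 1 ELSE 0 END) AS active_employees,
    SUM(CASE WHEN termdate IS NOT NULL THEN 1 ELSE 0 END) AS terminated_employees
FROM hr;
```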
The Demographics section should offer insights into the composition of the workforce, including:
- Present the gender ratio in the company.
- Visualize the distribution of employees across age groups and education levels.
- Show the total number of employees within each age group.
- Show the total number of employees within each education level.
- Present the correlation between employees' educational backgrounds and their performance ratings.
The income analysis section should focus on salary-related metrics, including:
- Compare salaries across different education levels for both genders to identify any discrepancies or patterns.
- Present how age correlates with salary for employees in each department.
A comprehensive collection of SQL scripts for data exploration, analytics, and reporting. These scripts cover various analyses such as database exploration, measures and metrics, time-based trends, cumulative analytics, segmentation, and more.
This repository contains advanced SQL queries (CTEs, window functions, subqueries, and reporting scripts) designed to help data analysts and BI professionals quickly explore, segment, and analyze data within a relational database. Each script focuses on a specific analytical theme and demonstrates best practices for SQL queries.
Cumulative Analysis
Purpose:
- To calculate running totals or moving averages for key metrics.
- To track performance over time cumulatively.
- Useful for growth analysis or identifying long-term trends.
SQL Functions Used:
- Window Functions: SUM() OVER(), AVG() OVER()
===============================================================================
-- Calculate the total sales per month and the running total of sales over time
SELECT
order_date,
total_sales,
SUM(total_sales) OVER (ORDER BY order_date) AS running_total_sales,
AVG(avg_price) OVER (ORDER BY order_date) AS moving_average_price
FROM
(
SELECT
DATETRUNC(month, order_date) AS order_date,
SUM(sales_amount) AS total_sales,
AVG(price) AS avg_price
FROM DataWarehouseAnalytic..[gold.fact_sales]
WHERE order_date IS NOT NULL
GROUP BY DATETRUNC(month, order_date)
) AS monthly_sales;
Performance Analysis (Year-over-Year, Month-over-Month)
Purpose:
- To measure the performance of products, customers, or regions over time.
- For benchmarking and identifying high-performing entities.
- To track yearly trends and growth.
SQL Functions Used:
- LAG(): Accesses data from previous rows.
- AVG() OVER(): Computes average values within partitions.
- CASE: Defines conditional logic for trend analysis.
===============================================================================
/* Analyze the yearly performance of products by comparing their sales
to both the average sales performance of the product and the previous year's sales */
WITH yearly_product_sales AS (
SELECT
YEAR(f.order_date) AS order_year,
p.product_name,
SUM(f.sales_amount) AS current_sales
FROM DataWarehouseAnalytic..[gold.fact_sales] f
LEFT JOIN DataWarehouseAnalytic..[gold.dim_products] p
ON f.product_key = p.product_key
WHERE f.order_date IS NOT NULL
GROUP BY
YEAR(f.order_date),
p.product_name
)
SELECT
order_year,
product_name,
current_sales,
AVG(current_sales) OVER (PARTITION BY product_name) AS avg_sales,
current_sales - AVG(current_sales) OVER (PARTITION BY product_name) AS diff_avg,
CASE
WHEN current_sales - AVG(current_sales) OVER (PARTITION BY product_name) > 0 THEN 'Above Avg'
WHEN current_sales - AVG(current_sales) OVER (PARTITION BY product_name) < 0 THEN 'Below Avg'
ELSE 'Avg'
END AS avg_change,
-- Year-over-Year Analysis
LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) AS py_sales,
current_sales - LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) AS diff_py,
CASE
WHEN current_sales - LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) > 0 THEN 'Increase'
WHEN current_sales - LAG(current_sales) OVER (PARTITION BY product_name ORDER BY order_year) < 0 THEN 'Decrease'
ELSE 'No Change'
END AS py_change
FROM yearly_product_sales
ORDER BY product_name, order_year;
Data Segmentation Analysis
Purpose:
- To group data into meaningful categories for targeted insights.
- For customer segmentation, product categorization, or regional analysis.
SQL Functions Used:
- CASE: Defines custom segmentation logic.
- GROUP BY: Groups data into segments.
===============================================================================
/*Segment products into cost ranges and count how many products fall into each segment*/
WITH product_segments AS (
SELECT
product_key,
product_name,
cost,
CASE
WHEN cost < 100 THEN 'Below 100'
WHEN cost BETWEEN 100 AND 500 THEN '100-500'
WHEN cost BETWEEN 500 AND 1000 THEN '500-1000'
ELSE 'Above 1000'
END AS cost_range
FROM DataWarehouseAnalytic..[gold.dim_products]
)
SELECT
cost_range,
COUNT(product_key) AS total_products
FROM product_segments
GROUP BY cost_range
ORDER BY total_products DESC;
/*Group customers into three segments based on their spending behavior:
- VIP: Customers with at least 12 months of history and spending more than €5,000.
- Regular: Customers with at least 12 months of history but spending €5,000 or less.
- New: Customers with a lifespan less than 12 months.
And find the total number of customers by each group
*/
WITH customer_spending AS (
SELECT
c.customer_key,
SUM(f.sales_amount) AS total_spending,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order,
DATEDIFF(month, MIN(order_date), MAX(order_date)) AS lifespan
FROM DataWarehouseAnalytic..[gold.fact_sales] f
LEFT JOIN DataWarehouseAnalytic..[gold.dim_customers] c
ON f.customer_key = c.customer_key
GROUP BY c.customer_key
)
SELECT
customer_segment,
COUNT(customer_key) AS total_customers
FROM (
SELECT
customer_key,
CASE
WHEN lifespan >= 12 AND total_spending > 5000 THEN 'VIP'
WHEN lifespan >= 12 AND total_spending <= 5000 THEN 'Regular'
ELSE 'New'
END AS customer_segment
FROM customer_spending
) AS segmented_customers
GROUP BY customer_segment
ORDER BY total_customers DESC;
Part-to-Whole Analysis
/*
Purpose:
- To compare performance or metrics across dimensions or time periods.
- To evaluate differences between categories.
- Useful for A/B testing or regional comparisons.
SQL Functions Used:
- SUM(), AVG(): Aggregates values for comparison.
- Window Functions: SUM() OVER() for total calculations.
===============================================================================
*/
-- Which categories contribute the most to overall sales?
WITH category_sales AS (
SELECT
p.category,
SUM(f.sales_amount) AS total_sales
FROM DataWarehouseAnalytic..[gold.fact_sales] f
LEFT JOIN DataWarehouseAnalytic..[gold.dim_products] p
ON p.product_key = f.product_key
GROUP BY p.category
)
SELECT
category,
total_sales,
SUM(total_sales) OVER () AS overall_sales,
ROUND((CAST(total_sales AS FLOAT) / SUM(total_sales) OVER ()) * 100, 2) AS percentage_of_total
FROM category_sales
ORDER BY total_sales DESC;
Customer Report
Purpose:
- This report consolidates key customer metrics and behaviors
Highlights:
1. Gathers essential fields such as names, ages, and transaction details.
2. Segments customers into categories (VIP, Regular, New) and age groups.
3. Aggregates customer-level metrics:
- total orders
- total sales
- total quantity purchased
- total products
- lifespan (in months)
4. Calculates valuable KPIs:
- recency (months since last order)
- average order value
- average monthly spend
===============================================================================
CREATE VIEW gold.report_customers AS
WITH base_query AS(
/*---------------------------------------------------------------------------
1) Base Query: Retrieves core columns from tables
---------------------------------------------------------------------------*/
SELECT
f.order_number,
f.product_key,
f.order_date,
f.sales_amount,
f.quantity,
c.customer_key,
c.customer_number,
CONCAT(c.first_name, ' ', c.last_name) AS customer_name,
DATEDIFF(year, c.birthdate, GETDATE()) AS age
FROM DataWarehouseAnalytic..[gold.fact_sales] f
LEFT JOIN DataWarehouseAnalytic..[gold.dim_customers] c
ON c.customer_key = f.customer_key
WHERE order_date IS NOT NULL)
, customer_aggregation AS (
/*---------------------------------------------------------------------------
2) Customer Aggregations: Summarizes key metrics at the customer level
---------------------------------------------------------------------------*/
SELECT
customer_key,
customer_number,
customer_name,
age,
COUNT(DISTINCT order_number) AS total_orders,
SUM(sales_amount) AS total_sales,
SUM(quantity) AS total_quantity,
COUNT(DISTINCT product_key) AS total_products,
MAX(order_date) AS last_order_date,
DATEDIFF(month, MIN(order_date), MAX(order_date)) AS lifespan
FROM base_query
GROUP BY
customer_key,
customer_number,
customer_name,
age
)
SELECT
customer_key,
customer_number,
customer_name,
age,
CASE
WHEN age < 20 THEN 'Under 20'
WHEN age between 20 and 29 THEN '20-29'
WHEN age between 30 and 39 THEN '30-39'
WHEN age between 40 and 49 THEN '40-49'
ELSE '50 and above'
END AS age_group,
CASE
WHEN lifespan >= 12 AND total_sales > 5000 THEN 'VIP'
WHEN lifespan >= 12 AND total_sales <= 5000 THEN 'Regular'
ELSE 'New'
END AS customer_segment,
last_order_date,
DATEDIFF(month, last_order_date, GETDATE()) AS recency,
total_orders,
total_sales,
total_quantity,
total_products,
lifespan,
-- Compute average order value (AOV)
CASE WHEN total_orders = 0 THEN 0
ELSE total_sales / total_orders
END AS avg_order_value,
-- Compute average monthly spend
CASE WHEN lifespan = 0 THEN total_sales
ELSE total_sales / lifespan
END AS avg_monthly_spend
FROM customer_aggregation