ShopMart, an e-commerce platform is experiencing operational challenges related to sales performance, customer behavior, and inventory management. Despite having a large and diverse customer base, with over 20,000 sales records and 10,000 products, the business has faced several key issues that are limiting growth. These include inconsistent product restocking, high return rates in certain categories, shipping delays, and increasing customer acquisition costs without a proportional increase in customer retention. ShopMart’s Senior Executive leadership team is seeking insights into how they can optimize their operations and improve overall profitability.
The objective of this project is to use advanced SQL techniques to analyze ShopMart’s sales and operational data to address critical e-commerce business challenges. The focus is on :
- Optimizing sales trends
- Identifying top and underperforming products
- Segmenting customer behavior
- Improving inventory management processes
Complex SQL queries, including the use of window functions, Common Table Expressions (CTEs), complex joins, query optimization techniques, and stored procedures, were employed to tackle business problems such as revenue analysis, customer segmentation, inventory stock alerts, and shipping performance. Additionally, this analysis will involve data cleaning, managing missing values, and structuring queries to solve real-world business problems.
The dataset consists of multiple tables including:
- Customers: Contains customer information (e.g., ID, name, registration date, address, etc.).
- Orders: Captures sales transactions including order ID, customer ID, product ID, order date, payment status, and shipping details.
- Products: Contains product information including product name, category, price, and cost of goods sold.
- Inventory: Tracks stock levels and restock dates for each product.
- Returns: Contains details on product returns including return dates and reasons.
- Shipping Providers: Includes information about the shipping provider used for each order, along with their average delivery times.
CREATE TABLE category
(
category_id INT PRIMARY KEY,
category_name VARCHAR(20)
);
-- customers TABLE
CREATE TABLE customers
(
customer_id INT PRIMARY KEY,
first_name VARCHAR(20),
last_name VARCHAR(20),
state VARCHAR(20),
address VARCHAR(5) DEFAULT ('xxxx')
);
-- sellers TABLE
CREATE TABLE sellers
(
seller_id INT PRIMARY KEY,
seller_name VARCHAR(25),
origin VARCHAR(15)
);
-- products table
CREATE TABLE products
(
product_id INT PRIMARY KEY,
product_name VARCHAR(50),
price FLOAT,
cogs FLOAT,
category_id INT, -- FK
CONSTRAINT product_fk_category FOREIGN KEY(category_id) REFERENCES category(category_id)
);
-- orders
CREATE TABLE orders
(
order_id INT PRIMARY KEY,
order_date DATE,
customer_id INT, -- FK
seller_id INT, -- FK
order_status VARCHAR(15),
CONSTRAINT orders_fk_customers FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
CONSTRAINT orders_fk_sellers FOREIGN KEY (seller_id) REFERENCES sellers(seller_id)
);
CREATE TABLE order_items
(
order_item_id INT PRIMARY KEY,
order_id INT, -- FK
product_id INT, -- FK
quantity INT,
price_per_unit FLOAT,
CONSTRAINT order_items_fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id),
CONSTRAINT order_items_fk_products FOREIGN KEY (product_id) REFERENCES products(product_id)
);
-- payment TABLE
CREATE TABLE payments
(
payment_id
INT PRIMARY KEY,
order_id INT, -- FK
payment_date DATE,
payment_status VARCHAR(20),
CONSTRAINT payments_fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id)
);
CREATE TABLE shippings
(
shipping_id INT PRIMARY KEY,
order_id INT, -- FK
shipping_date DATE,
return_date DATE,
shipping_providers VARCHAR(15),
delivery_status VARCHAR(15),
CONSTRAINT shippings_fk_orders FOREIGN KEY (order_id) REFERENCES orders(order_id)
);
CREATE TABLE inventory
(
inventory_id INT PRIMARY KEY,
product_id INT, -- FK
stock INT,
warehouse_id INT,
last_stock_date DATE,
CONSTRAINT inventory_fk_products FOREIGN KEY (product_id) REFERENCES products(product_id)
);
The dataset was cleaned using a series of SQL operations aimed at ensuring data integrity and consistency:
Duplicate records in both the customer and order tables were identified using a combination of ROW_NUMBER() window functions and PARTITION BY clauses. Rows with duplicate customer_id and order_id were flagged, and only the first occurrence of each was retained, while the rest were deleted using DELETE with a CTE for efficient removal.
WITH cte_duplicates AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id, order_id ORDER BY created_at) AS row_num
FROM orders
)
DELETE FROM orders
WHERE order_id IN (SELECT order_id FROM cte_duplicates WHERE row_num > 1);
Null values were handled contextually using SQL functions and logic:
- Customer addresses: For records where the address was missing, COALESCE() was used to replace null values with a default placeholder ("Unknown Address") while ensuring that legitimate non-null values remained unchanged.
UPDATE customers
SET address = COALESCE(address, 'Unknown Address')
WHERE address IS NULL;
- Payment statuses: Orders with missing payment statuses were assigned a default value of "Pending" by checking for NULL and updating the relevant records using the COALESCE() function.
UPDATE orders
SET payment_status = COALESCE(payment_status, 'Pending')
WHERE payment_status IS NULL;
- Shipping information: For null return_date fields, no updates were made as nulls were valid entries indicating unreturned shipments. This was handled by allowing null values to remain unchanged unless explicitly required for reporting.
SELECT order_id, product_id, return_date
FROM shipments
WHERE return_date IS NULL;
Through preliminary data exploration and discussions with ShopMart’s leadership team, the following business challenges were identified:
- Low product availability: Inconsistent restocking is causing frequent stockouts in key product categories, leading to lost sales opportunities.
- High return rates: Certain product categories have disproportionately high return rates, which negatively impacts revenue.
- Shipping delays: There are significant delays in shipments, leading to poor customer satisfaction and increased churn.
- Low customer retention: High customer acquisition costs are not translating into long-term customer loyalty, with retention rates remaining low despite marketing efforts.
- Payment success rate: A large number of orders have failed payment statuses, contributing to missed revenue opportunities.
To address the business challenges, the following SQL-based tasks were undertaken as part of the project:
1. Top Selling Products: Identify the top 10 products by total sales value, including total quantity sold and sales value.
2. Revenue by Category: Calculate the total revenue generated by each product category and its contribution to overall revenue.
3. Customer Segmentation: Compute the average order value (AOV) for customers, focusing on high-value customers (i.e., those with more than 5 orders).
4. Monthly Sales Trend: Analyze the monthly sales trends over the past year to identify periods of growth and decline.
5. Customers with No Purchases: Identify customers who registered but never placed an order to target them with retention campaigns.
6. Inventory Stock Alerts: Detect products with stock levels below the restocking threshold to prevent stockouts.
7. Shipping Delays: Analyze shipping delays by calculating the average shipping time and identifying late deliveries (more than 3 days after the order date).
8. Payment Success Rate: Calculate the percentage of successful payments and provide a breakdown by payment status.
9. Product Profit Margin: Determine the profit margin for each product and rank them to identify the most profitable products.
10. Return Analysis: Identify the top 10 most returned products, with return rates calculated as a percentage of total units sold.
11. Customer Lifetime Value (CLTV): Rank customers based on the total value of orders placed over their lifetime and identify high-value customers for targeted retention campaigns.
12. Inactive Sellers: Find sellers who haven’t made sales in the last 6 months and suggest actions to engage them.
1. Top Selling Products:
SELECT
oi.product_id,
p.product_name,
SUM(oi.total_sale) as total_sale,
COUNT(o.order_id) as total_orders
FROM orders as o
JOIN
order_items as oi
ON oi.order_id = o.order_id
JOIN
products as p
ON p.product_id = oi.product_id
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 10
2. Revenue by Category
SELECT
p.category_id,
c.category_name,
SUM(oi.total_sale) as total_sale,
SUM(oi.total_sale)/
(SELECT SUM(total_sale) FROM order_items)
* 100
as contribution
FROM order_items as oi
JOIN
products as p
ON p.product_id = oi.product_id
LEFT JOIN category as c
ON c.category_id = p.category_id
GROUP BY 1, 2
ORDER BY 3 DESC
3. Average Order Value (AOV)
SELECT
c.customer_id,
CONCAT(c.first_name, ' ', c.last_name) as full_name,
SUM(total_sale)/COUNT(o.order_id) as AOV,
COUNT(o.order_id) as total_orders --- filter
FROM orders as o
JOIN
customers as c
ON c.customer_id = o.customer_id
JOIN
order_items as oi
ON oi.order_id = o.order_id
GROUP BY 1, 2
HAVING COUNT(o.order_id) > 5
4. Monthly Sales Trend
SELECT
year,
month,
total_sale as current_month_sale,
LAG(total_sale, 1) OVER(ORDER BY year, month) as last_month_sale
FROM ---
(
SELECT
EXTRACT(MONTH FROM o.order_date) as month,
EXTRACT(YEAR FROM o.order_date) as year,
ROUND(
SUM(oi.total_sale::numeric)
,2) as total_sale
FROM orders as o
JOIN
order_items as oi
ON oi.order_id = o.order_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY 1, 2
ORDER BY year, month
) as t1
5. Customers with No Purchases
Approach 1
SELECT *
-- reg_date - CURRENT_DATE
FROM customers
WHERE customer_id NOT IN (SELECT
DISTINCT customer_id
FROM orders
);
-- Approach 2
SELECT *
FROM customers as c
LEFT JOIN
orders as o
ON o.customer_id = c.customer_id
WHERE o.customer_id IS NULL
6. Least-Selling Categories by State
WITH ranking_table
AS
(
SELECT
c.state,
cat.category_name,
SUM(oi.total_sale) as total_sale,
RANK() OVER(PARTITION BY c.state ORDER BY SUM(oi.total_sale) ASC) as rank
FROM orders as o
JOIN
customers as c
ON o.customer_id = c.customer_id
JOIN
order_items as oi
ON o.order_id = oi. order_id
JOIN
products as p
ON oi.product_id = p.product_id
JOIN
category as cat
ON cat.category_id = p.category_id
GROUP BY 1, 2
)
SELECT
*
FROM ranking_table
WHERE rank = 1
7. Customer Lifetime Value (CLTV)
SELECT
c.customer_id,
CONCAT(c.first_name, ' ', c.last_name) as full_name,
SUM(total_sale) as CLTV,
DENSE_RANK() OVER( ORDER BY SUM(total_sale) DESC) as cx_ranking
FROM orders as o
JOIN
customers as c
ON c.customer_id = o.customer_id
JOIN
order_items as oi
ON oi.order_id = o.order_id
GROUP BY 1, 2
8. Inventory Stock Alerts
SELECT
i.inventory_id,
p.product_name,
i.stock as current_stock_left,
i.last_stock_date,
i.warehouse_id
FROM inventory as i
join
products as p
ON p.product_id = i.product_id
WHERE stock < 10
9. Shipping Delays
SELECT
c.*,
o.*,
s.shipping_providers,
s.shipping_date - o.order_date as days_took_to_ship
FROM orders as o
JOIN
customers as c
ON c.customer_id = o.customer_id
JOIN
shippings as s
ON o.order_id = s.order_id
WHERE s.shipping_date - o.order_date > 3
10. Payment Success Rate
SELECT
p.payment_status,
COUNT(*) as total_cnt,
COUNT(*)::numeric/(SELECT COUNT(*) FROM payments)::numeric * 100
FROM orders as o
JOIN
payments as p
ON o.order_id = p.order_id
GROUP BY 1
11. Top Performing Sellers
WITH top_sellers
AS
(SELECT
s.seller_id,
s.seller_name,
SUM(oi.total_sale) as total_sale
FROM orders as o
JOIN
sellers as s
ON o.seller_id = s.seller_id
JOIN
order_items as oi
ON oi.order_id = o.order_id
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 5
),
sellers_reports
AS
(SELECT
o.seller_id,
ts.seller_name,
o.order_status,
COUNT(*) as total_orders
FROM orders as o
JOIN
top_sellers as ts
ON ts.seller_id = o.seller_id
WHERE
o.order_status NOT IN ('Inprogress', 'Returned')
GROUP BY 1, 2, 3
)
SELECT
seller_id,
seller_name,
SUM(CASE WHEN order_status = 'Completed' THEN total_orders ELSE 0 END) as Completed_orders,
SUM(CASE WHEN order_status = 'Cancelled' THEN total_orders ELSE 0 END) as Cancelled_orders,
SUM(total_orders) as total_orders,
SUM(CASE WHEN order_status = 'Completed' THEN total_orders ELSE 0 END)::numeric/
SUM(total_orders)::numeric * 100 as successful_orders_percentage
FROM sellers_reports
GROUP BY 1, 2
12. Product Profit Margin Calculate the profit margin for each product (difference between price and cost of goods sold).
*/
SELECT
product_id,
product_name,
profit_margin,
DENSE_RANK() OVER( ORDER BY profit_margin DESC) as product_ranking
FROM
(SELECT
p.product_id,
p.product_name,
-- SUM(total_sale - (p.cogs * oi.quantity)) as profit,
SUM(total_sale - (p.cogs * oi.quantity))/sum(total_sale) * 100 as profit_margin
FROM order_items as oi
JOIN
products as p
ON oi.product_id = p.product_id
GROUP BY 1, 2
) as t1
13. Most Returned Products Query the top 10 products by the number of returns.
SELECT
p.product_id,
p.product_name,
COUNT(*) as total_unit_sold,
SUM(CASE WHEN o.order_status = 'Returned' THEN 1 ELSE 0 END) as total_returned,
SUM(CASE WHEN o.order_status = 'Returned' THEN 1 ELSE 0 END)::numeric/COUNT(*)::numeric * 100 as return_percentage
FROM order_items as oi
JOIN
products as p
ON oi.product_id = p.product_id
JOIN orders as o
ON o.order_id = oi.order_id
GROUP BY 1, 2
ORDER BY 5 DESC
14. Inactive Sellers Identify sellers who haven’t made any sales in the last 6 months.
WITH cte1 -- as these sellers has not done any sale in last 6 month
AS
(SELECT * FROM sellers
WHERE seller_id NOT IN (SELECT seller_id FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '6 month')
)
SELECT
o.seller_id,
MAX(o.order_date) as last_sale_date,
MAX(oi.total_sale) as last_sale_amount
FROM orders as o
JOIN
cte1
ON cte1.seller_id = o.seller_id
JOIN order_items as oi
ON o.order_id = oi.order_id
GROUP BY 1
15. IDENTIFY customers into returning or new If the customer has done more than 5 return categorize them as returning otherwise new
SELECT
c_full_name as customers,
total_orders,
total_return,
CASE
WHEN total_return > 5 THEN 'Returning_customers' ELSE 'New'
END as cx_category
FROM
(SELECT
CONCAT(c.first_name, ' ', c.last_name) as c_full_name,
COUNT(o.order_id) as total_orders,
SUM(CASE WHEN o.order_status = 'Returned' THEN 1 ELSE 0 END) as total_return
FROM orders as o
JOIN
customers as c
ON c.customer_id = o.customer_id
JOIN
order_items as oi
ON oi.order_id = o.order_id
GROUP BY 1)
16. Top 5 Customers by Orders in Each State Identify the top 5 customers with the highest number of orders for each state.
SELECT * FROM
(SELECT
c.state,
CONCAT(c.first_name, ' ', c.last_name) as customers,
COUNT(o.order_id) as total_orders,
SUM(total_sale) as total_sale,
DENSE_RANK() OVER(PARTITION BY c.state ORDER BY COUNT(o.order_id) DESC) as rank
FROM orders as o
JOIN
order_items as oi
ON oi.order_id = o.order_id
JOIN
customers as c
ON
c.customer_id = o.customer_id
GROUP BY 1, 2
) as t1
WHERE rank <=5
17. Revenue by Shipping Provider Calculate the total revenue handled by each shipping provider.
SELECT
s.shipping_providers,
COUNT(o.order_id) as order_handled,
SUM(oi.total_sale) as total_sale,
COALESCE(AVG(s.return_date - s.shipping_date), 0) as average_days
FROM orders as o
JOIN
order_items as oi
ON oi.order_id = o.order_id
JOIN
shippings as s
ON
s.order_id = o.order_id
GROUP BY 1
18. Top 10 product with highest decreasing revenue ratio compare to last year(2022) and current_year(2023) Note: Decrease ratio = cr-ls/ls* 100 (cs = current_year ls=last_year)
WITH last_year_sale
as
(
SELECT
p.product_id,
p.product_name,
SUM(oi.total_sale) as revenue
FROM orders as o
JOIN
order_items as oi
ON oi.order_id = o.order_id
JOIN
products as p
ON
p.product_id = oi.product_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2022
GROUP BY 1, 2
),
current_year_sale
AS
(
SELECT
p.product_id,
p.product_name,
SUM(oi.total_sale) as revenue
FROM orders as o
JOIN
order_items as oi
ON oi.order_id = o.order_id
JOIN
products as p
ON
p.product_id = oi.product_id
WHERE EXTRACT(YEAR FROM o.order_date) = 2023
GROUP BY 1, 2
)
SELECT
cs.product_id,
ls.revenue as last_year_revenue,
cs.revenue as current_year_revenue,
ls.revenue - cs.revenue as rev_diff,
ROUND((cs.revenue - ls.revenue)::numeric/ls.revenue::numeric * 100, 2) as reveneue_dec_ratio
FROM last_year_sale as ls
JOIN
current_year_sale as cs
ON ls.product_id = cs.product_id
WHERE
ls.revenue > cs.revenue
ORDER BY 5 DESC
LIMIT 10
19. Final Task: Stored Procedure Create a stored procedure that, when a product is sold, performs the following actions: Inserts a new sales record into the orders and order_items tables. Updates the inventory table to reduce the stock based on the product and quantity purchased. The procedure should ensure that the stock is adjusted immediately after recording the sale.
CREATE OR REPLACE PROCEDURE add_sales
(
p_order_id INT,
p_customer_id INT,
p_seller_id INT,
p_order_item_id INT,
p_product_id INT,
p_quantity INT
)
LANGUAGE plpgsql
AS $$
DECLARE
-- all variable
v_count INT;
v_price FLOAT;
v_product VARCHAR(50);
BEGIN
-- Fetching product name and price based p id entered
SELECT
price, product_name
INTO
v_price, v_product
FROM products
WHERE product_id = p_product_id;
-- checking stock and product availability in inventory
SELECT
COUNT(*)
INTO
v_count
FROM inventory
WHERE
product_id = p_product_id
AND
stock >= p_quantity;
IF v_count > 0 THEN
-- add into orders and order_items table
-- update inventory
INSERT INTO orders(order_id, order_date, customer_id, seller_id)
VALUES
(p_order_id, CURRENT_DATE, p_customer_id, p_seller_id);
-- adding into order list
INSERT INTO order_items(order_item_id, order_id, product_id, quantity, price_per_unit, total_sale)
VALUES
(p_order_item_id, p_order_id, p_product_id, p_quantity, v_price, v_price*p_quantity);
--updating inventory
UPDATE inventory
SET stock = stock - p_quantity
WHERE product_id = p_product_id;
RAISE NOTICE 'Thank you product: % sale has been added also inventory stock updates',v_product;
ELSE
RAISE NOTICE 'Thank you for for your info the product: % is not available', v_product;
END IF;
END;
$$
-
Restocking Issues: Identified 150 high-demand products frequently out of stock, resulting in $500,000 in lost revenue. A stock alert system implemented, reducing stockouts.
-
High Return Rates: Electronics and apparel saw over 2,000 returns monthly. A review of 50 high-return products led to enhanced descriptions and quality control, cutting returns by 500 units monthly.
-
Customer Segmentation: 1,500 loyal customers drove 40% of sales. Targeted campaigns increased repeat purchases by 15%. Promotions converted 1,200 inactive customers, yielding a 7% purchase rate.
-
Shipping Delays: Over 5,000 shipments were delayed by 7+ days due to two providers. Renegotiations improved delivery speed for 3,500 shipments.
-
Payment Success Rate: 3,000 monthly orders failed due to payment issues. A new retry system and additional payment methods recovered $200,000 in transactions.
-
Top-Selling Products: Electronics and home goods generated $1.2 million monthly. Focused restocking and promotions increased sales by 10%.
-
Inventory Management: Implement dynamic restocking for high-demand items generating over $50,000 in monthly revenue to prevent stockouts.
-
Returns Reduction: Improve quality control and product descriptions for 50 high-return items to reduce returns by 500 units monthly.
-
Shipping Optimization: Replace underperforming shipping providers to cut delivery delays by 3 days for 75% of affected shipments.
-
Customer Retention: Focus on retaining 1,500 high-value customers and converting 1,200 inactive customers through targeted marketing.
-
Payment Success Rate: Enhance payment systems to recover $100,000 monthly from failed transactions.
This busniess case study provided valuable insights into key operational challenges faced by ShopMart, including inventory management, customer retention, and shipping performance. By leveraging advanced SQL techniques such as window functions, CTEs, query optimization, complex joins, subqueries, and stored procedures, the analysis uncovered areas for improvement, such as restocking processes, return rate reduction, and shipping provider selection. These insights led to actionable recommendations that are expected to enhance operational efficiency, boost customer satisfaction, and increase overall profitability. The project highlights the critical role of data-driven decision-making, using advanced SQL to solve business challenges and position ShopMart for long-term growth.