You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Danny was scrolling through his Instagram feed when something really caught his eye - “80s Retro Styling and Pizza Is The Future!”
Danny was sold on the idea, but he knew that pizza alone was not going to help him get seed funding to expand his new Pizza Empire - so he had one more genius idea to combine with it - he was going to Uberize it - and so Pizza Runner was launched!
Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.
📂 Dataset
Danny has shared with you 6 key datasets for this case study:
runners
View table
The runners table shows the registration_date for each new runner.
runner_id
registration_date
1
1/1/2021
2
1/3/2021
3
1/8/2021
4
1/15/2021
customer_orders
View table
Customer pizza orders are captured in the customer_orders table with 1 row for each individual pizza that is part of the order.
order_id
customer_id
pizza_id
exclusions
extras
order_time
1
101
1
44197.75349537037
2
101
1
44197.79226851852
3
102
1
44198.9940162037
3
102
2
null
44198.9940162037
4
103
1
4
44200.558171296296
4
103
1
4
44200.558171296296
4
103
2
4
44200.558171296296
5
104
1
null
1
44204.87533564815
6
101
2
null
null
44204.877233796295
7
105
2
null
1
44204.88922453704
8
102
1
null
null
44205.99621527778
9
103
1
4
1, 5
44206.47429398148
10
104
1
null
null
44207.77417824074
10
104
1
2, 6
1, 4
44207.77417824074
runner_orders
View table
After each orders are received through the system - they are assigned to a runner - however not all orders are fully completed and can be cancelled by the restaurant or the customer.
The pickup_time is the timestamp at which the runner arrives at the Pizza Runner headquarters to pick up the freshly cooked pizzas.
The distance and duration fields are related to how far and long the runner had to travel to deliver the order to the respective customer.
order_id
runner_id
pickup_time
distance
duration
cancellation
1
1
1/1/2021 18:15
20km
32 minutes
2
1
1/1/2021 19:10
20km
27 minutes
3
1
1/3/2021 0:12
13.4km
20 mins
null
4
2
1/4/2021 13:53
23.4
40
null
5
3
1/8/2021 21:10
10
15
null
6
3
null
null
null
Restaurant Cancellation
7
2
1/8/2020 21:30
25km
25mins
null
8
2
1/10/2020 0:15
23.4 km
15 minute
null
9
2
null
null
null
Customer Cancellation
10
1
1/11/2020 18:50
10km
10minutes
null
pizza_names
View table
pizza_id
pizza_name
1
Meat Lovers
2
Vegetarian
pizza_recipes
View table
Each pizza_id has a standard set of toppings which are used as part of the pizza recipe.
pizza_id
toppings
1
1, 2, 3, 4, 5, 6, 8, 10
2
4, 6, 7, 9, 11, 12
pizza_toppings
View table
This table contains all of the topping_name values with their corresponding topping_id value.
topping_id
topping_name
1
Bacon
2
BBQ Sauce
3
Beef
4
Cheese
5
Chicken
6
Mushrooms
7
Onions
8
Pepperoni
9
Peppers
10
Salami
11
Tomatoes
12
Tomato Sauce
♻️ Data Preprocessing
Data Issues
Data issues in the existing schema include:
customer_orders table
null values entered as text
using both NaN and null values
runner_orders table
null values entered as text
using both NaN and null values
units manually entered in distance and duration columns
Data Cleaning
customer_orders
Converting null and NaN values into blanks '' in exclusions and extras
Blanks indicate that the customer requested no extras/exclusions for the pizza, whereas null values would be ambiguous.
Saving the transformations in a temporary table
We want to avoid permanently changing the raw data via UPDATE commands if possible.
runner_orders
Converting 'null' text values into null values for pickup_time, distance and duration
Extracting only numbers and decimal spaces for the distance and duration columns
Use regular expressions and NULLIF to convert non-numeric entries to null values
Converting blanks, 'null' and NaN into null values for cancellation
SELECTCOUNT(*) AS pizza_count
FROM updated_customer_orders;
pizza_count
14
Q2. How many unique customer orders were made?
SELECTCOUNT (DISTINCT order_id) AS order_count
FROM updated_customer_orders;
order_count
10
Q3. How many successful orders were delivered by each runner?
SELECT
runner_id,
COUNT(order_id) AS successful_orders
FROM updated_runner_orders
WHERE cancellation IS NULLOR cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
GROUP BY runner_id
ORDER BY successful_orders DESC;
runner_id
successful_orders
1
4
2
3
3
1
Q4. How many of each type of pizza was delivered?
SELECTpn.pizza_name,
COUNT(co.*) AS pizza_type_count
FROM updated_customer_orders AS co
INNER JOINpizza_runner.pizza_namesAS pn
ONco.pizza_id=pn.pizza_idINNER JOINpizza_runner.runner_ordersAS ro
ONco.order_id=ro.order_idWHERE cancellation IS NULLOR cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
GROUP BYpn.pizza_nameORDER BYpn.pizza_name;
OR
SELECTpn.pizza_name,
COUNT(co.*) AS pizza_type_count
FROM updated_customer_orders AS co
INNER JOINpizza_runner.pizza_namesAS pn
ONco.pizza_id=pn.pizza_idWHERE EXISTS (
SELECT1FROM updated_runner_orders AS ro
WHEREro.order_id=co.order_idAND (
ro.cancellation IS NULLORro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
)
)
GROUP BYpn.pizza_nameORDER BYpn.pizza_name;
pizza_name
pizza_type_count
Meatlovers
9
Vegetarian
3
Q5. How many Vegetarian and Meatlovers were ordered by each customer?
SELECT
customer_id,
SUM(CASE WHEN pizza_id =1 THEN 1 ELSE 0 END) AS meat_lovers,
SUM(CASE WHEN pizza_id =2 THEN 1 ELSE 0 END) AS vegetarian
FROM updated_customer_orders
GROUP BY customer_id;
customer_id
meat_lovers
vegetarian
101
2
1
103
3
1
104
3
0
105
0
1
102
2
1
Q6. What was the maximum number of pizzas delivered in a single order?
SELECTMAX(pizza_count) AS max_count
FROM (
SELECTco.order_id,
COUNT(co.pizza_id) AS pizza_count
FROM updated_customer_orders AS co
INNER JOIN updated_runner_orders AS ro
ONco.order_id=ro.order_idWHEREro.cancellation IS NULLORro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
GROUP BYco.order_id) AS mycount;
max_count
3
Q7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
SELECTco.customer_id,
SUM (CASE WHEN co.exclusionsIS NOT NULLORco.extrasIS NOT NULL THEN 1 ELSE 0 END) AS changes,
SUM (CASE WHEN co.exclusions IS NULLORco.extras IS NULL THEN 1 ELSE 0 END) AS no_change
FROM updated_customer_orders AS co
INNER JOIN updated_runner_orders AS ro
ONco.order_id=ro.order_idWHEREro.cancellation IS NULLORro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
GROUP BYco.customer_idORDER BYco.customer_id;
customer_id
changes
no_change
101
0
2
102
0
3
103
3
3
104
2
2
105
1
1
Q8. How many pizzas were delivered that had both exclusions and extras?
SELECTSUM(CASE WHEN co.exclusionsIS NOT NULLANDco.extrasIS NOT NULL THEN 1 ELSE 0 END) as pizza_count
FROM updated_customer_orders AS co
INNER JOIN updated_runner_orders AS ro
ONco.order_id=ro.order_idWHEREro.cancellation IS NULLORro.cancellation NOT IN ('Restaurant Cancellation', 'Customer Cancellation')
pizza_count
1
Q9. What was the total volume of pizzas ordered for each hour of the day?
SELECT
DATE_PART('hour', order_time::TIMESTAMP) AS hour_of_day,
COUNT(*) AS pizza_count
FROM updated_customer_orders
WHERE order_time IS NOT NULLGROUP BY hour_of_day
ORDER BY hour_of_day;
hour_of_day
pizza_count
11
1
12
2
13
3
18
3
19
1
21
3
23
1
Q10. What was the volume of orders for each day of the week?
SELECT
TO_CHAR(order_time, 'Day') AS day_of_week,
COUNT(*) AS pizza_count
FROM updated_customer_orders
GROUP BY
day_of_week,
DATE_PART('dow', order_time)
ORDER BY day_of_week;
day_of_week
pizza_count
Friday
1
Saturday
5
Thursday
3
Wednesday
5
Runner and Customer Experience
Q1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
WITH runner_signups AS (
SELECT
runner_id,
registration_date,
registration_date - ((registration_date -'2021-01-01') % 7) AS start_of_week
FROMpizza_runner.runners
)
SELECT
start_of_week,
COUNT(runner_id) AS signups
FROM runner_signups
GROUP BY start_of_week
ORDER BY start_of_week;
start_of_week
signups
2021-01-01T00:00:00.000Z
2
2021-01-08T00:00:00.000Z
1
2021-01-15T00:00:00.000Z
1
Q2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
WITH runner_pickups AS (
SELECTro.runner_id,
ro.order_id,
co.order_time,
ro.pickup_time,
(pickup_time - order_time) AS time_to_pickup
FROM updated_runner_orders AS ro
INNER JOIN updated_customer_orders AS co
ONro.order_id=co.order_id
)
SELECT
runner_id,
date_part('minutes', AVG(time_to_pickup)) AS avg_arrival_minutes
FROM runner_pickups
GROUP BY runner_id
ORDER BY runner_id;
runner_id
avg_arrival_minutes
1
-4
2
23
3
10
Q3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
WITH order_count AS (
SELECT
order_id,
order_time,
COUNT(pizza_id) AS pizzas_order_count
FROM updated_customer_orders
GROUP BY order_id, order_time
),
prepare_time AS (
SELECTro.order_id,
co.order_time,
ro.pickup_time,
co.pizzas_order_count,
(pickup_time - order_time) AS time_to_pickup
FROM updated_runner_orders AS ro
INNER JOIN order_count AS co
ONro.order_id=co.order_idWHERE pickup_time IS NOT NULL
)
SELECT
pizzas_order_count,
AVG(time_to_pickup) AS avg_time
FROM prepare_time
GROUP BY pizzas_order_count
ORDER BY pizzas_order_count;
pizzas_order_count
avg_time
1
12
2
-6
3
29
Q4. What was the average distance travelled for each runner?
SELECT
runner_id,
ROUND(AVG(distance), 2) AS avg_distance
FROM updated_runner_orders
GROUP BY runner_id
ORDER BY runner_id;
runner_id
avg_distance
1
15.85
2
23.93
3
10.00
Q5. What was the difference between the longest and shortest delivery times for all orders?
SELECTMAX(duration) -MIN(duration) AS difference
FROM updated_runner_orders;
difference
30
Q6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
WITH order_count AS (
SELECT
order_id,
order_time,
COUNT(pizza_id) AS pizzas_count
FROM updated_customer_orders
GROUP BY
order_id,
order_time
)
SELECTro.order_id,
ro.runner_id,
co.pizzas_count,
ro.distance,
ro.duration,
ROUND(60*ro.distance/ro.duration, 2) AS speed
FROM updated_runner_orders AS ro
INNER JOIN order_count AS co
ONro.order_id=co.order_idWHERE pickup_time IS NOT NULLORDER BY speed DESC
order_id
runner_id
pizzas_count
distance
duration
speed
8
2
1
23.4
15
93.60
7
2
1
25
25
60.00
10
1
2
10
10
60.00
2
1
1
20
27
44.44
3
1
2
13.4
20
40.20
5
3
1
10
15
40.00
1
1
1
20
32
37.50
4
2
3
23.4
40
35.10
Finding:
Orders shown in decreasing order of average speed:
While the fastest order only carried 1 pizza and the slowest order carried 3 pizzas, there is no clear trend that more pizzas slow down the delivery speed of an order.
Q7. What is the successful delivery percentage for each runner?
SELECT
runner_id,
COUNT(pickup_time) as delivered,
COUNT(order_id) AS total,
ROUND(100*COUNT(pickup_time) /COUNT(order_id)) AS delivery_percent
FROM updated_runner_orders
GROUP BY runner_id
ORDER BY runner_id;