Average Time to First Purchase
MODERATE
Write a PostgreSQL query to calculate the average number of days between a customer's registration date and their first completed purchase date. Only include customers who have made at least one completed purchase. The final result should be a single value rounded to 2 decimal places.
Related Concepts
Window Functions (ROW_NUMBER(), RANK(), MIN() OVER()) Date Calculations (Subtraction, DATEDIFF-like behavior) JOINs (INNER JOIN, LEFT JOIN) Aggregation (AVG, COUNT) Common Table Expressions (CTEs) Filtering (WHERE clause) Rounding (ROUND) Subqueries
Hint
- Identify First Completed Purchase:
- Filter the
Orderstable fororder_status = 'completed'. - Use a window function like
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date ASC)to rank orders for each customer. The one with rank 1 is the first. - Alternatively, group by
customer_idand find theMIN(order_date)from completed orders.
- Filter the
- Calculate Time Difference:
- Create a CTE (let's call it
FirstPurchases) that containscustomer_idand theirfirst_purchase_date(from completed orders only). - Join this CTE with the
Customerstable oncustomer_idto get theregistration_date. AnINNER JOINhere is appropriate as we only care about customers who made a purchase. - Subtract
registration_datefromfirst_purchase_date. In PostgreSQL, subtracting two dates directly gives an integer representing the number of days.
- Create a CTE (let's call it
- Average the Results:
- In an outer query (or another CTE), calculate the
AVG()of these day differences. - Remember to cast the result of the average to a numeric type that supports decimals (e.g.,
DECIMALorNUMERIC) before rounding, or ensure the division for average happens with decimal numbers. PostgreSQL'sAVG()on integers might return a decimal, but it's good practice to be explicit. - Use
ROUND(average_value, 2)for the final output.
- In an outer query (or another CTE), calculate the
- Structure with CTEs: Using Common Table Expressions will make the logic clearer: one CTE for first purchase dates, another for calculating individual time differences, and then the final aggregation.
Solution Approach (PostgreSQL)
Imagine tracking how quickly new Shopify store sign-ups make their first buy:
- First, for every customer, you need to find the date of their very first completed order. If they made many orders, you only care about the earliest one that wasn't cancelled.
- Once you have this "first completed purchase date" for a customer, you also look up their "registration date" (when they signed up).
- Then, you calculate the difference in days: (First Completed Purchase Date) - (Registration Date). This tells you how many days it took that specific customer to buy something.
- You do this for all customers who have actually made a completed purchase. Customers who only signed up but never bought anything (or only had cancelled orders) are ignored for this calculation.
- Finally, you take all these "days to first purchase" values and calculate their average. This average is then rounded to two decimal places.
PostgreSQL Query (using CTEs)
-- Step 1: Find the first completed purchase date for each customer
WITH first_completed_purchases AS (
SELECT
customer_id,
MIN(order_date) AS first_purchase_date
FROM
Orders
WHERE
order_status = 'completed'
GROUP BY
customer_id
),
-- Step 2: Calculate the number of days from registration to first purchase
customer_days_to_first_purchase AS (
SELECT
c.customer_id,
(fp.first_purchase_date - c.registration_date) AS days_to_purchase -- Date subtraction in PostgreSQL gives days
FROM
Customers c
JOIN
first_completed_purchases fp ON c.customer_id = fp.customer_id
)
-- Step 3: Calculate the average days and round it
SELECT
ROUND(AVG(days_to_purchase), 2) AS avg_days_to_first_purchase
FROM
customer_days_to_first_purchase;
(This is a structural example based on the problem; a detailed solution would populate this fully.)
Explanation Notes:
1. Finding First Completed Purchase (
first_completed_purchases CTE):
- Filters
Ordersfororder_status = 'completed'. - Groups by
customer_idand usesMIN(order_date)to find the earliest completed order date for each customer. - Alternatively, one could use
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date ASC)on the filtered 'completed' orders and then select rows where the rank is 1.
2. Calculating Days to Purchase (
customer_days_to_first_purchase CTE):
INNER JOINs theCustomerstable with thefirst_completed_purchasesCTE. This ensures only customers who have made a completed purchase are included.- Subtracts
registration_datefromfirst_purchase_date. In PostgreSQL, subtracting one date from another directly yields an integer representing the number of days.
3. Final Aggregation:
- Calculates the
AVG(days_to_purchase)from the second CTE. - Rounds the result to 2 decimal places using
ROUND(). PostgreSQL'sAVGfunction on integers will produce a numeric/decimal type, so direct rounding is fine.
Important Considerations for a Full Solution:
- Join Type: An
INNER JOINbetweenCustomersand the first purchase data (or a subquery/CTE deriving it) is appropriate because the problem asks to "Only include customers who have made at least one purchase." - Date Subtraction: PostgreSQL makes date subtraction straightforward (
date1 - date2gives days). Other SQL dialects might require functions likeDATEDIFF(). - Filtering for 'completed': This is a critical business rule and must be applied early, ideally before determining the "first" order.
Take it Further! What are your thoughts on this problem or alternative ways to solve it? Share your own SQL query attempts or insights in the comments below!