Consecutive Day User Activity
Write a PostgreSQL query to identify users who were active (made a post or comment) on at least two consecutive days. Display the user_id and username of these users. Each user should appear only once in the output.
Related Concepts
Hint
This problem involves identifying sequences of dates.
- First, get distinct activity dates for each user. A user is active on a day if they have any activity. Use
SELECT DISTINCT user_id, activity_date FROM Activities. - Use the
LAG()window function to get the previous activity date for each user. Partition byuser_idand order byactivity_date. - Calculate the difference in days between the current activity date and the previous activity date. In PostgreSQL, subtracting two dates gives an integer representing the number of days.
- If this difference is exactly 1, it means the user was active on consecutive days.
- Select the
user_ids for whom such a 1-day difference exists. - Use
DISTINCTor join with theUserstable to get the username and ensure each user appears only once.
Solution (PostgreSQL)
Imagine tracking when users log in to Facebook:
We want to find users who logged in (or posted/commented) on one day, and then *also* on the very next day.
- First, for each user, we list all the unique days they were active. If someone posts 10 times on Jan 5th, it's still just one "active day" for them (Jan 5th).
- Then, for each user and each of their active days, we look at their *previous* active day.
- If the current active day is exactly one day after their previous active day (e.g., current is Jan 6th and previous was Jan 5th), then bingo! That's a consecutive day activity.
- We then list all users who had at least one such consecutive day pair.
PostgreSQL Query (using Window Functions)
-- Find users active on consecutive days
WITH UserDailyActivity AS (
-- First, get distinct days a user was active
SELECT DISTINCT
user_id,
activity_date
FROM
Activities
),
LaggedActivity AS (
-- Then, for each activity day, find the previous activity day for that user
SELECT
user_id,
activity_date,
LAG(activity_date, 1) OVER (
PARTITION BY user_id
ORDER BY activity_date
) AS prev_activity_date
FROM
UserDailyActivity
)
-- Finally, select users who have a next day activity
SELECT DISTINCT
u.user_id,
u.username
FROM
Users u
JOIN
LaggedActivity la ON u.user_id = la.user_id
WHERE
la.prev_activity_date IS NOT NULL
AND (la.activity_date - la.prev_activity_date) = 1 -- Difference of 1 day
ORDER BY
u.user_id;
Explanation of the PostgreSQL Query:
UserDailyActivity CTE:
- Purpose: To get a clean list of unique days each user was active. The problem states, "Multiple activities on the same day count as a single day of activity."
SELECT DISTINCT user_id, activity_date FROM Activities: This ensures that for each user, each date they were active appears only once, regardless of how many posts or comments they made on that day.
LaggedActivity CTE:
- Purpose: For each user's unique activity day, this CTE finds the date of their immediately preceding activity day.
LAG(activity_date, 1) OVER (PARTITION BY user_id ORDER BY activity_date) AS prev_activity_date: This is a window function.PARTITION BY user_id: It processes each user's activity independently.ORDER BY activity_date: Within each user's activity, it orders their active days chronologically.LAG(activity_date, 1): For the current row (an active day), it fetches theactivity_datefrom the previous row (the previous active day for that user). If there's no previous row (i.e., it's the user's first recorded active day),LAGreturnsNULL.
SELECT Statement:
- Purpose: To identify users who had at least one instance of activity on consecutive days and retrieve their details.
SELECT DISTINCT u.user_id, u.username: Selects the user ID and username.DISTINCTis crucial here to ensure each user appears only once in the final output, even if they had multiple pairs of consecutive active days.FROM Users u JOIN LaggedActivity la ON u.user_id = la.user_id: Joins with theUserstable to get theusernameand connects to ourLaggedActivityCTE.WHERE la.prev_activity_date IS NOT NULL: Ensures we are only looking at rows where there *was* a previous activity day (so we can calculate a difference).AND (la.activity_date - la.prev_activity_date) = 1: This is the core condition for consecutive days. In PostgreSQL, subtracting oneDATEfrom another results in an integer representing the number of days. If this difference is 1, the days were consecutive.ORDER BY u.user_id: Orders the final output by user ID for consistency.
Handling Date Differences: The expression (la.activity_date - la.prev_activity_date) = 1 is specific to PostgreSQL for finding a 1-day difference between two DATE type columns. Other SQL dialects might require functions like DATEDIFF(day, prev_activity_date, activity_date) = 1 (SQL Server) or similar.
This problem is a good example of how window functions can simplify complex sequential analysis that would otherwise require more complicated self-joins.
Your Turn! What are your thoughts on this problem or alternative ways to solve it? Share your own SQL query attempts or insights in the comments below!