Untitled — Python Coding — Nerchuko Academy

Consecutive Monthly Purchases

HARD

You are given a dataset (e.g., a Pandas DataFrame) representing customer transactions, with fields like customer_id, transaction_date, and transaction_amount. Write a Python function to identify customers who have made purchases in three consecutive months.

The function should return a list or set of customer_ids for these customers.

Example Input DataFrame (`transactions_df`):

# Assume transaction_date is already in datetime format or can be converted.
data = {
    'customer_id': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C1', 'C3', 'C3', 'C3', 'C3'],
    'transaction_date': [
        '2023-01-10', '2023-01-20', '2023-02-05', '2023-03-15', # C1: Jan, Feb, Mar
        '2023-01-01', '2023-03-01',                         # C2: Jan, Mar (not consecutive)
        '2023-05-10',                                     # C1: May (breaks sequence)
        '2023-11-15', '2023-12-20', '2024-01-05', '2024-02-01' # C3: Nov, Dec, Jan (consecutive!) then Feb
    ],
    'transaction_amount': [100, 50, 75, 200, 30, 40, 60, 90, 10, 20, 50]
}
# transactions_df = pd.DataFrame(data)
# transactions_df['transaction_date'] = pd.to_datetime(transactions_df['transaction_date'])

Expected Output: {'C1', 'C3'} (or a list ['C1', 'C3'])

Constraints:

Input is a Pandas DataFrame.
transaction_date column can be parsed into datetime objects.

Function Signature (Python):

import pandas as pd
from typing import List, Set

class Solution:
    def find_consecutive_monthly_purchasers(self, transactions_df: pd.DataFrame) -> Set[str]:
        # Your code here
        pass

Solution: Consecutive Monthly Purchases

This problem requires careful data preparation, grouping, and sequential pattern detection within each group. Using Pandas Period objects can greatly simplify handling month arithmetic across year boundaries.

import pandas as pd
from typing import List, Set

# Sample DataFrame (as provided in the problem)
data = {
    'customer_id': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C1', 'C3', 'C3', 'C3', 'C3'],
    'transaction_date': [
        '2023-01-10', '2023-01-20', '2023-02-05', '2023-03-15',
        '2023-01-01', '2023-03-01',
        '2023-05-10',
        '2023-11-15', '2023-12-20', '2024-01-05', '2024-02-01'
    ],
    'transaction_amount': [100, 50, 75, 200, 30, 40, 60, 90, 10, 20, 50]
}
transactions_df_initial = pd.DataFrame(data)
transactions_df_initial['transaction_date'] = pd.to_datetime(transactions_df_initial['transaction_date'])

print("Original DataFrame with datetime:\n", transactions_df_initial.head())

Implementation Steps:

class Solution:
    def find_consecutive_monthly_purchasers(self, transactions_df: pd.DataFrame) -> Set[str]:
        if transactions_df.empty:
            return set()

        df = transactions_df.copy()

        # 1. Data Preparation: Ensure transaction_date is datetime and extract month period
        if not pd.api.types.is_datetime64_any_dtype(df['transaction_date']):
            df['transaction_date'] = pd.to_datetime(df['transaction_date'])
        
        # Create a 'month_period' column (e.g., Period('2023-01', 'M'))
        # This handles year-month correctly and allows for easy arithmetic with months.
        df['month_period'] = df['transaction_date'].dt.to_period('M')

        consecutive_purchasers = set()

        # 2. Group by customer_id
        for customer_id, customer_group in df.groupby('customer_id'):
            # 3. Identify Unique Monthly Purchases for the customer, sorted
            unique_purchase_months = sorted(customer_group['month_period'].unique())
            
            # We need at least 3 unique purchase months to check for 3 consecutive
            if len(unique_purchase_months) < 3:
                continue 

            # 4. Check for Consecutive Months
            # Iterate up to the third-to-last month to allow checking for two subsequent months
            for i in range(len(unique_purchase_months) - 2):
                month1 = unique_purchase_months[i]
                month2_expected = month1 + 1 # pd.Period addition handles month/year rollovers
                month3_expected = month1 + 2
                
                # Check if the actual next two unique months match the expected consecutive months
                if (unique_purchase_months[i+1] == month2_expected and 
                    unique_purchase_months[i+2] == month3_expected):
                    consecutive_purchasers.add(customer_id)
                    break # Found 3 consecutive for this customer, no need to check further for them
        
        return consecutive_purchasers

# Example Usage:
# sol = Solution()
# result = sol.find_consecutive_monthly_purchasers(transactions_df_initial.copy())
# print(f"\nCustomers with 3 consecutive monthly purchases: {result}")
# Expected: {'C1', 'C3'}

Complexity Analysis:

Let N be the total number of transactions and C be the number of unique customers. Let M_max be the maximum number of unique purchase months for any single customer.

Converting dates to datetime: O(N)
Converting dates to periods: O(N)
groupby('customer_id'): Typically O(N) as it iterates through the data.
Inside the loop for each customer:
- .unique() on month periods for a customer: Roughly O(M_c) where M_c is transactions for that customer.
- sorted() on unique months: O(M_u log M_u) where M_u is unique months for that customer (M_u <= M_c).
- Inner loop for checking consecutive: O(M_u).

The dominant part after the initial O(N) processing is the groupby and the operations within each group. If M_max is small, the per-customer processing is fast. In the worst case, if one customer has almost all transactions, the sorting of unique months for that customer might be the bottleneck after initial O(N) steps. Overall, it's roughly bounded by O(N + C * M_max log M_max) or often simplified to O(N log M_max) if C is not too large or O(N log N) in a very skewed scenario where M_max is large. If sorting the entire DataFrame by customer and date first (O(N log N)), then iterating, the unique month extraction and check per customer is efficient. Using .dt.to_period('M') is generally efficient.

Space Complexity: O(N) for storing the month_period column if added to the DataFrame. O(C + M_max) for storing unique months per customer in the worst case during iteration, and for the consecutive_purchasers set.

Key Discussion Points for an Interview:

Date Handling: Emphasize the conversion to datetime objects and then to a representation of month (year-month). Using pd.Period('M') is elegant for month arithmetic.
Grouping Strategy: groupby('customer_id') is essential.
Uniqueness and Sorting: For each customer, you need unique, sorted purchase months.
Consecutive Check Logic:
- Explain how you iterate through the sorted unique months.
- The core logic: if month[i+1] == month[i] + 1_period AND month[i+2] == month[i] + 2_periods.
- Mention how pd.Period addition simplifies handling year boundaries automatically. If not using Periods (e.g., (year, month) tuples), you'd need to explain the logic for incrementing months and handling year rollovers (e.g., (2023, 12) -> (2024, 1)).
Efficiency: While a loop within .apply() on groups is shown, for very large datasets, one might explore more vectorized approaches if simple, or acknowledge that the current approach is clear and handles the logic well for typical data sizes. The main cost is often the initial data prep and sorting if not already sorted.

Alternative for Consecutive Check (Advanced): One could calculate the difference between consecutive sorted month periods. If there are two consecutive differences of '1 month', then a 3-month sequence exists.

# Inside the customer group loop, after getting unique_purchase_months:
# if len(unique_purchase_months) >= 3:
#    month_series = pd.Series(unique_purchase_months)
#    diffs = month_series.diff().dt.n # Get difference in terms of periods (months)
#    # Check for two consecutive 1s in the differences
#    for j in range(2, len(diffs)): # Start from the 3rd month (2nd diff)
#        if diffs.iloc[j] == 1 and diffs.iloc[j-1] == 1:
#            consecutive_purchasers.add(customer_id)
#            break

This alternative using .diff() is more Pandas-idiomatic within the group processing.

Consecutive Monthly Purchases

Example Input DataFrame (`transactions_df`):

Constraints:

Function Signature (Python):

Related Python Concepts

Hint

Solution: Consecutive Monthly Purchases

Implementation Steps:

Complexity Analysis:

Consecutive Monthly Purchases

Example Input DataFrame (transactions_df):

Constraints:

Function Signature (Python):

Related Python Concepts

Hint

Solution: Consecutive Monthly Purchases

Implementation Steps:

Complexity Analysis:

Example Input DataFrame (`transactions_df`):