Pandas: Sales Analysis with Promotions

SCENARIO

The Chief Marketing Officer (CMO) is interested in understanding how the sales of different product families are affected by promotions originating from a specific source (e.g., "Facebook promotions").

You are given three hypothetical Pandas DataFrames:

  • sales_df: Contains sales transactions (product_id, units_sold, promotion_id).
  • products_df: Contains product information (product_id, product_family).
  • promotions_df: Contains promotion details (promotion_id, source). For this problem, we're interested in promotions where source == 'facebook'.

 

Outline the steps using Pandas to:

  1. Merge the necessary DataFrames to link sales to product families and identify promotion sources.
  2. Create a boolean column to identify sales associated with a "valid Facebook promotion" (i.e., promotion_id is present in the Facebook promotions and is not NA/null).
  3. Group the data by product_family and calculate:
    • Total units sold for sales with a valid Facebook promotion.
    • Total units sold for all sales (regardless of promotion).
  4. Calculate the percentage of units sold with valid Facebook promotions for each product family.
  5. Ensure any potential NaN values resulting from aggregations (if a product family had no promotional sales, for instance) are handled appropriately (e.g., filled with 0).

 

 

Nerchuko Academy · Free DS Interview Prep