
Myntra Traditional Wear Demand

Predictive Analytics · Demand Forecasting · Retail/E-commerce · Hard

The Challenge: Festive Demand Forecasting

Myntra needs to predict demand for traditional Telugu clothing (like pattu sarees, dhoti-kurtas, panchaalu, half-sarees/lehenga vonis) during major festivals like Dussehra and Diwali. This is crucial for inventory planning, procurement from weavers/artisans, and marketing. What data sources and modeling approach would you use to build this demand forecasting system?

Initial Thoughts & Clarifications

  • Forecasting Granularity: At what level is the forecast needed? (SKU level, product sub-category like "Kanjeevaram Pattu Saree", broader category like "Pattu Sarees", region-specific, or overall for Telugu clothing).
  • Forecast Horizon: How far out does the prediction need to be? (e.g., 1 month, 3 months, 6 months ahead of the festival to allow for procurement lead times).
  • Definition of "Traditional Telugu Clothing": How are these items specifically tagged or identified in Myntra's catalog? Is there a clear taxonomy?
  • Key Festivals: Dussehra and Diwali are mentioned. Are there other Telugu festivals with significant traditional wear demand (e.g., Ugadi, Sankranti, weddings during specific seasons)?
  • Data Availability (Internal): Historical sales data (units sold, GMV, price, discounts), product attributes (fabric, color, work type, occasion tags), inventory levels, customer demographics (location for Telugu-speaking regions), browsing data (page views, add-to-carts, wishlists for these items), marketing campaign data.
  • Data Availability (External): Festival calendars (specific dates vary yearly), weather patterns (can affect shopping), economic indicators, competitor activity, social media trends related to ethnic fashion, search trends (Google Trends).
  • Impact of Promotions: How do Myntra's own festive promotions influence demand for these items versus organic demand?
  • Long Tail Problem: Traditional wear, especially sarees, can have vast variety and unique pieces. How to handle forecasting for items with sparse sales history?
  • New Product Introduction: How to forecast demand for new traditional designs or collections launched for the festival?
Framework to Consider (Demand Forecasting System):
  1. Define Forecasting Objectives & Scope:
    • What specific items/categories? What granularity (SKU, sub-category, region)? What horizon? What accuracy target (e.g., MAPE, WMAPE)?
  2. Data Collection & Preparation:
    • Gather all relevant internal (sales, product, customer, inventory, marketing) and external (festival dates, economic, social, weather) data.
    • Clean data, handle missing values, outliers.
    • Feature engineering: Create lag features, moving averages, festival indicators, promotion flags, trend/seasonality components.
  3. Exploratory Data Analysis (EDA):
    • Analyze historical sales patterns for traditional Telugu wear. Identify seasonality, trends, impact of past festivals and promotions.
    • Understand regional variations in demand.
  4. Modeling Approach Selection:
    • Time Series Models: ARIMA, SARIMA, Exponential Smoothing (ETS), Prophet (good with holidays and multiple seasonalities). Suitable for aggregate forecasts or high-volume SKUs.
    • Machine Learning Regression Models: XGBoost, LightGBM, Random Forest. Can incorporate a wide range of features (exogenous variables) and capture complex non-linear relationships. Often perform better for SKU-level forecasts when enough data and features are available.
    • Deep Learning Models (Advanced): LSTMs or Transformers, worth considering only if very long-term dependencies or complex sequence patterns are present and data is abundant.
    • Hierarchical Forecasting: Forecast at different levels (e.g., total Telugu wear -> category -> sub-category -> SKU) and reconcile forecasts for consistency.
    • Consider models for new product forecasting (e.g., based on attribute similarity to existing products, or early signals like page views).
  5. Feature Engineering for Festival Demand:
    • Dummy variables for Dussehra, Diwali, and other relevant Telugu festivals.
    • "Days until festival" / "Days since festival" features.
    • Interaction terms (e.g., festival * promotion_active).
    • Social media buzz scores for specific traditional wear trends leading up to festivals.
  6. Model Training & Validation:
    • Use appropriate time-series cross-validation (e.g., rolling origin validation) to avoid look-ahead bias.
    • Evaluate models using metrics like MAPE, WMAPE (weighted by sales volume/value), RMSE.
    • Compare against naive baselines (e.g., "same as last year's festival").
  7. Deployment & Monitoring:
    • Automate data pipelines, model retraining, and forecast generation.
    • Monitor forecast accuracy continuously. Track forecast bias.
    • Set up alerts for significant deviations or model degradation.
    • Incorporate feedback from planning and merchandising teams.
  8. Addressing Specific Challenges:
    • Long-tail SKUs: May need to forecast at a higher aggregation level or use attribute-based forecasting.
    • Cold Start (New Products): Forecast based on similar products' historical performance, or early signals (page views, pre-orders).

Simulated Conversation

Interviewer: Myntra needs to predict demand for traditional Telugu clothing – items like pattu sarees, dhoti-kurtas, panchaalu, and half-sarees – specifically during major festivals like Dussehra and Diwali. This is critical for our inventory planning and procurement, especially since many of these items have long lead times. What data sources would you leverage, and what modeling approach would you recommend for this demand forecasting system?
Candidate: This is a fascinating demand forecasting challenge, blending time series analysis with the strong influence of cultural events and fashion trends. My approach would be to build a robust system that leverages diverse data sources and a hybrid modeling strategy.

First, I'd need to clarify a few things:

  • Forecasting Granularity: Are we forecasting at the individual SKU level, by sub-category (e.g., "Kanjeevaram Pattu Sarees," "Cotton Dhoti-Kurta sets"), or at a broader "Traditional Telugu Wear" level? And is this for specific regions (e.g., AP/Telangana, Bangalore, US NRIs) or an All-India level for these items?
  • Forecast Horizon: How far in advance do we need these forecasts? For items with long artisan lead times, it could be 3-6 months or even more.
  • Definition: How are "Traditional Telugu Clothing" items consistently identified and tagged in Myntra's catalog?
For now, I'll assume we need forecasts at a reasonably granular level (e.g., sub-category and region) with a 3-6 month horizon.

 

Clarifying Questions: Candidate starts by asking crucial clarifying questions about granularity, horizon, and definitions.
Interviewer: Good questions. Let's assume we need forecasts at the sub-category level (e.g., "Silk Pattu Sarees," "Men's Cotton Panchaalu") for key Telugu-speaking regions (AP, Telangana, and major metros with significant Telugu populations like Bangalore, Chennai, Mumbai, key US cities). The horizon is indeed 3-6 months out from the festival. And yes, we have a taxonomy to identify these traditional items. Now, tell me about your data sources.
Candidate: Perfect. Given that, here are the key data sources I'd leverage:

Key Data Sources:

A. Internal Myntra Data (Historical & Real-time):

  1. Sales Data:
    • Historical sales (units sold, GMV) at SKU/sub-category/region/day level for at least the past 3-5 years to capture multiple festival cycles.
    • Price points, discounts applied, and promotional activities during those sales.
  2. Product Attribute Data:
    • Detailed attributes for each item: fabric (silk, cotton, blend), work type (zari, embroidery, print), color, brand, price range, occasion tags (festival, wedding, casual).
    • Newness of product (launch date).
  3. Customer Data (Aggregated & Anonymized):
    • Geographic location of purchases to identify demand from target Telugu-speaking regions.
    • Customer segments (e.g., new vs. repeat, loyalty status) and their purchasing patterns for traditional wear.
  4. Inventory Data:
    • Historical stock levels and out-of-stock instances. This is crucial to understand if past sales were supply-constrained, meaning true demand might have been higher.
  5. Browsing & Engagement Data:
    • Page views, add-to-carts, wishlist additions for these product categories leading up to and during past festivals. These are leading indicators of interest.
    • Search query data on Myntra for terms like "pattu saree," "Dussehra collections," "Diwali dhoti."
  6. Marketing Data:
    • Past marketing campaigns for traditional wear, their spend, reach, and attributed sales lift.
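
The point about supply-constrained sales in the inventory item can be made concrete. The following is a minimal sketch, assuming daily unit sales and a per-day in-stock flag are available; the function name `unconstrain_sales` and the fill rule (mean of in-stock days) are illustrative assumptions, not Myntra's method.

```python
from statistics import mean

def unconstrain_sales(daily_units, in_stock):
    """Estimate demand on stockout days from the mean of in-stock days.

    daily_units: observed units sold per day.
    in_stock: booleans, True if the item was available that day.
    """
    available = [u for u, ok in zip(daily_units, in_stock) if ok]
    if not available:
        return list(daily_units)  # no in-stock days to learn from
    fill = mean(available)
    # Keep the observed value if it is already higher (partial-day stockouts).
    return [u if ok else max(u, fill) for u, ok in zip(daily_units, in_stock)]

# Toy data: two stockout days in the middle of a festival week.
observed = [10, 12, 0, 0, 14]
stock_flags = [True, True, False, False, True]
adjusted = unconstrain_sales(observed, stock_flags)
print(adjusted)
```

Training on the adjusted series rather than raw sales keeps the model from learning that demand drops whenever stock runs out. A production version would likely fill from comparable days (same weekday, same distance to the festival) rather than a flat mean.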

B. External Data Sources:

  1. Festival Calendars:
    • Precise dates for Dussehra, Diwali, and other relevant regional festivals (Ugadi, Sankranti, specific wedding seasons) for past and future years. Note that Dussehra/Diwali dates shift.
  2. Macroeconomic Indicators:
    • For target regions: Disposable income trends, inflation, consumer confidence indices. These can affect discretionary spending on higher-value traditional wear.
  3. Social Media & Search Trends:
    • Google Trends data for relevant keywords ("latest pattu saree designs," "Diwali traditional wear") in target regions.
    • Social media listening for buzz around specific styles, colors, or celebrity endorsements related to traditional Telugu attire leading up to festivals.
  4. Competitor Activity (if obtainable):
    • Major promotions or collection launches by competing ethnic wear brands or e-commerce platforms during festival periods.
  5. Weather Data (less critical for this category, but potentially useful):
    • Extreme weather events could disrupt shopping or shift preferences (e.g., very hot weather might slightly favor lighter fabrics).

Combining these internal and external data sources will allow us to build a rich feature set for the demand forecasting models.

Comprehensive Data Sources: Candidate lists a wide array of relevant internal and external data, including leading indicators like browsing data and social trends. Crucially mentions inventory data for supply constraints.
Interviewer: That's a very comprehensive list of data sources. Now, let's talk about the modeling approach. Given these data sources and the goal of predicting demand for these traditional items 3-6 months out, especially for Dussehra and Diwali, what specific modeling techniques or overall strategy would you employ? And how would you handle the unique characteristics of festival demand, which can be spiky and influenced by many factors?
Candidate: Predicting spiky, festival-driven demand requires a robust modeling approach that can capture seasonality, trends, and the impact of exogenous factors. I'd advocate for a hybrid modeling strategy, likely combining classical time series methods with machine learning regression models.

Modeling Approach:

1. Baseline Forecasting (Time Series Models):

  • For each sub-category/region combination, if there's sufficient historical sales data (e.g., >2-3 years of daily/weekly sales):
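    • SARIMA (Seasonal AutoRegressive Integrated Moving Average): Good for capturing one seasonal cycle (e.g., the annual festival cycle in weekly data). It models a single seasonal period, so weekly patterns layered on top need extra terms such as Fourier regressors, and because festival dates shift each year, explicit festival regressors (SARIMAX) help.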
    • Prophet (by Facebook): Particularly well-suited for handling multiple seasonalities, holiday effects (we can explicitly input Dussehra, Diwali dates), and trend changes. It's also robust to some missing data and outliers.
    • ETS (Error, Trend, Seasonality / Exponential Smoothing): Another strong set of models for univariate time series.
  • These models would provide a baseline forecast based on historical patterns and intrinsic seasonality.

2. Machine Learning Regression Models (To Incorporate Exogenous Variables & Complex Interactions):

  • This would be the primary workhorse for capturing the impact of festivals and other drivers.
    • Target Variable: Units sold (or GMV) per sub-category/region/time period (e.g., weekly).
    • Model Choice:
      • Gradient Boosting Machines (XGBoost, LightGBM): Excellent for tabular data, handle complex interactions well, robust to outliers, and provide feature importance. My primary choice.
      • Random Forest: Also strong, good for capturing non-linearities.
    • Key Features (Feature Engineering is critical here):
      • Time-based: Year, month, week of year, day of week, lag features of sales (e.g., sales last week, sales same week last year).
      • Festival Indicators (Crucial):
        • Binary flags: is_Dussehra_period, is_Diwali_period.
        • days_until_Dussehra, days_until_Diwali (captures build-up). Negative values for post-festival.
        • days_from_festival_peak.
        • Interaction terms: e.g., is_Diwali_period * promotion_active_flag.
      • Product Attributes (for SKU-level or if modeling for new products): Price (normalized), discount depth, fabric type encoded, color trends.
      • Promotional Activity: Flags for active Myntra promotions, average discount level in category.
      • Leading Indicators: Lagged page views, add-to-carts, wishlist counts for the category/region. Lagged Google Search Trends for relevant terms.
      • Macroeconomic & External: Inflation, consumer confidence (lagged), competitor promotion index (if available).
      • Inventory Constraint Proxy: Historical out-of-stock rate for the category (if high, indicates past sales were understated).
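
The festival indicators listed above can be derived from a calendar of festival dates. This is a minimal sketch using only the standard library; the 21-day build-up window, the function name `festival_features`, and the feature keys are assumptions for illustration, and the dates shown should be replaced by an authoritative festival calendar.

```python
from datetime import date

# Illustrative Diwali dates; a real pipeline would load a maintained calendar.
DIWALI_DATES = [date(2022, 10, 24), date(2023, 11, 12)]

def festival_features(day, festival_dates, window_days=21, promo_active=False):
    """Build festival timing features for one calendar day."""
    # Use the festival occurrence closest to this day.
    nearest = min(festival_dates, key=lambda f: abs((f - day).days))
    days_until = (nearest - day).days  # negative once the festival has passed
    in_period = 0 <= days_until <= window_days
    return {
        "days_until_festival": days_until,
        "is_festival_period": int(in_period),
        # Interaction term: promotion running inside the festival window.
        "festival_x_promo": int(in_period and promo_active),
    }

before = festival_features(date(2023, 11, 2), DIWALI_DATES, promo_active=True)
after = festival_features(date(2023, 11, 15), DIWALI_DATES, promo_active=True)
print(before)
print(after)
```

Because the features are keyed to the distance from the festival rather than the calendar week, the model sees the same build-up pattern each year even though the festival date moves.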

3. Hybrid/Ensemble Approach:

  • Combine the strengths of time series and ML models.
    • Option 1 (Two-Stage): Use Prophet/SARIMA to forecast a baseline sales trend and deseasonalize the data. Then, use an XGBoost model to predict the residuals or the uplift on top of this baseline using the exogenous festival/promo features. Final Forecast = Prophet_Baseline + XGBoost_Residuals.
    • Option 2 (Ensemble): Train Prophet/SARIMA and XGBoost models independently and then combine their forecasts using a weighted average, where weights could be learned or based on holdout set performance.
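
Option 1 can be illustrated with deliberately simple stand-ins. In this sketch the baseline numbers play the role of a Prophet/SARIMA forecast and the "residual model" is just the average residual per promotion flag, standing in for XGBoost; all names and figures are hypothetical.

```python
from statistics import mean

def fit_residual_model(actuals, baselines, promo_flags):
    """Learn the average residual (actual - baseline) for promo and non-promo weeks."""
    residuals = {0: [], 1: []}
    for actual, base, promo in zip(actuals, baselines, promo_flags):
        residuals[promo].append(actual - base)
    return {flag: (mean(vals) if vals else 0.0) for flag, vals in residuals.items()}

def hybrid_forecast(baseline, promo_flag, residual_model):
    """Final forecast = baseline + predicted residual, floored at zero units."""
    return max(0.0, baseline + residual_model[promo_flag])

# Toy weekly history for one sub-category/region.
actuals = [120, 180, 110, 200]
baselines = [100, 100, 100, 120]  # stand-in for the time series baseline
promo_flags = [0, 1, 0, 1]

model = fit_residual_model(actuals, baselines, promo_flags)
forecast = hybrid_forecast(baseline=130, promo_flag=1, residual_model=model)
print(model, forecast)
```

The structure is what matters: the second stage is trained on what the baseline failed to explain, so the two stages do not compete for the same signal.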

4. Hierarchical Forecasting (for Consistency):

  • Forecasts might be needed at different levels (Total Telugu Wear -> Pattu Sarees -> Kanjeevaram Pattu Sarees -> Specific Region).
  • Use hierarchical forecasting techniques (e.g., Bottom-up, Top-down, Middle-out, or optimal reconciliation methods like MinT) to ensure forecasts across levels are consistent and to leverage information from higher aggregation levels to improve lower-level forecasts (especially for sparse data SKUs).
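
As a small illustration of reconciliation, the sketch below implements only the two simplest schemes named above: top-down disaggregation by historical share, and bottom-up summation. Optimal methods such as MinT are not shown. The region names and volumes are made up.

```python
def top_down(parent_forecast, child_history):
    """Split a parent-level forecast across children by historical sales share."""
    total = sum(child_history.values())
    if total == 0:
        # No history at all: fall back to an equal split.
        share = 1 / len(child_history)
        return {name: parent_forecast * share for name in child_history}
    return {name: parent_forecast * units / total
            for name, units in child_history.items()}

def bottom_up(child_forecasts):
    """Aggregate child forecasts into the parent-level forecast."""
    return sum(child_forecasts.values())

# Hypothetical past festival sales of one sub-category by region.
history = {"Hyderabad": 600, "Vijayawada": 300, "Bangalore": 100}
regional = top_down(2000, history)
print(regional)
print(bottom_up(regional))
```

Top-down is useful when child-level history is sparse, since the parent forecast is more stable; the cost is that it assumes regional shares stay constant across festivals.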

5. Handling Long-Tail & New Products (Cold Start):

  • Long-Tail SKUs: For items with very sparse sales history, forecast at a higher aggregation level (e.g., sub-category or attribute cluster like "Red Silk Pattu Sarees with Zari work") and then disaggregate based on historical proportions or product attribute similarity.
  • New Products: Forecast demand based on attributes of the new product and the historical performance of similar existing products (attribute-based forecasting). Use early signals like page views, wishlist additions, and "notify me" clicks once the product is listed to adjust initial forecasts.

Model Training & Validation:

  • Time-Series Cross-Validation: Use rolling origin or expanding window cross-validation to simulate real-world forecasting scenarios and avoid look-ahead bias. Test on multiple past festival periods.
  • Metrics: Weighted MAPE (WMAPE, weighted by sales volume or value to prioritize high-impact items), MASE (Mean Absolute Scaled Error), RMSE. Also track forecast bias (consistent over/under-prediction).
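
The validation scheme and two of the metrics can be sketched in a few lines. This is a minimal illustration with an expanding training window; the function names and parameters (`initial_train`, `horizon`, `step`) are assumptions, and periods are represented as plain integer indices.

```python
def rolling_origin_splits(n_periods, initial_train, horizon, step=1):
    """Yield (train_indices, test_indices) with an expanding training window."""
    origin = initial_train
    while origin + horizon <= n_periods:
        # Train on everything before the origin, test on the next `horizon` periods.
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step

def wmape(actuals, forecasts):
    """Sum of absolute errors divided by total actual volume."""
    total = sum(actuals)
    if total == 0:
        raise ValueError("WMAPE is undefined when total actuals are zero")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / total

def forecast_bias(actuals, forecasts):
    """Positive values mean the forecast is too high in aggregate."""
    return (sum(forecasts) - sum(actuals)) / sum(actuals)

splits = list(rolling_origin_splits(n_periods=6, initial_train=3, horizon=2))
print(splits)
print(wmape([100, 50], [90, 60]), forecast_bias([100, 50], [90, 60]))
```

In the toy example the two errors cancel, so bias is zero while WMAPE is not, which is why both are worth tracking. For festival forecasting, the origins would be chosen so that each test window covers a past Dussehra or Diwali period.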

This multi-faceted approach allows us to capture baseline trends, specific festival uplifts driven by various features, and address challenges like new products and forecast consistency across hierarchies.

Sophisticated Modeling Strategy: Candidate proposes a robust hybrid approach combining time series and ML regression, discusses hierarchical forecasting, and addresses cold-start/long-tail issues. Feature engineering for festivals is well-considered.
Interviewer: That's a very comprehensive modeling strategy. Let's consider a specific challenge. Traditional wear, especially pattu sarees, often involves unique designs, intricate handwork, and limited stock from specific weavers or artisans. This means many SKUs might be "one-off" or have very short lifecycles. How does your proposed forecasting approach, particularly the reliance on historical sales data for time series or ML models, effectively handle demand prediction for these unique, potentially low-volume, or short-lived SKUs that are still critical for a festive collection?
Candidate: That's a crucial point for categories like pattu sarees where uniqueness and craftsmanship are key. Standard SKU-level forecasting based purely on that SKU's own past sales will indeed fail for one-off items or those with no direct history.

Forecasting for Unique/Low-Volume/Short-Lifecycle SKUs:

My approach would shift from forecasting the exact SKU to forecasting demand for clusters of SKUs with similar attributes or leveraging the attributes themselves to predict demand.

  1. Attribute-Based Demand Modeling:
    • Feature Extraction: For each SKU (even new ones), extract detailed attributes:
      • Fabric (Kanjeevaram, Dharmavaram, Uppada, Banarasi Silk component etc.)
      • Work Type (Zari border, Buttis, Handloom, Kalamkari print, Embroidery type)
      • Color (Primary, Secondary, Border Color)
      • Price Tier (e.g., <₹10k, ₹10-25k, ₹25k+)
      • Design Style (Traditional, Contemporary, Bridal)
      • Weaver/Artisan Cluster (if known and significant)
      • Occasion Tag (e.g., Dussehra Special, Wedding Saree)
    • Model Demand as a Function of Attributes: Train a model where the target is sales of past SKUs, and features are their attributes, alongside time-based and festival-related features.
      Sales_SKU_t = f(Attributes_SKU, Festival_Features_t, Price_SKU, Promotions_t, Trend_t, ...)
      This model learns, for example, that "Red Kanjeevaram sarees with heavy zari work in the ₹15-20k range sell X units on average during Diwali in Hyderabad when Y promotion is active."
    • Predicting for New/Unique SKUs: For a new, unique pattu saree, we input its attributes into this trained model to get a demand forecast. This is essentially predicting demand for a "type" of saree defined by its attributes.
  2. Similarity-Based Forecasting (for New Products):
    • When a new unique SKU is introduced, find the K most similar existing/past SKUs based on a weighted similarity score across attributes (text descriptions, image embeddings, structured tags).
    • The forecast for the new SKU can be a weighted average of the historical sales (or forecasts) of these K similar SKUs, adjusted for any price difference or newness factor.
  3. Forecasting at Aggregate Attribute Cluster Level:
    • Instead of individual SKUs, create meaningful clusters of products based on key attributes (e.g., "Red Kanjeevarams, Zari Border, Price Tier 2").
    • Forecast demand for these clusters. Procurement then ensures they source a portfolio of unique SKUs that collectively meet the forecasted demand for that cluster type. This allows for variety while still being data-driven at a manageable level.
    • The hierarchical forecasting approach I mentioned earlier would support this, where the lowest forecastable level might be these attribute clusters rather than every single unique SKU.
  4. Leverage Early Signals for New Unique Items (Pre-Festival):
    • Once a new unique saree is listed (even if months before the festival):
      • Track page views, zoom-ins on details, "notify me when available" clicks (if pre-launched), add-to-wishlist velocity.
      • These early engagement signals for a new unique item can be fed into a short-term model to adjust its initial attribute-based forecast as the festival approaches. A product generating high pre-festival buzz, even if unique, will likely have higher demand.
  5. Qualitative Input from Merchandising/Design Teams:
    • For truly novel designs or special artisan collaborations, historical data has limits. The model should allow for incorporating expert judgment from merchandising teams who understand upcoming trends or the appeal of unique weaves. This could be a manual override or an additional input feature to the model (e.g., "Merchandiser_Trend_Score").
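
The similarity-based approach in point 2 can be sketched as a weighted nearest-neighbour lookup over structured attributes. This is an illustration only: the attribute weights, SKU records, and function names are hypothetical, and a real system might add text or image embeddings to the similarity score.

```python
def attribute_similarity(a, b, weights):
    """Weighted share of attributes on which two SKUs match (0 to 1)."""
    total = sum(weights.values())
    matched = sum(w for key, w in weights.items() if a.get(key) == b.get(key))
    return matched / total

def knn_forecast(new_sku, past_skus, weights, k=2):
    """Similarity-weighted average of festival sales of the k most similar past SKUs."""
    if not past_skus:
        raise ValueError("need at least one past SKU")
    scored = sorted(
        ((attribute_similarity(new_sku, p["attrs"], weights), p["festival_units"])
         for p in past_skus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    weight_sum = sum(sim for sim, _ in scored)
    if weight_sum == 0:
        # Nothing matches: fall back to a plain average of the k candidates.
        return sum(units for _, units in scored) / len(scored)
    return sum(sim * units for sim, units in scored) / weight_sum

weights = {"fabric": 3, "color": 1, "price_tier": 2}  # hypothetical importance
past = [
    {"attrs": {"fabric": "Kanjeevaram", "color": "red", "price_tier": "mid"}, "festival_units": 40},
    {"attrs": {"fabric": "Kanjeevaram", "color": "green", "price_tier": "mid"}, "festival_units": 30},
    {"attrs": {"fabric": "Uppada", "color": "red", "price_tier": "low"}, "festival_units": 10},
]
new_saree = {"fabric": "Kanjeevaram", "color": "red", "price_tier": "mid"}
estimate = knn_forecast(new_saree, past, weights, k=2)
print(round(estimate, 2))
```

The estimate would then be adjusted for price differences, newness, and early engagement signals as described in points 2 and 4.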

So, for these unique items, we shift from "forecasting this exact SKU" to "forecasting demand for SKUs like this one based on its characteristics and early interest," and ensuring the overall assortment meets the demand for various types of traditional wear.

Handling Long-Tail & New SKUs: Candidate proposes excellent, practical solutions like attribute-based modeling, similarity-based forecasting, aggregate cluster forecasting, and leveraging early engagement signals, acknowledging the limitations of pure time-series for unique items.
Interviewer: That's a very robust way to handle unique SKUs. One final area: promotions. Myntra, like any e-commerce platform, runs various promotions during festival seasons, from site-wide discounts to bank offers or category-specific deals. How would you incorporate the impact of these varied and often overlapping promotional activities into your demand forecast for traditional Telugu clothing? And how would you advise the business on the optimal promotional strategy for these items based on your demand model?
Candidate: Incorporating promotional impact is crucial as it significantly influences demand, especially during festivals.

Incorporating Promotional Impact & Advising on Strategy:

1. Feature Engineering for Promotions:

  • For each historical time period (e.g., week) and product category/sub-category:
    • Promotion Active Flag: Binary flag (1 if any relevant promotion was active, 0 otherwise).
    • Average Discount Depth: Effective average percentage discount offered on the category during that period due to promotions. This needs to handle overlapping promos (e.g., 10% site-wide + extra 5% bank offer).
    • Type of Promotion: Categorical feature (e.g., "Flat Discount," "Bank Offer," "Bundle Deal," "New User Offer").
    • Promotion Reach/Visibility: Proxy for how widely the promotion was advertised (e.g., homepage banner, email campaign targeted).
    • Interaction with Festival: Promotion_Active_Flag * Is_Festival_Period_Flag. The impact of a promotion might be amplified during a festival.
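
Overlapping promotions need a rule for combining discounts into one effective depth. The sketch below shows two possible rules; which one applies depends on how the platform actually stacks offers, which is an assumption here.

```python
def effective_discount(discounts, stacking="multiplicative"):
    """Combine overlapping discounts (as fractions) into one effective depth.

    multiplicative: each offer applies to the already-reduced price.
    additive: percentages are summed and applied to the list price.
    """
    if stacking == "additive":
        return min(1.0, sum(discounts))
    remaining = 1.0
    for d in discounts:
        remaining *= (1.0 - d)
    return 1.0 - remaining

# 10% site-wide plus an extra 5% bank offer.
stacked = effective_discount([0.10, 0.05])
summed = effective_discount([0.10, 0.05], stacking="additive")
print(round(stacked, 4), round(summed, 4))
```

With multiplicative stacking the example yields 14.5% rather than 15%. The gap widens at deeper discounts, so using the wrong rule would bias the discount-depth feature most in exactly the periods where promotions matter most.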

2. Modeling Promotional Lift:

  • The machine learning regression models (XGBoost, etc.) would naturally learn the relationship between these promotional features and sales.
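    • Feature importance scores would show which promotional features matter most. Because tree ensembles have no coefficients, the direction and approximate size of the lift for different promotion types or depths would come from partial dependence plots or SHAP values.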
  • Price Elasticity Estimation: By looking at how sales change with different effective price points (due to varying discounts), we can estimate price elasticity for different sub-categories of traditional wear. This tells us how responsive demand is to price changes.
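
One common way to estimate price elasticity is the slope of a log-log regression of units on effective price. The sketch below uses ordinary least squares with the standard library; the data is synthetic, generated with a known elasticity of -1.5 so the result can be checked, and the function name is an assumption.

```python
import math

def price_elasticity(prices, units):
    """OLS slope of log(units) on log(price): % change in units per 1% change in price."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(u) for u in units]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var

# Synthetic example: effective prices after discount and the resulting units.
prices = [10000, 9000, 8000, 7000]
units = [5e7 * p ** -1.5 for p in prices]  # generated with elasticity -1.5
elasticity = price_elasticity(prices, units)
print(round(elasticity, 3))
```

On real data the estimate is observational: deep discounts usually coincide with festivals and marketing pushes, so festival and promotion-type controls are needed before reading the slope as a pure price effect.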

3. Advising on Optimal Promotional Strategy:

The demand forecasting model itself can be used as a simulation tool to advise on optimal strategy:

  • Scenario Planning ("What-if" Analysis):
    • Once the model is trained, we can input different future promotional scenarios (e.g., "What if we run a 20% discount on Pattu Sarees during Dussehra?" vs. "What if we run a 15% discount + a bank offer?") and get a forecasted demand for each scenario.
  • Profitability Optimization:
    • For each promotional scenario, calculate: Forecasted_Units_Sold_Scenario_X * (Average_Selling_Price_Scenario_X - COGS) - Cost_of_Promotion_Scenario_X
    • This allows us to compare the forecasted profitability of different promotional strategies, not just forecasted sales units. The goal is to find the promotion that maximizes incremental profit.
    • A deep 40% discount might drive huge volume but could be less profitable than a targeted 20% discount on select high-margin items if elasticity is not very high.
  • Diminishing Returns Analysis:
    • The model can help identify if increasing discount depth (e.g., from 30% to 40%) yields a proportionally smaller increase in demand, indicating diminishing returns and suggesting a shallower discount might be more ROI-efficient.
  • Targeted Promotions:
    • If the model can predict demand conditional on customer segments (e.g., based on their past price sensitivity or brand loyalty for traditional wear), it can inform who should receive which type of offer for maximum impact.
  • Inventory Considerations:
    • If the forecast for a certain promotion shows demand exceeding likely supply (especially for unique artisan items), the business needs to either scale back the promotion for those items or work on increasing supply far in advance. The model helps quantify this risk.
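
The profitability comparison can be sketched as a loop over scenarios. Here `toy_demand` is a hypothetical constant-elasticity stand-in for the trained forecasting model, and all prices, costs, and the elasticity value are invented for illustration.

```python
def toy_demand(base_units, discount, elasticity=-3.0):
    """Stand-in for the trained model: units rise as the effective price falls."""
    return base_units * (1 - discount) ** elasticity

def scenario_profit(forecast_units, list_price, discount, cogs, promo_cost):
    """Forecasted units * (selling price - COGS) - cost of running the promotion."""
    selling_price = list_price * (1 - discount)
    return forecast_units * (selling_price - cogs) - promo_cost

# Hypothetical scenarios: (discount depth, fixed promotion cost in rupees).
scenarios = {
    "no promo": (0.0, 0),
    "20% off": (0.20, 20000),
    "40% off": (0.40, 20000),
}

results = {}
for name, (discount, promo_cost) in scenarios.items():
    units = toy_demand(base_units=100, discount=discount)
    results[name] = scenario_profit(units, list_price=10000, discount=discount,
                                    cogs=5000, promo_cost=promo_cost)

best = max(results, key=results.get)
print(results)
print(best)
```

In this toy setup the 40% discount sells the most units but earns less than the 20% discount, because margin per unit shrinks faster than volume grows. An inventory cap could be added by taking the minimum of forecasted units and available supply before computing profit.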

Measuring True Incrementality of Promotions:

  • This is harder without a proper A/B test (some users get the promo, some don't). If such tests are run, the lift is measured directly as the difference between the group that received the promotion and the control group, with the demand model supplying the expected baseline for each group.
  • If there is no A/B test, we use the model's prediction with the promo features switched off as the counterfactual baseline and its prediction with them switched on as the forecast. The difference is the estimated promotional lift. This relies on the model accurately capturing historical promo effects, and because past promotions often coincided with festivals, the estimate is observational rather than causal.

So, the demand model becomes a key input into promotional planning by allowing simulation of different strategies and their likely impact on both sales volume and profitability, guiding Myntra towards more data-driven promotional investments for traditional Telugu wear during festivals.

Integrating Promotions & Optimization: Candidate details how to featurize promotions, model their lift, and critically, use the forecast model as a simulation tool for optimizing promotional strategy based on profitability, not just sales volume.

What to Learn from This Case

  • Clarify Scope First: Always start by understanding the specifics: forecast granularity, horizon, definitions.
  • Comprehensive Data Sourcing: Think broadly about internal (sales, product, customer, inventory, marketing, browsing) and external (festivals, economic, social, competitor) data.
  • Hybrid Modeling for Complex Demand: Combine classical time series (SARIMA, Prophet for baseline/seasonality) with machine learning (XGBoost for exogenous factors, complex interactions).
  • Feature Engineering is Crucial for Festivals: Create specific features for festival timing (flags, days until/since), promotions, and leading indicators (search trends, site engagement).
  • Address Product Uniqueness: For categories with many unique/low-volume SKUs (like traditional sarees), use attribute-based forecasting, similarity-based approaches for new items, or forecast at aggregate cluster levels.
  • Integrate Promotions: Model the impact of promotions as features and use the forecast model as a simulation tool to optimize promotional strategy for profitability.
  • Hierarchical Forecasting for Consistency: Ensure forecasts at different levels of product/region hierarchy are coherent.
  • Rigorous Validation: Use time-series cross-validation and appropriate error metrics (WMAPE, MASE).
  • Consider Business Application: The forecast directly impacts inventory, procurement, and marketing; the solution must be practical and interpretable for these teams.
  • Acknowledge Limitations: Be aware of challenges like data sparsity for new items and the difficulty of perfectly isolating promotional lift without A/B tests.

 

Nerchuko Academy · Free DS Interview Prep