APSRTC/TSRTC: Passenger Satisfaction Survey

Problem Statement

You work as a data analyst for APSRTC (Andhra Pradesh State Road Transport Corporation) and TSRTC (Telangana State Road Transport Corporation), which operate extensive bus networks connecting cities, towns, and villages across the Telugu states. After introducing new Super Luxury and Garuda Plus services between major cities like Hyderabad-Vijayawada, Tirupati-Hyderabad, and Visakhapatnam-Hyderabad, management wants to conduct a passenger satisfaction survey. You need to determine how many passengers to survey to estimate the proportion of satisfied travelers within +/- 3% with 95% confidence.

Factors Determining Sample Size

EASY

Conceptually, what factors determine the required sample size for this survey of bus passengers traveling between major Telugu cities and smaller towns like Warangal, Kakinada, and Kurnool?

Solution

Imagine APSRTC and TSRTC want to know how happy passengers are with their new Super Luxury and Garuda Plus buses running between cities like Hyderabad-Vijayawada or Tirupati-Hyderabad, and even to towns like Warangal or Kakinada. We need to survey some passengers, but how many?

Several things decide how many people we need to ask:

- How Sure We Want to Be (Confidence Level): Do we want to be 90% sure our survey result is close to the truth, or super sure at 99%? Being more sure (e.g., 95% as stated) means we need to ask more people.
- How Precise We Need to Be (Margin of Error): Do we need to know the satisfaction within +/- 5% of the true value, or really precise like +/- 1%? Getting more precise (e.g., +/- 3% as stated) means we need to ask more people.
- Our Best Guess of Satisfaction (Estimated Proportion): If we think most people are either very happy or very unhappy (e.g., 90% happy or 10% happy), we might need fewer people than if we think it's split down the middle (around 50% happy). If we have no idea, assuming 50% happy is the safest (most conservative) and requires the largest sample.
- How Many People Travel (Population Size): If only a very small number of people use these buses (say, only 500 total), we might be able to survey a large chunk of them or even everyone. But for APSRTC/TSRTC, many thousands travel, so this factor usually matters less for the formula unless the sample size gets very close to the total population.

So, to be 95% confident and get within +/- 3% accuracy for passengers traveling across the Telugu states, we need to consider these points to figure out the right number of surveys.

Conceptually, the required sample size for the APSRTC/TSRTC passenger satisfaction survey (covering routes like Hyderabad-Vijayawada, Tirupati-Hyderabad, Visakhapatnam-Hyderabad, and services to towns like Warangal, Kakinada, and Kurnool) is determined by the following key factors:

1. Desired Confidence Level:
- This is how often the interval built around the sample estimate (proportion of satisfied travelers, plus or minus the margin of error) would capture the true proportion in the entire passenger population if the survey were repeated many times.
- A higher confidence level (e.g., 95% or 99%) means we want to be more certain. To achieve higher certainty, a larger sample size is required. The problem specifies 95% confidence.
2. Desired Margin of Error (Precision):
- This is how close we want our sample estimate to be to the true population proportion. It's the "+/-" value.
- A smaller margin of error (e.g., +/- 3% as specified, versus +/- 5%) means we want a more precise estimate. To achieve higher precision (a smaller margin of error), a larger sample size is required.
3. Estimated Proportion of Satisfied Travelers (Population Proportion, p):
- This is an estimate of the characteristic we are trying to measure (passenger satisfaction).
- The sample size needed is largest when this proportion is assumed to be 50% (p=0.5). This is because p*(1-p) is maximized at p=0.5, leading to the highest variability.
- If we have a preliminary estimate (e.g., from app reviews suggesting 60% satisfaction), using this estimate (p=0.6) can lead to a slightly smaller required sample size compared to assuming p=0.5. If the true proportion is closer to 0% or 100%, variability is lower, and a smaller sample is needed. Since the satisfaction is likely not at these extremes, considering a value like 0.5 or 0.6 is reasonable.
4. Population Size (N) (Less critical for large populations):
- This is the total number of passengers using the Super Luxury and Garuda Plus services during the survey period across both Telugu states.
- If the population is very large (as is likely for APSRTC/TSRTC passengers on major routes, especially during festival seasons like Sankranti), the sample size formula often uses an assumption of an infinite population, or the population size has a minimal impact on the required sample size.
- A finite population correction factor can be applied if the calculated sample size is a significant fraction (e.g., >5%) of the total population, which would slightly reduce the required sample size. However, for a large and diverse passenger base across multiple cities and towns, this is often not a primary driver unless targeting very specific, small sub-groups.
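
As a rough illustration of the population-size point, the sketch below applies the usual finite population correction, n_adj = n0 / (1 + (n0 - 1) / N), to the large-population sample size n0. The function names and the two population sizes (500 and 100,000) are assumptions chosen for illustration, not APSRTC/TSRTC figures.

```python
import math


def sample_size_infinite(p: float, margin: float, z: float = 1.96) -> float:
    """Unrounded sample size for a proportion, assuming a very large population."""
    return (z ** 2) * p * (1 - p) / margin ** 2


def sample_size_finite(p: float, margin: float, population: int, z: float = 1.96) -> int:
    """Sample size after the finite population correction, rounded up."""
    n0 = sample_size_infinite(p, margin, z)
    n_adj = n0 / (1 + (n0 - 1) / population)
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(n_adj, 6))


# Hypothetical population sizes, for illustration only.
small_route = sample_size_finite(0.5, 0.03, population=500)
large_network = sample_size_finite(0.5, 0.03, population=100_000)
print("Small population (N = 500):", small_route)
print("Large population (N = 100,000):", large_network)
```

With a small population the correction cuts the requirement sharply, while for a large passenger base the result stays close to the uncorrected figure, which is why the infinite-population formula is usually good enough here.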

In summary, to determine how many passengers to survey, we primarily need to decide on our desired confidence (95%), precision (+/- 3%), and make an educated guess or use a conservative estimate (like 50% or the preliminary 60%) for the expected satisfaction rate. These factors are then plugged into a standard sample size formula for proportions.

Impact of Prior Estimate on Sample Size

MODERATE

If preliminary data from ticket booking app reviews suggests approximately 60% satisfaction with the new services (particularly popular during festival seasons like Sankranti when many people travel to their native places), how does this affect the sample size calculation compared to if you had no prior estimate (i.e., assumed 50%)?

Solution

APSRTC/TSRTC wants to survey passengers about their new Super Luxury buses, especially during Sankranti travel. We need to decide how many people to ask. We have a hint from app reviews: maybe 60% are satisfied.

How does this 60% guess affect how many people we need to survey, compared to if we had no clue (and guessed 50%)?

- The "50/50 Split" is Toughest: When we calculate sample size, there's a part of the math that depends on how varied opinions are. If opinions are split almost 50% happy and 50% unhappy (p=0.5), this variation is at its maximum. This means we need the largest possible sample size to be sure about our results. It's the safest, most conservative guess if we know nothing.
- 60% Satisfaction (or 40%): If we guess 60% are satisfied (so p=0.6, and 1-p=0.4), the term p*(1-p) becomes 0.6 * 0.4 = 0.24. If we had guessed 50% (p=0.5), then p*(1-p) would be 0.5 * 0.5 = 0.25.

Since 0.24 (from the 60% guess) is slightly smaller than 0.25 (from the 50% guess), using the 60% satisfaction estimate will result in a slightly smaller required sample size for the survey of passengers traveling between Hyderabad, Vijayawada, Tirupati, and other Telugu towns like Warangal or Kakinada. It means we can survey slightly fewer people and still get our +/-3% precision with 95% confidence. The further our estimate is from a 50/50 split, the smaller the sample size needed (as long as our estimate is reasonably good!).

Having preliminary data suggesting approximately 60% satisfaction with the new APSRTC/TSRTC Super Luxury and Garuda Plus services (popular during Sankranti for travel to native places) affects the sample size calculation by potentially reducing the required sample size compared to using the most conservative assumption of 50% satisfaction.

The standard formula for sample size (n) for estimating a proportion (when the population is large) is:
n = (Z² * p * (1-p)) / E²
Where:

Z = Z-score corresponding to the desired confidence level (e.g., 1.96 for 95% confidence).
p = estimated population proportion (in this case, satisfaction rate).
E = desired margin of error (e.g., 0.03 for +/- 3%).

The key term here that is affected by the estimate of satisfaction is p * (1-p), which is the variance of a single yes/no (Bernoulli) response: each passenger is either satisfied or not.
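
To see how the p * (1-p) term behaves across different satisfaction rates, a small sketch like the following can help. The helper name `variance_term` and the grid of p values are illustrative assumptions.

```python
def variance_term(p: float) -> float:
    """Variance of a single satisfied / not-satisfied response."""
    return p * (1 - p)


# Evaluate the term on a coarse grid of candidate satisfaction rates.
grid = [i / 10 for i in range(1, 10)]
for p in grid:
    print(f"p = {p:.1f}  ->  p*(1-p) = {variance_term(p):.2f}")

# The grid point with the largest variance term.
widest = max(grid, key=variance_term)
print("Largest variance term at p =", widest)
```

The term is symmetric around 0.5, so a 40% guess and a 60% guess lead to the same sample size.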

Scenario 1: No Prior Estimate (Assume p = 0.5 or 50%)
- When there is no prior estimate for the proportion, it is standard practice to use p = 0.5. This is the most conservative assumption because it maximizes the value of p * (1-p).
- If p = 0.5, then p * (1-p) = 0.5 * (1-0.5) = 0.5 * 0.5 = 0.25.
- This value maximizes the numerator in the sample size formula, thus yielding the largest (most conservative) required sample size to achieve the desired confidence and precision for the survey covering routes like Hyderabad-Vijayawada or Tirupati-Hyderabad.
Scenario 2: Preliminary Estimate of p = 0.6 or 60% Satisfaction
- Preliminary data (e.g., from app reviews for passengers traveling to Warangal, Kakinada, Kurnool, or during Sankranti) suggests p = 0.6.
- Then, p * (1-p) = 0.6 * (1-0.6) = 0.6 * 0.4 = 0.24.

Effect on Sample Size Calculation:

Since 0.24 (from p=0.6) is less than 0.25 (from p=0.5), using the preliminary estimate of 60% satisfaction will result in a slightly smaller required sample size compared to assuming 50% satisfaction, all other factors (confidence level Z and margin of error E) remaining the same.

For example, with 95% confidence (Z=1.96) and +/-3% margin of error (E=0.03):

Assuming p=0.5: n = (1.96² * 0.5 * 0.5) / 0.03² = (3.8416 * 0.25) / 0.0009 = 0.9604 / 0.0009 ≈ 1067.1, rounded up to 1068 passengers.
Assuming p=0.6: n = (1.96² * 0.6 * 0.4) / 0.03² = (3.8416 * 0.24) / 0.0009 = 0.921984 / 0.0009 ≈ 1024.4, rounded up to 1025 passengers.

In this case, using the 60% estimate reduces the required sample size by 43 passengers (about 4%). While not a massive reduction, it does make the survey slightly more efficient. The further the estimated proportion (p) is from 0.5 (towards either 0 or 1), the smaller the p*(1-p) term becomes, and thus the smaller the required sample size.
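
The same arithmetic can be checked with a few lines of Python. This is a minimal sketch of the formula n = (Z² * p * (1-p)) / E² with the result rounded up to a whole passenger; the function name `required_sample_size` is an assumption made for illustration.

```python
import math


def required_sample_size(p: float, margin: float, z: float = 1.96) -> int:
    """Sample size for estimating a proportion (large population), rounded up."""
    n = (z ** 2) * p * (1 - p) / margin ** 2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(n, 6))


n_conservative = required_sample_size(p=0.5, margin=0.03)  # no prior estimate
n_with_prior = required_sample_size(p=0.6, margin=0.03)    # app-review estimate

print("p = 0.5:", n_conservative)
print("p = 0.6:", n_with_prior)
print("Saving:", n_conservative - n_with_prior)
```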

Therefore, having a reasonable preliminary estimate of satisfaction (like 60%) allows for a more optimized (slightly smaller) sample size for the APSRTC/TSRTC survey, which can save time and resources.

Tradeoffs: Precision, Confidence, and Cost

ADVANCED

What are the practical tradeoffs between wanting higher precision (smaller margin of error) or higher confidence, and the cost/feasibility of collecting larger samples from diverse passenger segments including IT professionals traveling on weekends, students returning to universities, and families visiting temples like Tirumala?

Solution

APSRTC and TSRTC want to know how happy their passengers are. They want to be 95% sure their survey is right, within +/- 3%. What if they want to be even MORE sure, or MORE precise?

Wanting More (Higher Precision or Confidence):

- Higher Precision (e.g., +/- 1% instead of +/- 3%): This means wanting a much sharper, more exact estimate of satisfaction. It's like wanting to measure a field with a super-precise laser instead of a regular tape measure.
- Higher Confidence (e.g., 99% instead of 95%): This means wanting to be even more certain that the true satisfaction level is within our estimated range. It's like wanting to be extra, extra sure of our measurement.

The Catch (Tradeoffs): To get higher precision OR higher confidence, APSRTC/TSRTC would need to survey A LOT more passengers.

- Cost: Surveying more people costs more money (printing forms or paying surveyors, data entry, incentives if any).
- Time & Effort (Feasibility): It takes much more time and effort to find and survey thousands of passengers across many routes (Hyderabad-Vijayawada, Tirupati-Hyderabad), bus types (Super Luxury, Garuda Plus), and diverse groups (IT professionals on weekends, students, families going to Tirumala, people traveling for Sankranti).
- Complexity: Managing a very large survey with diverse passenger segments (from cities like Visakhapatnam to towns like Kakinada or Kurnool) becomes much more complex. Ensuring good quality responses from everyone is harder.

So, while it's great to want very precise and very confident results, APSRTC/TSRTC needs to balance this desire with the practical costs and difficulties of doing a much larger survey. The current goal of +/-3% with 95% confidence is a common, reasonable balance for many business surveys.

There are significant practical tradeoffs between desiring higher precision (a smaller margin of error) or higher confidence, and the associated costs and feasibility of collecting larger samples for the APSRTC/TSRTC passenger satisfaction survey. These tradeoffs are particularly relevant when considering diverse passenger segments (IT professionals, students, families visiting temples like Tirumala) and routes (e.g., Hyderabad-Vijayawada, Tirupati-Hyderabad, Visakhapatnam-Hyderabad, and to smaller towns like Warangal, Kakinada, Kurnool).

Tradeoffs:

1. Higher Precision (Smaller Margin of Error, e.g., +/- 1% instead of +/- 3%):
- Benefit: Provides a more exact estimate of the true proportion of satisfied passengers. This can lead to more fine-tuned decision-making.
- Tradeoff (Cost/Feasibility):
  - Significantly Larger Sample Size Needed: The required sample size grows with the inverse square of the margin of error (since E is squared in the denominator of the sample size formula). For instance, reducing the margin of error by half (e.g., from 4% to 2%) quadruples the required sample size. Going from +/-3% to +/-1% would require about 9 times as many passengers (roughly 9,604 instead of 1,068 at p = 0.5 and 95% confidence).
  - Increased Costs: More surveys mean higher costs for printing, distribution, data collection (e.g., interviewer time if not self-administered via app), data entry, and processing.
  - Increased Time: Collecting a much larger sample will take longer, potentially delaying insights needed for timely decisions (e.g., service changes before a peak season like Sankranti).
  - Logistical Complexity: Managing and ensuring quality for a very large survey across diverse routes and passenger types (IT professionals on weekends, students, families visiting Tirumala) becomes more challenging. Reaching specific segments in smaller towns like Warangal or Kurnool might require more effort per respondent.
2. Higher Confidence Level (e.g., 99% instead of 95%):
- Benefit: Provides greater certainty that the true population proportion falls within the calculated confidence interval. Reduces the risk of the true value being outside the estimated range.
- Tradeoff (Cost/Feasibility):
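  - Larger Sample Size Needed: A higher confidence level requires a larger Z-score (e.g., Z=2.576 for 99% vs. Z=1.96 for 95%), which increases the required sample size by a factor of (2.576 / 1.96)² ≈ 1.73. At p = 0.5 and +/- 3%, that is roughly 1,844 passengers instead of 1,068.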
  - Similar implications to higher precision regarding increased costs, time, and logistical complexity for surveying passengers on routes like Hyderabad-Vijayawada or to/from Tirupati.
3. Cost and Feasibility of Reaching Diverse Segments:
- To ensure the results are representative of all passenger segments (IT professionals, students, families, travelers to/from smaller towns like Kakinada), the sampling plan needs to be robust.
- Reaching a large enough sample within each specific segment to achieve high precision/confidence for that segment can be particularly costly and difficult. For example, getting a large, representative sample of weekend IT professional travelers on the Visakhapatnam-Hyderabad route might require surveying over many weekends.
- The cost per completed survey might be higher for harder-to-reach segments or those in more remote locations connected by the bus network.
4. Practical Value of Incremental Precision/Confidence:
- APSRTC/TSRTC management needs to consider if the business value of, for example, knowing satisfaction is 60% +/- 1% (very precise) versus 60% +/- 3% (less precise) justifies the significant additional cost and effort.
- Often, a +/- 3% to +/- 5% margin of error at 95% confidence is deemed sufficient for many business decisions, providing a good balance between accuracy and practicality.
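
To make the tradeoffs listed above concrete, the sketch below tabulates the required sample size for a few combinations of confidence level and margin of error, using the conservative p = 0.5. The Z-scores for 95% and 99% come from the text; the 90% value (1.645) is the standard figure, and the function and variable names are assumptions for illustration.

```python
import math

# Standard two-sided Z-scores for common confidence levels.
Z_SCORES = {"90%": 1.645, "95%": 1.96, "99%": 2.576}
MARGINS = [0.05, 0.03, 0.01]


def required_sample_size(p: float, margin: float, z: float) -> int:
    """Sample size for estimating a proportion (large population), rounded up."""
    n = (z ** 2) * p * (1 - p) / margin ** 2
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(n, 6))


# Required n for every (confidence level, margin) pair at p = 0.5.
table = {
    (level, m): required_sample_size(0.5, m, z)
    for level, z in Z_SCORES.items()
    for m in MARGINS
}

labels = [f"+/-{m:.0%}" for m in MARGINS]
print("confidence " + " ".join(f"{label:>8}" for label in labels))
for level in Z_SCORES:
    row = " ".join(f"{table[(level, m)]:>8}" for m in MARGINS)
    print(f"{level:<11}{row}")
```

Reading across a row shows the effect of tightening the margin of error, and reading down a column shows the effect of raising the confidence level.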

In conclusion, while higher precision and confidence are statistically desirable, they come at a direct cost of increased sample size, time, and complexity. For APSRTC/TSRTC, especially when surveying diverse passenger segments across many routes in Telangana and Andhra Pradesh (including those during busy Sankranti travel to native places or temple visits to Tirumala), a careful balance must be struck. The chosen levels (+/- 3% margin of error, 95% confidence) represent a common compromise, but if resources are highly constrained, management might even need to consider relaxing these slightly, fully understanding the impact on the reliability of the satisfaction estimates.
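
If the budget fixes the number of surveys instead, the same formula can be inverted to show the margin of error that a given sample can support: E = Z * sqrt(p * (1-p) / n). The sketch below assumes p = 0.5 and 95% confidence; the candidate sample sizes are hypothetical budgets, not figures from APSRTC/TSRTC.

```python
import math


def achievable_margin(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error supported by n completed surveys (large population)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical survey budgets, for illustration only.
for n in (400, 600, 800, 1068):
    print(f"n = {n:>5}: margin of error = +/-{achievable_margin(n):.1%}")
```

A table like this lets management see what relaxing the target actually costs in reliability before committing to a smaller survey.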

Your Survey Strategy!

What are your thoughts on these scenarios? Try answering the questions yourself and share your insights or alternative approaches in the comments section below!
