Krishna Waters: Bottling Process Quality Comparison

Problem Statement

"Krishna Waters," a mineral water company based in Vijayawada, is comparing two bottling processes. Their traditional process (Process A) produces 10,000 bottles daily with approximately 100 bottles showing minor quality issues. A new technique developed by engineers from an institute in Warangal (Process B) has been tested on 500 bottles, with only 2 showing defects.

Challenges in Comparing Bottling Processes

MODERATE

What challenges might arise when comparing these bottling processes (traditional Process A vs. new Warangal-developed Process B) due to the highly imbalanced sample sizes (10,000 vs. 500) and the small number of defects (just 2) in Process B?

Solution

Krishna Waters in Vijayawada is comparing their old bottling method (Process A) with a new one from Warangal engineers (Process B). There are a couple of tricky things with the data they have:

Very Different Group Sizes: They've looked at 10,000 bottles from Process A but only 500 from Process B.
- What this means: The defect rate for Process A (100 defects in 10,000 bottles = 1%) is based on a lot of information, so it's quite a reliable estimate. But the defect rate for Process B (2 defects in 500 bottles = 0.4%) is based on much less information. If they tested another 500 bottles with Process B, the number of defects could easily be different (maybe 1, maybe 3), and the rate would change quite a bit. So, our estimate for Process B's defect rate is less certain.
Very Few Defects in Process B: Only 2 defective bottles from Process B is great news! But statistically, it's tough to be super sure about a rate when you've only seen the event (a defect) happen twice.
- What this means: Common statistical tests often need a certain number of "events" (like defects) to work properly. With only 2 defects, these tests might not give accurate results. It's like trying to predict the weather for the whole year based on just two rainy days – hard to be sure!

These issues make it challenging to confidently say if the new Warangal technique is truly better than the traditional Vijayawada one using standard statistical tests, and a wrong decision could impact quality across all bottling plants in the Telugu states.
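To make the "maybe 1, maybe 3" point concrete, here is a minimal sketch of how much the defect count could move in a repeat run of 500 bottles. It assumes, purely for illustration, that the pilot's observed 0.4% is the true rate of Process B; the names `binomial_pmf` and `repeat_run` are illustrative, not part of the original problem.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k defects in n bottles, each defective with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_bottles = 500
assumed_rate = 0.004  # assumption: the pilot's observed rate is the true rate

# Chance of each defect count (0 to 6) in a repeat run of 500 bottles
repeat_run = {k: binomial_pmf(k, n_bottles, assumed_rate) for k in range(7)}

for k, prob in repeat_run.items():
    print(f"{k} defects -> observed rate {k / n_bottles:.1%}, probability {prob:.3f}")
```

Under this assumption, seeing exactly 2 defects again is far from guaranteed: counts of 1 or 3 are also quite likely, and each one shifts the observed rate by 0.2 percentage points.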

Comparing Process A (traditional Vijayawada process: 10,000 bottles, 100 defects) and Process B (new Warangal-developed technique: 500 bottles, 2 defects) for Krishna Waters presents several statistical challenges:

1. Highly Imbalanced Sample Sizes:
- Process A has a sample size (n_A = 10,000) that is 20 times larger than Process B (n_B = 500).
- Challenge: The estimate of the defect rate for Process A (p_A = 100/10,000 = 0.01) is quite precise due to the large sample. However, the estimate for Process B (p_B = 2/500 = 0.004) is based on a much smaller sample, making it inherently more uncertain and variable. A small change in the number of defects in Process B would lead to a proportionally larger change in its estimated defect rate.
2. Small Number of Defect Events in Process B:
- Observing only 2 defects in Process B means we are dealing with a rare event within that smaller sample.
- Challenge:
  - Unstable Rate Estimate: The estimated proportion (0.4%) is highly sensitive to small changes. If one more defect were found, the rate would jump to 0.6%; if one fewer, it would drop to 0.2%.
  - Violation of Assumptions for Standard Tests: Many common statistical tests for comparing proportions (like the chi-squared test or z-test for two proportions) rely on approximations that are valid only when the number of observed or expected events (and non-events) in each group is sufficiently large (a common rule of thumb is at least 5). With only 2 defects in Process B, these approximations may not hold, leading to inaccurate p-values and potentially incorrect conclusions.
3. Difficulty in Assessing True Variability and Confidence:
- Challenge: With very few defects, it's harder to get a reliable estimate of the true underlying defect probability for Process B and construct precise confidence intervals. The confidence interval for p_B will likely be very wide.
4. Potential for Low Statistical Power:
- Challenge: Even if Process B is genuinely better, the small sample size and few observed defects might mean the test lacks sufficient statistical power to detect a statistically significant difference from Process A. Krishna Waters might incorrectly conclude there's no improvement when one actually exists (a Type II error).
5. Impact on Decision-Making:
- Challenge: Making a decision to overhaul bottling processes across all plants in the Telugu states based on potentially unreliable statistical results is risky. An incorrect decision could lead to:
  - Unnecessary investment if Process B isn't truly superior.
  - Missed opportunity if a genuinely better process developed by Warangal engineers is discarded due to inconclusive small-sample results.

These challenges necessitate the use of statistical methods that are robust to small sample sizes and rare event counts to ensure a fair and reliable comparison between the two bottling processes.
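The difference in precision between the two estimates can be sketched with confidence intervals. The example below uses the Wilson score interval, which is one reasonable choice for small counts and is my assumption here, not a method named in the problem; `wilson_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(defects: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a defect proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = defects / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


ci_a = wilson_interval(100, 10_000)  # Process A: 100 defects in 10,000 bottles
ci_b = wilson_interval(2, 500)       # Process B: 2 defects in 500 bottles

print(f"Process A: {ci_a[0]:.2%} to {ci_a[1]:.2%}")
print(f"Process B: {ci_b[0]:.2%} to {ci_b[1]:.2%}")
```

With these counts the Process B interval comes out several times wider than Process A's and still reaches past Process A's 1% rate, which is the uncertainty described in points 1 to 3.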

Most Appropriate Statistical Test

ADVANCED

Which statistical test would be most appropriate for this comparison of bottling processes at Krishna Waters? Explain why a standard z-test for two proportions might be problematic in this scenario.

Solution

For comparing the defect rates of the old Vijayawada bottling process (A) and the new Warangal one (B) at Krishna Waters, especially since Process B had only 2 defective bottles, the best choice is Fisher's Exact Test.

Why Not a Standard Z-test?

A Z-test is a common way to compare percentages (like defect rates). However, it works best when you have a decent number of "events" (defects) and "non-events" (good bottles) in both groups.
Think of it like this: the Z-test uses some math shortcuts (approximations) that are accurate only with larger numbers.
In our case, Process B had only 2 defects (and 498 good bottles). The "2 defects" number is very small. When numbers are this small, the Z-test's shortcuts aren't very accurate, and it might give us a misleading result about whether the Warangal process is truly better. It's like using a ruler marked only in centimeters to measure something that's millimeters wide – you won't get a precise answer.

Why Fisher's Exact Test is Better Here:

Fisher's Exact Test is specifically designed for situations with small numbers in some categories, like our 2 defects.
It doesn't use those math shortcuts. Instead, it calculates the exact probability of seeing our specific results (or results even more extreme) if there was actually no difference between the two bottling processes.
This makes it much more reliable and accurate when dealing with rare events like very few defective bottles in one of the test groups. It gives Krishna Waters a more trustworthy answer.

For comparing the defect rates of the two bottling processes at Krishna Waters, given the data (Process A: 100 defects/10,000 bottles; Process B: 2 defects/500 bottles), the most appropriate statistical test would be Fisher's Exact Test.

Why a Standard Z-test for Two Proportions Might Be Problematic:

A standard z-test for two proportions relies on the normal approximation to the binomial distribution. This approximation is generally considered valid if certain conditions are met, typically:

n₁p₁ ≥ 5, n₁(1-p₁) ≥ 5
n₂p₂ ≥ 5, n₂(1-p₂) ≥ 5

(Where n is the sample size and p is the sample proportion for each group. Some statisticians use a threshold of 10.)

Let's check for Krishna Waters' data:

For Process A (traditional Vijayawada process):
- n_A = 10,000, Defects_A = 100, p_A = 0.01
- n_Ap_A = 10000 * 0.01 = 100 (≥ 5)
- n_A(1-p_A) = 10000 * 0.99 = 9900 (≥ 5)
- The conditions are met for Process A.
For Process B (new Warangal-developed technique):
- n_B = 500, Defects_B = 2, p_B = 0.004
- n_Bp_B = 500 * 0.004 = 2 (This is < 5)
- n_B(1-p_B) = 500 * 0.996 = 498 (≥ 5)

Since the number of observed defects (n_Bp_B = 2) in Process B is less than the commonly accepted threshold of 5, the normal approximation used by the z-test may not be accurate for this group. Using a z-test in this scenario could lead to:

- An inaccurate p-value.
- Potentially misleading conclusions about the statistical significance of the difference.

A chi-squared test for independence (which is mathematically related to the z-test for two proportions) would face similar issues, because the expected count for the "Process B, Defect" cell under the null hypothesis (assuming equal proportions) would be (102/10500) * 500 ≈ 4.86, which falls just below the typical threshold of 5 for all expected cell counts.
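The rule-of-thumb checks above can be reproduced in a few lines. This is a small sketch using the counts from the problem; the dictionary layout and the names `observed`, `expected`, and `THRESHOLD` are illustrative choices, and the threshold of 5 is the rule of thumb quoted above.

```python
observed = {
    "Process A": {"defect": 100, "ok": 9_900},
    "Process B": {"defect": 2, "ok": 498},
}
THRESHOLD = 5  # rule-of-thumb minimum count

grand_total = sum(sum(row.values()) for row in observed.values())
column_totals = {
    col: sum(row[col] for row in observed.values()) for col in ("defect", "ok")
}

# Expected count under equal defect rates: row total * column total / grand total
expected = {
    name: {col: sum(row.values()) * column_totals[col] / grand_total for col in row}
    for name, row in observed.items()
}

small_observed = [
    (name, col) for name, row in observed.items() for col, v in row.items() if v < THRESHOLD
]
small_expected = [
    (name, col) for name, row in expected.items() for col, v in row.items() if v < THRESHOLD
]

print("Observed counts below threshold:", small_observed)
print("Expected counts below threshold:", small_expected)
```

Only the "Process B, defect" cell fails either check, which is why the concern centers on that single cell rather than on the table as a whole.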

Why Fisher's Exact Test is Most Appropriate:

Handles Small Expected Frequencies: Fisher's Exact Test is an "exact" test, meaning it calculates the p-value based on the exact hypergeometric distribution of the data in a 2x2 contingency table, given the marginal totals. It does not rely on large-sample approximations.
Designed for Rare Events: It is particularly well-suited for situations where one or more cells in the contingency table have small counts, as is the case with the 2 defects in Process B.
Provides Reliable P-values: It gives a more accurate p-value when the assumptions for asymptotic tests (like z-test or chi-squared) are violated due to small cell counts.
Application: We would set up a 2x2 table:

             Defect   No Defect   Total
Process A       100        9900   10000
Process B         2         498     500
Total           102       10398   10500

Fisher's Exact Test would then determine the probability of observing this table, or one more extreme, if the defect rates were truly the same.

While Fisher's Exact Test can be conservative (less likely to find a significant result), its accuracy with small counts makes it the preferred choice here for Krishna Waters to make a statistically sound decision about their bottling processes.
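As a rough illustration of what the test computes, the sketch below sums hypergeometric probabilities directly with the standard library. It covers only the one-sided question "does Process B have fewer defects?"; the function names are hypothetical, and in practice a statistics package would normally be used instead of hand-rolled code.

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_pmf(x: int, total: int, defects: int, drawn: int) -> float:
    """P(exactly x of the `defects` defective bottles land among the `drawn` bottles)."""
    return exp(
        log_comb(defects, x)
        + log_comb(total - defects, drawn - x)
        - log_comb(total, drawn)
    )


def fisher_one_sided(defects_b: int, n_b: int, defects_a: int, n_a: int) -> float:
    """One-sided p-value: P(Process B shows defects_b or fewer defects | fixed margins)."""
    total = n_a + n_b
    defects = defects_a + defects_b
    lowest = max(0, n_b - (total - defects))  # smallest feasible defect count in B
    return sum(hypergeom_pmf(x, total, defects, n_b) for x in range(lowest, defects_b + 1))


p_value = fisher_one_sided(defects_b=2, n_b=500, defects_a=100, n_a=10_000)
print(f"One-sided Fisher exact p-value: {p_value:.3f}")
```

For these counts the one-sided p-value lands above the usual 0.05 cutoff, so the 500-bottle pilot on its own does not establish that Process B is better.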

Determining Minimum Sample Size for Process B

ADVANCED

How would you determine the minimum sample size required for Process B to make a statistically valid conclusion before implementing it across all bottling plants of Krishna Waters in the Telugu states?

Solution

To help Krishna Waters decide how many more bottles to test with the new Warangal-developed technique (Process B), we need to make some smart guesses and set some goals. The current 500 bottles might not be enough to be super sure.

What We Need to Decide First:

What's the "Old" Defect Rate? Process A (Vijayawada) has a 1% defect rate (100 defects in 10,000 bottles). This is our starting point.
How Much Better Does "New" Need to Be? This is key. Does Krishna Waters want the new process to cut defects in half (to 0.5%)? Or do they need an even bigger improvement, say down to 0.2%? The smaller the improvement we want to reliably detect, the more bottles we need to test. Let's say they'd be happy if it's 0.5%.
How Confident Do We Want to Be? Usually, businesses accept at most a 5% chance of declaring the new process better when the difference is really just luck. (Statisticians call this alpha = 0.05; it is often loosely described as "95% confidence".)
How Much Risk of Missing Out? If the new process is truly better, we want a good chance of finding that out. Let's say we want an 80% chance of detecting the improvement (statisticians call this power = 0.80).

Getting a Rough Idea of Sample Size:

With these goals (detecting a drop from 1% to 0.5% defect rate, with 95% confidence and 80% power), we'd use statistical formulas or software. A rough estimate would be that Krishna Waters might need to test around 2,500 to 3,000 bottles with the new Process B. This is much more than the current 500.

Why More Bottles?

Because defects are rare (especially if the new process is good), we need to look at many bottles to get a stable idea of its true defect rate.
To be confident that a small difference (like 1% vs 0.5%) is real and not just random variation, we need enough evidence.

So, before rolling out the new Warangal technique across all bottling plants in the Telugu states, Krishna Waters should plan for a more extensive test with Process B, likely involving a few thousand bottles, to make a statistically sound decision.
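A quick way to see why 500 bottles is thin is to estimate the chance of detecting the improvement at different test sizes. The sketch below uses a normal approximation with the goals stated above (1% vs. 0.5%, one-sided alpha of 0.05) and assumes the existing 10,000 Process A bottles serve as the comparison group. The approximation is rough at small counts, and `approx_power` is an illustrative helper, so read the numbers as ballpark figures only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(
    n_b: int,
    n_a: int = 10_000,
    p_a: float = 0.01,
    p_b: float = 0.005,
    alpha: float = 0.05,
) -> float:
    """Approximate power of a one-sided two-proportion z-test (pooled null variance)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha)
    pooled = (n_a * p_a + n_b * p_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    se_alt = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return norm.cdf(((p_a - p_b) - z_alpha * se_null) / se_alt)


power_by_n = {n_b: approx_power(n_b) for n_b in (500, 1_000, 2_000, 3_000)}

for n_b, power in power_by_n.items():
    print(f"{n_b:>5} Process B bottles -> approximate power {power:.0%}")
```

Under these assumptions the chance of detecting a true 0.5% rate is well under half at 500 bottles and only passes 80% once the test reaches a few thousand bottles.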

To determine the minimum sample size required for Process B (the new Warangal-developed technique) to make a statistically valid conclusion for Krishna Waters, we need to perform a formal sample size calculation for comparing two proportions. This involves several key inputs:

1. Baseline Proportion (p₁ from Process A): The defect rate of the traditional Vijayawada process.
p₁ = 100 defects / 10,000 bottles = 0.01 (or 1%).
2. Expected Proportion (p₂ for Process B) or Minimum Detectable Effect (MDE): This is the defect rate for Process B that Krishna Waters would consider a practically significant improvement.
The pilot showed an observed p_B = 2/500 = 0.004. Management needs to decide what level of reduction they want to be able to detect. For example:
- If they want to detect if p₂ is 0.005 (a 50% reduction from 0.01).
- Or perhaps a more ambitious p₂ = 0.003.
The smaller the difference (p₁ - p₂) they want to detect, the larger the sample size required. Let's assume they want to reliably detect if Process B can achieve a defect rate of 0.005 (0.5%).
3. Significance Level (α): The probability of making a Type I error (rejecting the null hypothesis when it is true; i.e., concluding Process B is better when it's not). This is typically set at 0.05. Since Krishna Waters is interested in whether Process B has fewer defects, a one-sided test is appropriate.
4. Statistical Power (1-β): The probability of making a correct decision when the alternative hypothesis is true (i.e., correctly concluding Process B is better if it truly has the target defect rate p₂). This is typically set at 0.80 (80%) or 0.90 (90%). Higher power means a lower chance of a Type II error (failing to detect a real improvement). Let's use 80% power.
5. Ratio of Sample Sizes (if applicable): If Process A's data is considered fixed from a very large historical dataset, we are essentially calculating n₂ (for Process B). If both are new samples, the ratio (e.g., 1:1) would be considered. Given the large n₁=10,000, we can often treat p₁ as well-estimated.

Calculation Approach:

Using statistical software (like R, Python's statsmodels, G*Power) or specialized online calculators for sample size for two independent proportions (one-sided test):

Inputs:

p₁ = 0.01
p₂ = 0.005 (the defect rate we want to be able to detect for Process B)
α = 0.05 (one-sided)
Power (1-β) = 0.80

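A common formula for the size of group 1 (Process A), with group 2 (Process B) sized at n₂ = k·n₁, is:

n₁ = [ ( Z_α √( p_pooled(1-p_pooled)(1 + 1/k) ) + Z_β √( p₁(1-p₁) + p₂(1-p₂)/k ) ) / (p₁ - p₂) ]²

Where p_pooled = (p₁ + k·p₂)/(1 + k) is the pooled proportion, k is the allocation ratio n₂/n₁, and Z_α and Z_β are the standard normal quantiles for the chosen significance level and power (about 1.645 and 0.842 for the inputs above). With k = 1 this gives the per-group size for an equal-allocation design. When n₁ is fixed in advance, as with the 10,000 Process A bottles, k is not known up front, so n₂ is found by solving the same power condition numerically. For a simpler approximation when n₁ is very large, we can focus on the power to detect a difference from a known p₁.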

Alternatively, using a calculator for comparing two proportions (one-sided):

Proportion 1 (p1): 0.01
Proportion 2 (p2): 0.005
Alpha (one-sided): 0.05
Power: 0.80
Sample size ratio (n2/n1): This can be tricky. If we assume n1 is very large and fixed, we are essentially calculating n2 needed. If we were to run a new experiment with equal samples, the numbers would be different. For this context, we're focused on n2.

Using a standard online calculator (one-sided test of p1 = 0.01 vs p2 = 0.005, alpha = 0.05, power = 0.8, with n1 fixed at the 10,000 Process A bottles already observed):

The required sample size for Process B (n_B) to detect this difference (from 1% down to 0.5%) with 80% power and 5% significance (one-sided) would typically be in the range of 2,500 to 3,000 bottles. (The exact number depends on the specific formula used by the calculator, whether a continuity correction is applied, etc. The pooled-variance formula above gives roughly 2,400 bottles for this fixed-n1 design, and conservative adjustments push that figure higher. A fresh experiment with equal samples would need more: roughly 3,700 bottles per process under the same formula.)
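The calculation can also be sketched directly rather than read off a calculator. The code below is not the output of any particular tool: it applies the pooled-variance normal approximation to the inputs listed above and searches for the smallest Process B sample that reaches the target power, once for a fresh 1:1 experiment and once with Process A fixed at its existing 10,000 bottles. The function names are hypothetical, and other formulas (continuity-corrected, arcsine-based, exact) will give somewhat different figures.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional


def approx_power(n_a: int, n_b: int, p_a: float = 0.01, p_b: float = 0.005,
                 alpha: float = 0.05) -> float:
    """Approximate power of a one-sided two-proportion z-test (pooled null variance)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha)
    pooled = (n_a * p_a + n_b * p_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    se_alt = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return norm.cdf(((p_a - p_b) - z_alpha * se_null) / se_alt)


def smallest_n_b(target_power: float = 0.80, fixed_n_a: Optional[int] = None,
                 limit: int = 100_000) -> int:
    """Smallest Process B sample reaching the target power.

    With fixed_n_a=None both groups grow together (1:1 design);
    otherwise Process A stays at fixed_n_a bottles.
    """
    for n_b in range(2, limit):
        n_a = fixed_n_a if fixed_n_a is not None else n_b
        if approx_power(n_a, n_b) >= target_power:
            return n_b
    raise ValueError("target power not reached within the search limit")


n_equal = smallest_n_b()                     # fresh experiment, equal group sizes
n_fixed_a = smallest_n_b(fixed_n_a=10_000)   # reuse the 10,000 Process A bottles

print(f"1:1 design: about {n_equal} bottles per process")
print(f"Process A fixed at 10,000: about {n_fixed_a} Process B bottles")
```

Under these assumptions the 1:1 design needs noticeably more bottles per process than the design that reuses Process A's existing data, which is why the design choice should be stated alongside any quoted sample size.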

Recommendation:

To make a statistically valid conclusion for Krishna Waters about implementing the new Warangal-developed bottling technique across all plants in the Telugu states, I would recommend testing a minimum of approximately 2,800 bottles using Process B.

Rationale for this recommendation:

This sample size would provide approximately 80% power to detect a decrease in the defect rate from the current 1% (Process A in Vijayawada) down to 0.5% (a 50% relative reduction) for Process B, using a one-sided test at a 5% significance level.
A 0.5% defect rate (or a reduction of 0.005 absolute) is likely a practically significant improvement for a high-volume business like mineral water bottling, impacting costs and quality.
While the initial 500 bottles showed an even lower rate (0.4%), a larger sample is needed to confirm this with confidence and ensure the result isn't due to random chance in a small sample.
This balances the need for statistical confidence with the costs and time associated with extended testing. The company can adjust this based on their risk appetite and the precise MDE they deem critical. If they want higher power (e.g., 90%) or to detect an even smaller difference, the sample size would need to be larger.

It's also advisable to perform interim analyses if the testing is staged, though this requires more complex statistical planning (e.g., using alpha-spending functions) to avoid inflating the Type I error rate.

