Gaussian Naive Bayes: Handling Continuous Data (Part 2)
Learn how Gaussian Naive Bayes extends Naive Bayes to continuous features by assuming Gaussian distributions. Master the PDF formula, implementation with scikit-learn, and handling violations of assumptions.
Learn how Naive Bayes handles features like Age or Salary using the Bell Curve.
Gaussian Naive Bayes: Handling Numbers in Naive Bayes
In Part 1, we saw how the Naive Bayes classifier uses probabilities based on feature frequencies (like counting words) to classify data. But what happens when our input features aren’t categories, but continuous numbers like ‘Age’, ‘Salary’, or ‘Temperature’?
We can’t simply count frequencies for every possible number! We need a different way to estimate the likelihood P(Feature | Class). This is where Gaussian Naive Bayes (GNB) comes in. It’s a specific type of Naive Bayes designed to work directly with continuous numerical features.
Main Technical Concept: Gaussian Naive Bayes is an extension of Naive Bayes that handles continuous features by assuming that the values of each feature, for each class, follow a Gaussian (Normal, or “bell curve”) distribution.
The Key Idea: Assuming the Bell Curve
The core idea behind GNB is simple but powerful:
For a given class (e.g., Class ‘Yes’), it assumes that the continuous values of a specific feature (e.g., ‘Age’) are distributed according to a Gaussian (Normal) distribution. It makes the same assumption for Class ‘No’, but potentially with a different mean and standard deviation.
Why the Gaussian Distribution?
- It’s a very common distribution found in nature and many real-world datasets.
- It’s mathematically well-understood and defined by just two parameters:
  - Mean (μ): The center of the bell curve.
  - Variance (σ²) or Standard Deviation (σ): How spread out the curve is.
Calculating Likelihoods (P(feature | Class))
Instead of counting frequencies, GNB calculates the likelihood using the Gaussian Probability Density Function (PDF). Here’s the idea:
- Calculate Class-Specific Stats: For each class (e.g., ‘Yes’ and ‘No’) and for each continuous feature (e.g., ‘Age’), calculate the mean (μ) and variance (σ²) of that feature’s values only for the data points belonging to that class.
- Use the Gaussian PDF: When you get a new data point with a specific feature value (x), plug this value, along with the calculated mean (μ) and variance (σ²) for a given class, into the Gaussian PDF formula to get the likelihood density P(x | Class).
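The first step above can be sketched in a few lines of NumPy. This is a minimal illustration only: the `age` and `label` arrays are made-up toy data, and the variable names are assumptions rather than anything prescribed by the algorithm.

```python
import numpy as np

# Hypothetical toy data: one feature ('Age') and a binary label (1 = 'Yes', 0 = 'No').
age = np.array([22.0, 25.0, 30.0, 45.0, 50.0, 55.0])
label = np.array([0, 0, 0, 1, 1, 1])

# Estimate a separate mean and variance of 'Age' for each class.
class_stats = {}
for c in np.unique(label):
    values = age[label == c]  # keep only the rows belonging to class c
    # np.var defaults to the population variance (ddof=0)
    class_stats[int(c)] = (float(values.mean()), float(values.var()))

for c, (mu, var) in class_stats.items():
    print(f"class {c}: mean={mu:.2f}, variance={var:.2f}")
```

Each class ends up with its own (mean, variance) pair for the feature; with several features, the same calculation is repeated per feature.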
Gaussian Probability Density Function (PDF)
f(x | μ, σ²) = (1 / √(2πσ²)) * e^(-(x - μ)² / (2σ²))
This formula gives the likelihood density of observing value x, given that the data for this class follows a Normal distribution with mean μ and variance σ².
The algorithm calculates this likelihood density for every feature and every class.
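As a rough sketch, the formula translates directly into a small Python function. The function name `gaussian_pdf` and the example means and variances below are hypothetical, chosen only for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a Normal(mean, var) distribution evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * var)
    exponent = -((x - mean) ** 2) / (2.0 * var)
    return coeff * math.exp(exponent)

# Hypothetical class statistics for 'Age' (not taken from real data).
density_yes = gaussian_pdf(40.0, mean=50.0, var=16.0)
density_no = gaussian_pdf(40.0, mean=26.0, var=11.0)

# An Age of 40 is closer to the 'Yes' curve, so its density is higher there.
print(density_yes > density_no)

# A density is not a probability: with a small variance it can exceed 1.
print(gaussian_pdf(0.0, mean=0.0, var=0.01) > 1.0)
```

Because these are densities rather than probabilities, only their relative sizes across classes matter for the final prediction.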
Putting it All Together: GNB Prediction
The overall process for classifying a new data point X = {x₁, x₂, ..., xₙ} using Gaussian Naive Bayes is:
- Calculate Priors: Determine the prior probability P(C) for each class C (e.g., fraction of ‘Yes’ samples in training data).
- Calculate Likelihoods: For each class C and each feature xᵢ:
  - Retrieve the pre-calculated mean (μᵢ,C) and variance (σ²ᵢ,C) of feature i for class C from the training data.
  - Calculate P(xᵢ | C) using the Gaussian PDF formula with these specific μ and σ².
- Combine using Bayes’ Theorem (Naive Assumption): For each class C, calculate the value proportional to the posterior probability:
Score(C) = P(C) * P(x₁|C) * P(x₂|C) * ... * P(xₙ|C)
(Multiply the prior by all the individual likelihood densities calculated in the previous step.)
- Predict: Choose the class C that has the highest Score(C).
Essentially, it asks: “Based on the typical ‘Age’ and ‘Salary’ distributions we saw for people who did purchase (Class 1), and the distributions for those who didn’t (Class 0), which class does this new person’s ‘Age’ and ‘Salary’ fit better with, considering the overall likelihood of purchase?”
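The whole prediction procedure can be sketched from scratch with NumPy. This is an illustrative sketch, not a reference implementation: the helper names `fit_gnb` and `predict_gnb`, the toy data, and the small `eps` added to the variances are all assumptions. It also works with the logarithm of Score(C), turning the product into a sum, which is a common way to avoid numerical underflow when many small densities are multiplied.

```python
import numpy as np

def fit_gnb(X, y, eps=1e-9):
    """Estimate priors, means, and variances for each class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # eps is a small assumed constant that guards against a zero variance.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def predict_gnb(X, classes, priors, means, variances):
    """Return the class with the highest log-score for each row of X."""
    X = np.atleast_2d(X)
    # Log of the Gaussian PDF for every (sample, class, feature) triple.
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.log(2.0 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    # log Score(C) = log P(C) + sum over features of log P(x_i | C)
    log_score = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    return classes[np.argmax(log_score, axis=1)]

# Hypothetical toy data: columns are [Age, Salary in thousands].
X_train = np.array([[22, 30], [25, 35], [30, 40],
                    [45, 80], [50, 90], [55, 85]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

params = fit_gnb(X_train, y_train)
new_points = np.array([[28.0, 38.0], [48.0, 82.0]])
predictions = predict_gnb(new_points, *params)
print(predictions)
```

Taking the logarithm does not change which class wins, because the logarithm is monotonically increasing.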
Performance Tips & When to Use GNB
Key Points:
- Check Normality Assumption: GNB is often robust to violations of this assumption, but it works best if your continuous features are roughly normally distributed within each class. Use histograms or statistical tests to check. If features are highly non-normal, consider data transformations (like log or Box-Cox), a different Naive Bayes variant, or another algorithm.
- Independence Assumption: Remember GNB assumes features are conditionally independent given the class. If your features are highly correlated, GNB might not perform optimally compared to models that handle correlations.
- Computational Efficiency: GNB is generally very fast to train as it mainly involves calculating means and variances.
- Good Baseline: Due to its speed and simplicity, GNB is often a good baseline model to try early in a classification project.
- Works Well with High Dimensions: It can perform reasonably well even with a large number of features relative to the number of samples, partly because the independence assumption means it only estimates one mean and one variance per feature per class rather than a full covariance matrix.
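As one possible way to act on the normality tip, the sketch below measures the skewness of a feature before and after a log transform. The `sample_skewness` helper and the synthetic, right-skewed `salary` data are assumptions made for illustration; in practice you would run the check separately on each class's values.

```python
import numpy as np

def sample_skewness(values):
    """Third standardized moment; roughly 0 for symmetric data."""
    centered = values - values.mean()
    return np.mean(centered ** 3) / np.std(values) ** 3

# Hypothetical right-skewed feature (e.g. 'Salary') for a single class.
rng = np.random.default_rng(0)
salary = rng.lognormal(mean=10.0, sigma=1.0, size=5000)

skew_raw = sample_skewness(salary)
skew_log = sample_skewness(np.log(salary))

print(f"skewness before log transform: {skew_raw:.2f}")
print(f"skewness after log transform:  {skew_log:.2f}")
```

A skewness near zero after the transform suggests the transformed feature is closer to the bell shape GNB assumes, although skewness alone does not prove normality.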
Gaussian Naive Bayes: Key Takeaways
- Gaussian Naive Bayes (GNB) is a type of Naive Bayes classifier specifically designed for continuous numerical features.
- It assumes that features within each class follow a Gaussian (Normal) distribution.
- It calculates the likelihood P(feature | Class) using the Gaussian PDF formula, based on the mean and variance of the feature for that class learned from training data.
- It still relies on the “naive” assumption that features are conditionally independent given the class.
- “Training” only requires calculating the mean and variance of each feature for each class.
- Feature scaling is not required for correctness: each feature’s mean and variance are estimated separately, so GNB adapts to each feature’s scale. Scaling can still help with numerical stability and visualization.
- It’s implemented easily in scikit-learn using GaussianNB.
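A minimal usage sketch with scikit-learn’s `GaussianNB` might look like the following. The toy data is made up for illustration, and the attribute names shown (`classes_`, `class_prior_`, `theta_`) are those of recent scikit-learn releases, so they are worth checking against your installed version.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: columns are [Age, Salary in thousands].
X_train = np.array([[22, 30], [25, 35], [30, 40],
                    [45, 80], [50, 90], [55, 85]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB()
model.fit(X_train, y_train)  # learns per-class priors, means, and variances

new_points = np.array([[28.0, 38.0], [48.0, 82.0]])
predicted = model.predict(new_points)            # most likely class per row
probabilities = model.predict_proba(new_points)  # normalized posterior per class

print(model.classes_)      # class labels seen during training
print(model.class_prior_)  # estimated P(C) for each class
print(model.theta_)        # per-class mean of each feature
print(predicted)
```

Here `predict_proba` returns the scores normalized so that each row sums to 1, which is often more convenient than the raw Score(C) values.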