PMF vs. PDF

PMF vs. PDF - Key Differences

EASY

What are the main differences between a Probability Mass Function (PMF) and a Probability Density Function (PDF)? Provide examples of when you would use each.

Explanation: PMF vs. PDF

Imagine you're describing quantities. Sometimes you count distinct items, and sometimes you measure something that can take on any value within a range.

Counting: How many apples are in a basket? (0, 1, 2, 3... You can't have 1.5 apples.) This is the PMF case. A PMF tells you the probability of getting exactly 2 apples.
Measuring: How tall is a person? (e.g., 170.5 cm, 170.51 cm, 170.512 cm... infinitely many possibilities between any two heights.) This is the PDF case. The probability of someone being exactly 170.50000... cm tall is zero, so instead we ask about the probability of being between 170 cm and 171 cm.

PMF and PDF are just tools to describe the probabilities for these two different types of scenarios: countable (discrete) and measurable (continuous).

The Probability Mass Function (PMF) and Probability Density Function (PDF) are both used to describe the probabilities of different outcomes for a random variable. The key difference lies in whether the random variable is discrete (countable outcomes) or continuous (measurable outcomes on a scale).
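To make the contrast concrete, here is a minimal sketch using only the Python standard library. The coin-flip part follows the "number of heads in 3 coin flips" example. The height part assumes, purely for illustration, that heights follow a normal distribution with mean 170 cm and standard deviation 7 cm; those parameters and the helper names `binomial_pmf` and `normal_cdf` are not from the original text.

```python
from math import comb, erf, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for the number of successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


# Discrete: number of heads in 3 fair coin flips. Each value is a probability.
heads_pmf = {k: binomial_pmf(k, n=3, p=0.5) for k in range(4)}
print(heads_pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# Continuous: height in cm, ASSUMED Normal(mean=170, sd=7) for illustration only.
MEAN_CM, SD_CM = 170.0, 7.0

# Probability of an interval: difference of the CDF at its endpoints.
p_interval = normal_cdf(171, MEAN_CM, SD_CM) - normal_cdf(170, MEAN_CM, SD_CM)

# Probability of a single point: an interval of zero width, so zero.
p_point = normal_cdf(170.5, MEAN_CM, SD_CM) - normal_cdf(170.5, MEAN_CM, SD_CM)

print(round(p_interval, 4), p_point)
```

The PMF returns a probability for each exact count, while the continuous case only yields a nonzero probability once you ask about an interval.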

Feature	PMF (Probability Mass Function)	PDF (Probability Density Function)
Applies to	Discrete Random Variables	Continuous Random Variables
Definition	Gives the probability that a discrete random variable X is exactly equal to some value x: `P(X = x)`.	A function `f(x)` whose value at a point indicates the relative likelihood that the random variable falls near that point. It does not give the probability of an exact point; probabilities come from integrating `f(x)` over an interval.
Value Interpretation	The value `P(X = x)` is a probability. `0 ≤ P(X = x) ≤ 1`	The value `f(x)` is a probability density. It can be greater than 1. It is not a probability itself. `f(x) ≥ 0`
Calculating Probability	For a specific outcome: `P(X = x)`. For a range: `P(a ≤ X ≤ b) = Σ P(X = x)`, summed over all possible x from a to b.	For a specific outcome: `P(X = x) = 0` (the probability of any single point is zero). For a range: `P(a < X < b) = ∫[a to b] f(x)dx` (the integral of the PDF over the range).
Total Probability	The sum of all probabilities for all possible values of X must equal 1. `Σ P(X = x) = 1` (sum over all possible x)	The total area under the curve of the PDF over its entire range must equal 1. `∫[-∞ to ∞] f(x)dx = 1`
Graphical Representation	Usually a bar chart or a set of points, where the height of the bar/point at x is `P(X = x)`.	A continuous curve. The area under the curve between two points gives the probability of the variable falling in that range.
Example Variables	Number of heads in 3 coin flips (0, 1, 2, 3); number of clicks on an ad (0, 1, 2, ...); number of defective items in a batch; dice roll outcome (1, 2, 3, 4, 5, 6)	Height of a person; weight of an object; time taken to complete a task (e.g., user session duration); temperature; revenue amounts
Common Distributions	Bernoulli, Binomial, Poisson, Geometric	Normal (Gaussian), Exponential, Uniform (continuous), Chi-squared
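The "Calculating Probability" and "Total Probability" rows of the table can be checked numerically. The sketch below sums a PMF for a fair die and approximates the integral of a standard normal PDF with a simple midpoint rule. The helper names (`pmf_range`, `standard_normal_pdf`, `integrate`) and the integration bounds of -10 to 10 are illustrative choices, not part of the original text.

```python
from fractions import Fraction
from math import exp, pi, sqrt

# PMF of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def pmf_range(pmf, a, b):
    """P(a <= X <= b): sum the PMF over the values that lie in [a, b]."""
    return sum((p for x, p in pmf.items() if a <= x <= b), Fraction(0))


def standard_normal_pdf(x: float) -> float:
    """Density f(x) of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def integrate(f, a: float, b: float, steps: int = 100_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# PMF: probabilities sum to 1, and a range is a sum over its values.
total_mass = sum(die_pmf.values())
p_2_to_4 = pmf_range(die_pmf, 2, 4)

# PDF: the area under the curve is 1, and a range is an integral.
# [-10, 10] stands in for (-inf, inf); the tails beyond it are negligible.
total_area = integrate(standard_normal_pdf, -10, 10)
p_within_1 = integrate(standard_normal_pdf, -1, 1)

print(total_mass, p_2_to_4)
print(round(total_area, 6), round(p_within_1, 4))
```

Summation plays the same role for a PMF that integration plays for a PDF: both accumulate probability over a set of outcomes.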

Key Insight & Why PDF values can exceed 1

A common point of confusion is that PDF values, f(x), can be greater than 1, while PMF values, P(X=x), cannot (as they are probabilities).

PMF: P(X=x) is a direct probability, so it must be between 0 and 1.
PDF: f(x) represents a density. Think of density like mass per unit volume. A very small object can have a high density. Similarly, a PDF can have a high value over a very narrow interval, but the area under the curve for that interval (which represents probability) will still be ≤ 1. For the total probability (total area) to be 1, if the function is very high in one region, it must be correspondingly low in others, or the high region must be very narrow.

For example, a continuous uniform distribution between 0 and 0.5, denoted U(0, 0.5), would have a PDF f(x) = 2 for 0 < x < 0.5, and f(x) = 0 otherwise. Here, the PDF value is 2, but the total area is (0.5 - 0) * 2 = 1.
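The U(0, 0.5) arithmetic above can be reproduced in a few lines. As a second illustration, the sketch also computes the peak density of a normal distribution with a standard deviation of 0.1, an arbitrary value chosen here only to show another density above 1. The function name `uniform_pdf` is hypothetical.

```python
from math import pi, sqrt


def uniform_pdf(x: float, low: float, high: float) -> float:
    """Density of a continuous uniform distribution on (low, high)."""
    return 1.0 / (high - low) if low < x < high else 0.0


LOW, HIGH = 0.0, 0.5

# The density inside the interval is 2: a density, not a probability.
density = uniform_pdf(0.25, LOW, HIGH)

# Area of the rectangle (width * height) is the total probability.
total_area = (HIGH - LOW) * density

# Probability of 0.1 < X < 0.2 is the area over that sub-interval.
p_01_to_02 = (0.2 - 0.1) * density

# The peak density of a normal distribution is 1 / (sd * sqrt(2 * pi)).
# sd = 0.1 is an arbitrary illustrative choice.
narrow_normal_peak = 1.0 / (0.1 * sqrt(2 * pi))

print(density, total_area, round(p_01_to_02, 3), round(narrow_normal_peak, 3))
```

In both cases the density exceeds 1, yet every probability computed from it, being an area, stays between 0 and 1.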

When to Use Which:

Use a PMF when you are dealing with outcomes that are distinct and countable. You want to know the probability of specific counts or categories.
Use a PDF when you are dealing with outcomes that can take on any value within a continuous range. You are interested in the probability of the outcome falling within a certain interval.
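As a rough sketch of how the choice plays out, consider two of the example variables from the table: ad clicks (a count) and session duration (a measurement). The code assumes, for illustration only, that clicks follow a Poisson distribution with a mean of 2 and that session durations follow an exponential distribution with a mean of 8 minutes. Neither modelling choice nor either parameter comes from the original text.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """P(X = k) for a Poisson count with the given mean rate."""
    return rate**k * exp(-rate) / factorial(k)


def exponential_cdf(t: float, mean: float) -> float:
    """P(T <= t) for an exponential duration with the given mean."""
    return 1.0 - exp(-t / mean) if t > 0 else 0.0


# Discrete question, answered with a PMF:
# probability of exactly 3 ad clicks (hypothetical mean of 2 clicks).
p_three_clicks = poisson_pmf(3, rate=2.0)

# Continuous question, answered by integrating a PDF (here via its CDF):
# probability a session lasts between 5 and 10 minutes
# (hypothetical mean duration of 8 minutes).
p_5_to_10_min = exponential_cdf(10, mean=8.0) - exponential_cdf(5, mean=8.0)

print(round(p_three_clicks, 4), round(p_5_to_10_min, 4))
```

The discrete question asks about an exact count, so the PMF answers it directly. The continuous question only makes sense for an interval, so it is answered with the difference of two CDF values, which equals the integral of the PDF over that interval.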

Understanding this distinction is crucial for choosing the correct probabilistic models and tools in data analysis and machine learning.
