Lecture 4: Probability I
Datasets often come from samples, not populations
Samples can introduce limitations and uncertainty
Probability quantifies the likelihood of different outcomes
It is the mathematical foundation for statistical inference
\[p = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}\]
Probabilities always range from 0 to 1
\(P = 0\) → outcome is impossible
\(P = 1\) → outcome is certain
All possible outcomes must sum to 1
For mutually exclusive outcomes:
\[P(\text{one of outcomes}) = P(A_1) + P(A_2) + \cdots + P(A_N)\]
Example: What is \(P(5 \text{ or } 6)\) on a single die?
Six equally likely outcomes: {1, 2, 3, 4, 5, 6}
\(P(5) = 1/6\)
\(P(6) = 1/6\)
\[P(5 \text{ or } 6) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}\]
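The addition rule can be checked in a few lines of Python (a minimal sketch using exact fractions):

```python
from fractions import Fraction

# Six equally likely faces on a fair die.
faces = [1, 2, 3, 4, 5, 6]
p_face = Fraction(1, len(faces))  # 1/6 per face

# Addition rule for the mutually exclusive outcomes 5 and 6:
p_5_or_6 = p_face + p_face
print(p_5_or_6)  # 1/3
```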
A die is rolled. What is \(P(\text{prime number})\)?
The primes on a die are 2, 3, and 5, each with probability \(1/6\):
\[P(\text{prime}) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}\]
For independent events:
\[P(A \text{ and } B) = P(A) \times P(B)\]
Events are independent when one outcome does not affect the other
Example: Roll two dice. What is \(P(\text{both} \geq 5)\)?
Each die shows 5 or 6 with probability \(2/6 = 1/3\), so \(P(\text{both} \geq 5) = \frac{1}{3} \times \frac{1}{3} = \frac{1}{9}\)
And \(P(\text{both} \geq 4)\)? Each die shows 4, 5, or 6 with probability \(3/6 = 1/2\), so \(P(\text{both} \geq 4) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}\)
Example: Roll two dice. \(P(\text{at least one even})\)?
Four equally likely possibilities:
Even and Even: \(P = 1/2 \times 1/2 = 1/4\)
Even and Odd: \(P = 1/2 \times 1/2 = 1/4\)
Odd and Even: \(P = 1/2 \times 1/2 = 1/4\)
Odd and Odd: \(P = 1/2 \times 1/2 = 1/4\)
Three favorable outcomes, each with \(P = 1/4\):
\[P(\text{at least one even}) = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}\]
We can calculate any probability using its complement:
\[P(A) = 1 - P(A^C)\]
Here \(A^C\) is “both odd”, with \(P(A^C) = 1/4\):
\[P(\text{at least one even}) = 1 - \frac{1}{4} = \frac{3}{4}\]
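Both routes, the direct count and the complement rule, can be verified by enumerating all 36 outcomes (a Python sketch):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
rolls = list(product(range(1, 7), repeat=2))

# Direct route: count outcomes with at least one even die.
at_least_one_even = sum(1 for a, b in rolls if a % 2 == 0 or b % 2 == 0)
p_direct = Fraction(at_least_one_even, len(rolls))

# Complement route: 1 - P(both odd).
both_odd = sum(1 for a, b in rolls if a % 2 == 1 and b % 2 == 1)
p_complement = 1 - Fraction(both_odd, len(rolls))

print(p_direct, p_complement)  # 3/4 3/4
```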
Roll two dice. What is \(P(\text{sum} = 2)\)?
Only one way: both dice show 1
\[P(\text{sum} = 2) = \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}\]
Roll two dice. What is \(P(\text{sum} = 7)\)?
Six ways to get 7:
| Option | Die 1 | Die 2 | Probability |
|---|---|---|---|
| 1 | 1 | 6 | 1/36 |
| 2 | 6 | 1 | 1/36 |
| 3 | 2 | 5 | 1/36 |
| 4 | 5 | 2 | 1/36 |
| 5 | 3 | 4 | 1/36 |
| 6 | 4 | 3 | 1/36 |
\[P(\text{sum} = 7) = 6 \times \frac{1}{36} = \frac{6}{36} = \frac{1}{6}\]
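The six combinations in the table can be counted programmatically with the same enumeration idea (a minimal sketch):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 ordered pairs
ways_to_7 = [(a, b) for a, b in rolls if a + b == 7]

print(ways_to_7)                             # the 6 ordered pairs summing to 7
print(Fraction(len(ways_to_7), len(rolls)))  # 1/6
```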
| Sum | Combinations | Count | Probability |
|---|---|---|---|
| 2 | (1,1) | 1 | 1/36 |
| 3 | (1,2), (2,1) | 2 | 2/36 |
| 4 | (1,3), (2,2), (3,1) | 3 | 3/36 |
| 5 | (1,4), (2,3), (3,2), (4,1) | 4 | 4/36 |
| 6 | (1,5), (2,4), …, (5,1) | 5 | 5/36 |
| 7 | (1,6), (2,5), …, (6,1) | 6 | 6/36 |
| 8–12 | (symmetric to 2–6) | 5→1 | 5/36→1/36 |
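The full table of sums follows from the same enumeration; a short sketch using `Counter`:

```python
from collections import Counter
from itertools import product

# Count how many of the 36 ordered pairs produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for s in range(2, 13):
    print(f"sum={s}: {counts[s]}/36")
```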
Figure 1
Figure 2
Figure 3
7 is the most likely sum (most combinations)
2 and 12 are the least likely (one combination each)
Empirical distribution is irregular with few trials
With more trials, it converges to the theoretical shape
The Law of Large Numbers (LLN):
As the number of trials increases, the empirical distribution converges to the theoretical one
Roll a die 10 times → might get 30% sixes
Roll it 10,000 times → proportion of sixes → \(1/6\)
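The convergence can be seen in a small simulation (an illustrative sketch; the seed is arbitrary):

```python
import random

random.seed(42)  # arbitrary seed for a reproducible demo

# Proportion of sixes for increasing numbers of rolls.
for n_trials in (10, 100, 1_000, 10_000):
    rolls = [random.randint(1, 6) for _ in range(n_trials)]
    print(n_trials, round(rolls.count(6) / n_trials, 3))

# The printed proportions drift toward 1/6 ≈ 0.167 as n grows.
```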
LLN is about observed proportions and averages converging to their true values
In practice, we almost never observe the full population
Key question: How much do sample means vary from sample to sample?
This is the idea behind the sampling distribution
The logic of inferential statistics starts with a population
The researcher draws repeated samples: 1, 2, 3
Each sample produces a sample mean
The sample means form a sampling distribution
Can be converted to an empirical probability distribution
As we repeat more, the distribution of means takes shape
Figure 4
Figure 5
The Central Limit Theorem (CLT) says something different from LLN:
The distribution of sample means approaches a normal (bell-shaped) distribution as the sample size grows
This holds regardless of the shape of the original distribution
| | Law of Large Numbers | Central Limit Theorem |
|---|---|---|
| About | One long experiment | Many repeated samples |
| What converges | Proportions / averages → true value | Distribution of sample means → normal |
| Requires | More trials | More samples (of fixed size) |
| Example | 10,000 die rolls → \(\bar{x} \to 3.5\) | 10,000 samples of 30 rolls → means form bell curve |
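The CLT column of the table can be simulated directly: draw many samples of 30 die rolls each and look at the distribution of their means (a sketch; the seed and sample counts are illustrative):

```python
import random
import statistics

random.seed(7)  # arbitrary seed

# 10,000 samples, each of 30 die rolls; record each sample mean.
sample_means = [
    statistics.mean(random.randint(1, 6) for _ in range(30))
    for _ in range(10_000)
]

# The means cluster around the population mean 3.5,
# with spread close to sigma/sqrt(30) ≈ 1.71/5.48 ≈ 0.31.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```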
A pollster surveys 1,000 voters. The candidate gets 52% support.
The CLT tells us: if we polled 1,000 voters many times, the sample proportions would form a bell curve around the true proportion
This is the foundation of margins of error and confidence intervals
A bag contains 3 red, 2 blue, and 5 green marbles.
You draw one marble. Calculate:
\(P(\text{red or blue})\)
\(P(\text{not green})\)
Hint: Think about which rule applies — addition or complement?
\(P(\text{red or blue}) = \frac{3}{10} + \frac{2}{10} = \frac{5}{10} = \frac{1}{2}\)
\(P(\text{not green}) = 1 - P(\text{green}) = 1 - \frac{5}{10} = \frac{1}{2}\)
Yes — “red or blue” and “not green” describe the same event. The complement rule and the addition rule give the same answer.
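Both answers can be confirmed with exact fractions (variable names are illustrative; same bag as above):

```python
from fractions import Fraction

bag = {"red": 3, "blue": 2, "green": 5}
total = sum(bag.values())  # 10 marbles

# Addition rule (red and blue are mutually exclusive):
p_red_or_blue = Fraction(bag["red"], total) + Fraction(bag["blue"], total)

# Complement rule:
p_not_green = 1 - Fraction(bag["green"], total)

print(p_red_or_blue, p_not_green)  # 1/2 1/2
```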
You draw two marbles with replacement.
What is \(P(\text{both green})\)?
Hint: Are the draws independent? Which rule applies?
With replacement, draws are independent:
\[P(\text{both green}) = \frac{5}{10} \times \frac{5}{10} = \frac{25}{100} = \frac{1}{4}\]
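As a quick check, the multiplication rule for the two independent draws:

```python
from fractions import Fraction

p_green = Fraction(5, 10)         # 5 green out of 10 marbles
p_both_green = p_green * p_green  # independent draws (with replacement)
print(p_both_green)  # 1/4
```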
A die is rolled 32 times. What is \(P(\text{not 5 on all 32 throws})\)?
We multiply the probabilities (independent events):
\[P = \left(\frac{5}{6}\right)^{32} \approx 0.0029\]
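A one-liner confirms the numerical value:

```python
# Probability of avoiding a 5 on all 32 independent throws.
p = (5 / 6) ** 32
print(round(p, 4))  # 0.0029
```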
Probabilities range from 0 to 1, sum to 1
Addition rule: mutually exclusive outcomes
Multiplication rule: independent events
Complement rule: \(P(A) = 1 - P(A^C)\)
Law of Large Numbers: more trials → empirical converges to theoretical
Central Limit Theorem: sample means → normal distribution
Popescu (JCU) Statistical Analysis Lecture 4: Probability I