Lecture 7: T-Tests and Correlations
Solution: use the t-test, which replaces \(\sigma\) with the sample standard deviation \(s_x\).
Figure 1: The t-distribution has heavier tails for small samples
Z-observed (requires known \(\sigma\)):
\[Z_{\text{obs}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\]
T-observed (uses sample \(s_x\)):
\[t_{\text{obs}} = \frac{\bar{X} - \mu}{s_x / \sqrt{n}}\]
The only difference: \(\sigma \to s_x\)
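A minimal sketch of the t-observed formula in Python, using only the standard library. The sample values and the hypothesized mean `mu0` are made up for illustration and do not come from the lecture data.

```python
import math
import statistics

def t_observed(sample, mu0):
    """One-sample t statistic: (mean - mu0) / (s_x / sqrt(n))."""
    n = len(sample)
    mean = statistics.mean(sample)
    s_x = statistics.stdev(sample)  # sample SD, divides by n - 1
    return (mean - mu0) / (s_x / math.sqrt(n))

# Hypothetical sample, for illustration only
sample = [52.0, 48.5, 61.2, 55.3, 49.9, 58.4]
t_obs = t_observed(sample, mu0=50.0)
print(round(t_obs, 3))
```

Replacing `statistics.stdev(sample)` with a known population \(\sigma\) would turn the same function into the Z-observed formula.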
Does life expectancy differ between EU and Latin America?
| Statistic | Latin America | EU |
|---|---|---|
| Mean life exp. | 61.914 | 67.114 |
| SD | 5.14 | 5.55 |
| n | 23 | 27 |
| Result | Value |
|---|---|
| t-statistic | -3.4364 |
| df | 47.64 |
| p-value | 0.001232 |
| 95% CI | [-8.24, -2.16] |
p < 0.05: reject \(H_0\). EU life expectancy is significantly higher than Latin America’s.
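The reported values can be reproduced from the summary table alone. The fractional df (47.64) suggests an unequal-variance (Welch) test, which is the default in R's `t.test`; that is an inference from the output, not something the lecture states. The sketch below assumes the Welch form and uses only the rounded values from the table, so small rounding differences are expected.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Latin America vs. EU, values taken from the summary table
t, df = welch_t(61.914, 5.14, 23, 67.114, 5.55, 27)
print(round(t, 2), round(df, 2))
```

The result should land close to the reported t-statistic and df. The p-value and confidence interval additionally need the t-distribution, which the standard library does not provide.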
Figure 2: Observed t falls in the rejection region
Before trusting the test, check normality of each group:
| Group | W statistic | p-value | Normal? |
|---|---|---|---|
| Latin America | 0.963 | 0.5255 | Yes |
| EU | 0.8486 | 0.0011 | No |
Latin America includes Central America and the Caribbean, not just South America:
| Region | Countries | n |
|---|---|---|
| Latin America | 23 countries (incl. Mexico, Cuba, etc.) | 23 |
| South America | 12 countries (Argentina to Venezuela) | 12 |
| Statistic | South America (n=12) | EU (n=27) |
|---|---|---|
| Mean | 62.965 | 67.114 |
| SD | 4.17 | 5.55 |
| Result | S. America vs. EU | Latam vs. EU |
|---|---|---|
| 95% CI | [-7.45, -0.85] | [-8.24, -2.16] |
| CI width | 6.6 | 6.09 |
| p-value | 0.0156 | 0.001232 |
Figure 5
| Statistic | Latin America | EU |
|---|---|---|
| Mean urb. | 59.018 | 67.138 |
| SD | 16.3 | 13.12 |
| Result | Value |
|---|---|
| t-statistic | -1.9176 |
| p-value | 0.06195 |
p = 0.062 > 0.05: fail to reject \(H_0\). No significant difference in urbanization.
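As a check on the reported statistic, the unequal-variance (Welch) form can be applied to the summary table. This assumes the group sizes match the life expectancy comparison (\(n = 23\) and \(n = 27\)), which the urbanization table does not state explicitly. Small differences from the reported value come from rounding the inputs.

\[t_{\text{obs}} = \frac{59.018 - 67.138}{\sqrt{\frac{16.3^2}{23} + \frac{13.12^2}{27}}} = \frac{-8.12}{\sqrt{11.552 + 6.375}} \approx \frac{-8.12}{4.234} \approx -1.92\]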
| Decision | \(H_0\) True | \(H_0\) False |
|---|---|---|
| Reject \(H_0\) | Type I Error (\(\alpha\)) | Correct |
| Retain \(H_0\) | Correct | Type II Error (\(\beta\)) |
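A small simulation can make the Type I error rate concrete: when \(H_0\) is true, a test at \(\alpha = 0.05\) should still reject about 5% of the time. This sketch uses a two-sided z-test with known \(\sigma\) so that the critical value 1.96 is exact. The sample size, seed, and number of trials are arbitrary choices, not values from the lecture.

```python
import math
import random

random.seed(7)  # fixed seed so the run is reproducible

def rejects_h0(n=30, mu0=0.0, sigma=1.0, z_crit=1.96):
    """Draw a sample for which H0 is true, then run a two-sided z-test."""
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

trials = 5000
# Every rejection here is a Type I error, because H0 holds by construction
type_i_rate = sum(rejects_h0() for _ in range(trials)) / trials
print(type_i_rate)  # should land near alpha = 0.05
```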
| Criterion | Z-Test | T-Test |
|---|---|---|
| Pop. variance | Known | Unknown |
| Sample size | Large (n > 30) | Any (esp. small) |
| Distribution | Normal | Student’s t |
| Practice | Rarely used | Frequently used |
Social data typically come from samples, not populations, so the t-test is the standard tool.
\[CI = \bar{X} \pm t \times \frac{s_x}{\sqrt{n}}\]
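Where \(t\) is the critical value of the t-distribution with \(n - 1\) degrees of freedom, \(s_x\) is the sample standard deviation, and \(n\) is the sample size. For Latin American urbanization (\(n = 23\), so \(df = 22\) and \(t = 2.074\) at 95% confidence):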
\[CI = 59.018 \pm 2.074 \times \frac{16.3}{\sqrt{23}}\]
\[CI = [51.97,\; 66.07]\]
Our best estimate of mean Latin American urbanization is 59.018; the 95% confidence interval runs from 51.97 to 66.07.
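The interval arithmetic can be sketched in a few lines of Python. The critical value 2.074 is taken from the lecture as given (it corresponds to df = 22 at 95% confidence); the function name `mean_ci` is only illustrative.

```python
import math

def mean_ci(mean, sd, n, t_crit):
    """Confidence interval for a mean: mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Latin American urbanization, values from the lecture
lower, upper = mean_ci(mean=59.018, sd=16.3, n=23, t_crit=2.074)
print(round(lower, 2), round(upper, 2))
```

Because the half-width scales with \(1/\sqrt{n}\), a smaller sample with the same spread produces a wider interval.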
Figure 6: 95% confidence intervals for regional urbanization means
Figure 7: Urbanization vs. life expectancy across all countries
\[r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y}) / (n-1)}{s_x \cdot s_y}\]
Where:
X = Urbanization
\[\bar{x} = \frac{\sum x_i}{n} = 51.33\]
| Country | Urb. (x) |
|---|---|
| Afghanistan | 18.6118 |
| Albania | 40.4442 |
| Algeria | 52.6092 |
| American Samoa | 79.7625 |
| … | … |
| Zimbabwe | 26.3045 |
| \(\bar{x}\) | 51.33 |
Y = Life Expectancy
\[\bar{y} = \frac{\sum y_i}{n} = 61.259\]
| Country | Urb. (x) | Life Exp. (y) |
|---|---|---|
| Afghanistan | 18.6118 | 45.3833 |
| Albania | 40.4442 | 68.2861 |
| Algeria | 52.6092 | 57.5301 |
| American Samoa | 79.7625 | 68.6375 |
| … | … | … |
| Zimbabwe | 26.3045 | 54.2472 |
| Mean (\(\bar{x}\), \(\bar{y}\)) | 51.33 | 61.259 |
\[x_i - \bar{x}\]
| Country | Urb. (x) | \(x_i - \bar{x}\) |
|---|---|---|
| Afghanistan | 18.6118 | 18.612 - 51.33 |
| Albania | 40.4442 | 40.444 - 51.33 |
| Algeria | 52.6092 | 52.609 - 51.33 |
| American Samoa | 79.7625 | 79.762 - 51.33 |
| … | … | … |
| Zimbabwe | 26.3045 | 26.304 - 51.33 |
| \(\bar{x}\) | 51.33 |
\[x_i - \bar{x}\]
| Country | Urb. (x) | \(x_i - \bar{x}\) |
|---|---|---|
| Afghanistan | 18.6118 | -32.7179 |
| Albania | 40.4442 | -10.8855 |
| Algeria | 52.6092 | 1.2795 |
| American Samoa | 79.7625 | 28.4328 |
| … | … | … |
| Zimbabwe | 26.3045 | -25.0252 |
| \(\bar{x}\) | 51.33 |
\[y_i - \bar{y}\]
| Country | Life Exp. (y) | \(y_i - \bar{y}\) |
|---|---|---|
| Afghanistan | 45.3833 | 45.383 - 61.259 |
| Albania | 68.2861 | 68.286 - 61.259 |
| Algeria | 57.5301 | 57.53 - 61.259 |
| American Samoa | 68.6375 | 68.638 - 61.259 |
| … | … | … |
| Zimbabwe | 54.2472 | 54.247 - 61.259 |
| \(\bar{y}\) | 61.259 |
\[y_i - \bar{y}\]
| Country | Life Exp. (y) | \(y_i - \bar{y}\) |
|---|---|---|
| Afghanistan | 45.3833 | -15.8757 |
| Albania | 68.2861 | 7.0271 |
| Algeria | 57.5301 | -3.7289 |
| American Samoa | 68.6375 | 7.3785 |
| … | … | … |
| Zimbabwe | 54.2472 | -7.0118 |
| \(\bar{y}\) | 61.259 |
\[(x_i - \bar{x})^2\]
| Country | \(x_i - \bar{x}\) | \((x_i - \bar{x})^2\) |
|---|---|---|
| Afghanistan | -32.7179 | (-32.7179)² |
| Albania | -10.8855 | (-10.8855)² |
| Algeria | 1.2795 | (1.2795)² |
| American Samoa | 28.4328 | (28.4328)² |
| … | … | … |
| Zimbabwe | -25.0252 | (-25.0252)² |
\[(x_i - \bar{x})^2\]
| Country | \(x_i - \bar{x}\) | \((x_i - \bar{x})^2\) |
|---|---|---|
| Afghanistan | -32.7179 | 1070.4623 |
| Albania | -10.8855 | 118.4943 |
| Algeria | 1.2795 | 1.6372 |
| American Samoa | 28.4328 | 808.4251 |
| … | … | … |
| Zimbabwe | -25.0252 | 626.2598 |
\[(y_i - \bar{y})^2\]
| Country | \(y_i - \bar{y}\) | \((y_i - \bar{y})^2\) |
|---|---|---|
| Afghanistan | -15.8757 | (-15.8757)² |
| Albania | 7.0271 | (7.0271)² |
| Algeria | -3.7289 | (-3.7289)² |
| American Samoa | 7.3785 | (7.3785)² |
| … | … | … |
| Zimbabwe | -7.0118 | (-7.0118)² |
\[(y_i - \bar{y})^2\]
| Country | \(y_i - \bar{y}\) | \((y_i - \bar{y})^2\) |
|---|---|---|
| Afghanistan | -15.8757 | 252.038 |
| Albania | 7.0271 | 49.3797 |
| Algeria | -3.7289 | 13.9047 |
| American Samoa | 7.3785 | 54.4417 |
| … | … | … |
| Zimbabwe | -7.0118 | 49.1656 |
\[\sum_{i=1}^{n}(x_i - \bar{x})^2 = 1070.4623 + 118.4943 + \cdots + 626.2598\]
\[\sum_{i=1}^{215}(x_i - \bar{x})^2 = 1.2512211\times 10^{5}\]
\[\sum_{i=1}^{n}(y_i - \bar{y})^2 = 252.038 + 49.3797 + \cdots + 49.1656\]
\[\sum_{i=1}^{215}(y_i - \bar{y})^2 = 1.5988861\times 10^{4}\]
\[s_x = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{1.2512211\times 10^{5}}{215 - 1}} = \sqrt{\frac{1.2512211\times 10^{5}}{214}}\]
\[s_x = 24.18021\]
\[s_y = \sqrt{\frac{\sum(y_i - \bar{y})^2}{n-1}} = \sqrt{\frac{1.5988861\times 10^{4}}{215 - 1}} = \sqrt{\frac{1.5988861\times 10^{4}}{214}}\]
\[s_y = 8.64374\]
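The two standard deviations follow directly from the sums of squared deviations. A short check in Python, using the sums and \(n = 215\) reported above (the variable names are illustrative):

```python
import math

n = 215
ss_x = 1.2512211e5  # sum of squared deviations for urbanization
ss_y = 1.5988861e4  # sum of squared deviations for life expectancy

# Sample standard deviation: divide by n - 1, then take the square root
s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))
print(round(s_x, 4), round(s_y, 4))
```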
\[(x_i - \bar{x})(y_i - \bar{y})\]
| Country | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i-\bar{x})(y_i-\bar{y})\) |
|---|---|---|---|
| Afghanistan | -32.7179 | -15.8757 | (-32.7179)(-15.8757) |
| Albania | -10.8855 | 7.0271 | (-10.8855)(7.0271) |
| Algeria | 1.2795 | -3.7289 | (1.2795)(-3.7289) |
| American Samoa | 28.4328 | 7.3785 | (28.4328)(7.3785) |
| … | … | … | … |
| Zimbabwe | -25.0252 | -7.0118 | (-25.0252)(-7.0118) |
\[(x_i - \bar{x})(y_i - \bar{y})\]
| Country | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i-\bar{x})(y_i-\bar{y})\) |
|---|---|---|---|
| Afghanistan | -32.7179 | -15.8757 | 519.4201 |
| Albania | -10.8855 | 7.0271 | -76.4933 |
| Algeria | 1.2795 | -3.7289 | -4.7713 |
| American Samoa | 28.4328 | 7.3785 | 209.7904 |
| … | … | … | … |
| Zimbabwe | -25.0252 | -7.0118 | 175.472 |
\[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = 519.4201 + (-76.4933) + \cdots + 175.472\]
\[\sum_{i=1}^{215}(x_i - \bar{x})(y_i - \bar{y}) = 2.819337\times 10^{4}\]
\[r = \frac{\frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{n-1}}{s_x \cdot s_y} = \frac{\frac{2.819337\times 10^{4}}{214}}{24.18021 \times 8.64374}\]
\[r = \frac{131.7447}{209.0076} = 0.6303347\]
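The whole calculation can be sketched as one Python function that follows the same steps: deviations, sums of squares, standard deviations, cross-products. The full country dataset is not reproduced here, so the function is exercised on a tiny made-up example, and the lecture's final step is recomputed from the reported sums.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: sample covariance divided by the product of sample SDs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

# Made-up data with a perfect linear relationship, for illustration only
r_toy = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Final step of the lecture calculation, from the reported sums
r_lecture = (2.819337e4 / 214) / (24.18021 * 8.64374)
print(round(r_toy, 4), round(r_lecture, 4))
```

Working from the rounded intermediate values can shift the last digits slightly relative to the reported 0.6303347.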
| Statistic | Value |
|---|---|
| Pearson \(r\) | 0.6303 |
| t-statistic | 11.85 |
| p-value | <2e-16 |
\(r = 0.63\): strong positive correlation. Statistically significant at \(\alpha = 0.05\).
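The t-statistic in the table is consistent with the standard test for a correlation coefficient, \(t = r\sqrt{n-2} / \sqrt{1-r^2}\) with \(n - 2\) degrees of freedom. The lecture does not show this formula, so treat the sketch below as an assumption about how the value was obtained.

```python
import math

def r_to_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r and n as reported in the lecture
t_r = r_to_t(0.6303347, 215)
print(round(t_r, 2))
```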
| \(|r|\) Range | Interpretation |
|---|---|
| 0.00 – 0.19 | Very weak |
| 0.20 – 0.39 | Weak |
| 0.40 – 0.59 | Moderate |
| 0.60 – 0.79 | Strong |
| 0.80 – 1.00 | Very strong |
Our \(r = 0.63\) falls in the strong range.
Caveat: These are Cohen-style rules of thumb, not universal laws. What counts as “strong” depends on the field — \(r = 0.3\) may be large in psychology but small in physics.
Ice cream sales and drowning deaths are positively correlated.
Does ice cream cause drowning?
No. Summer temperature drives both ice cream sales and drowning deaths.
The correlation is real; the causal story is wrong.
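A short simulation illustrates the point. In the sketch below, temperature drives both variables and neither variable appears in the other's formula, yet the two end up clearly correlated. All coefficients, noise levels, and the seed are invented for illustration.

```python
import math
import random

random.seed(42)  # fixed seed so the run is reproducible

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations around the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical daily data: temperature is the only shared cause
temperature = [random.uniform(5, 35) for _ in range(365)]
ice_cream = [20 * t + random.gauss(0, 100) for t in temperature]
drownings = [0.3 * t + random.gauss(0, 2) for t in temperature]

r_confounded = pearson_r(ice_cream, drownings)
print(round(r_confounded, 2))  # clearly positive despite no direct causal link
```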
Both variables trend upward over time — is the correlation just a shared trend?
%%{init:{'flowchart':{'useMaxWidth':true,'nodeSpacing':60,'rankSpacing':60},'themeVariables':{'fontSize':'22px'}}}%%
flowchart TD
Z["Confounder<br/>(e.g., GDP)"] --> X["Urbanization"]
Z --> Y["Life Expectancy"]
X -. "correlation<br/>(not causation)" .-> Y
%%{init:{'flowchart':{'useMaxWidth':true,'nodeSpacing':60,'rankSpacing':60},'themeVariables':{'fontSize':'22px'}}}%%
flowchart LR
X["Urbanization"] -- "?" --> Y["Life Expectancy"]
Y -- "?" --> X
Using the same dataset, answer these questions:
Popescu (JCU) Statistical Analysis Lecture 7: T-Tests and Correlations