Call:
lm(formula = life_expectancy ~ urb_mean, data = merged_data2)

Residuals:
     Min       1Q   Median       3Q      Max
-14.6442  -4.2065  -0.5017   5.0702  18.3283

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.69757    1.08225   45.92   <2e-16 ***
urb_mean     0.22529    0.01906   11.82   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.742 on 212 degrees of freedom
Multiple R-squared:  0.3971,    Adjusted R-squared:  0.3943
F-statistic: 139.7 on 1 and 212 DF,  p-value: < 2.2e-16
Regression Example
Interpretation
This is how we create a professional-looking table.

library("modelsummary")
model <- lm(life_expectancy ~ urb_mean, data = merged_data2)
models <- list("DV: Life Expectancy" = model)
cm <- c("urb_mean" = "Urbanization", "(Intercept)" = "Intercept")
modelsummary(models, stars = TRUE, coef_map = cm, gof_map = c("nobs", "r.squared"))
                DV: Life Expectancy
Urbanization        0.225***
                   (0.019)
Intercept          49.698***
                   (1.082)
Num.Obs.          214
R2                  0.397
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Regression Example
Interpretation
This is how we create a professional-looking plot.

library(broom)
library(ggplot2)  # needed for ggplot(); load it explicitly
model_scaled <- lm(scale(life_expectancy) ~ scale(urb_mean), data = merged_data2)
results <- tidy(model_scaled)
cm <- c("scale(urb_mean)" = "Urbanization", "(Intercept)" = "Intercept")
ggplot(results, aes(x = estimate, y = term)) +
  geom_point() +
  geom_errorbar(aes(xmin = estimate - 1.96 * std.error,
                    xmax = estimate + 1.96 * std.error),
                linewidth = 1, width = 0) +
  scale_y_discrete(labels = cm) +  # map term names to nicer labels
  theme_bw() +
  theme(axis.text = element_text(size = 14),
        axis.title = element_text(size = 14),
        plot.title = element_text(hjust = 0.5))
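A useful fact behind this plot: in a simple (one-predictor) regression, standardizing both variables with scale() makes the slope equal to the Pearson correlation. A minimal sketch with simulated data (the variables here are illustrative, not merged_data2):

```r
set.seed(7)
x <- rnorm(150)                       # simulated predictor
y <- 1 + 0.3 * x + rnorm(150)        # simulated outcome
# Slope of the standardized regression:
b_std <- coef(lm(scale(y) ~ scale(x)))[[2]]
# In simple regression this equals cor(x, y)
all.equal(b_std, cor(x, y))
```

This is why the standardized coefficient in the plot can be read as an effect size on a correlation-like scale.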
Regression Example
Interpretation: The Slope (b)
Interpretation: For every 1 percentage point increase in urbanization (urb_mean), the model predicts an average increase of approximately 0.225 years in life expectancy. This relationship is statistically significant at the 0.1% level (p < 0.001), as indicated by the *** in the regression output.
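The slope interpretation can be checked by hand: using the estimates from the table, the predicted values at 50% and 51% urbanization differ by exactly the slope. A minimal sketch (coefficients copied from the output above):

```r
a <- 49.69757  # intercept estimate from the output
b <- 0.22529   # slope estimate for urb_mean
yhat_50 <- a + b * 50  # prediction at 50% urbanization
yhat_51 <- a + b * 51  # prediction at 51% urbanization
yhat_51 - yhat_50      # equals the slope: one-unit change in X
```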
Regression Example
Interpretation: The Standard Error (SE)
SE for Urbanization = 0.019. This tells you how much the estimate would vary across different samples. The smaller it is, the more precise your estimate.
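The standard error is what the significance stars and confidence intervals are built from: a rough 95% interval is the estimate plus or minus 1.96 standard errors. A minimal sketch using the numbers from the table:

```r
b  <- 0.22529  # slope estimate
se <- 0.01906  # its standard error
# Approximate 95% confidence interval (with real data, confint(model)
# uses the exact t distribution instead of 1.96):
ci <- c(lower = b - 1.96 * se, upper = b + 1.96 * se)
round(ci, 3)
```

Because the interval does not include 0, the slope is statistically significant at the 5% level.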
Regression Example
Interpretation: The Intercept (a)
Interpretation: Intercept = 49.698. When urbanization is 0%, the predicted life expectancy is 49.698 years. This is not always meaningful in practice (e.g., no country has exactly 0% urbanization), but it is mathematically necessary.
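This follows directly from the prediction equation: at X = 0 the slope term drops out and the prediction is the intercept itself. A minimal check with the estimates from the table:

```r
a <- 49.69757  # intercept estimate from the output
b <- 0.22529   # slope estimate
pred_at_zero <- a + b * 0  # the slope term vanishes at X = 0
# pred_at_zero is just the intercept
```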
Regression Example
Interpretation: R Squared
The \(R^2\) is 0.397
It tells us the variance explained in our outcome variable by our predictor variable(s).
The interpretation is: “Our model explains 39.7% of the variance in our outcome variable.”
R squared ranges from 0 to 1: \(R^2 \in [0,1]\)
\(R^2 = 0\) means no variance is explained
\(R^2 = 1\) means all variance is explained
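\(R^2\) can be computed by hand as 1 minus the ratio of residual variance to total variance, which matches what summary() reports. A minimal sketch with simulated data (the variables are illustrative stand-ins, not merged_data2):

```r
set.seed(1)
urb <- runif(200, 0, 100)                 # simulated urbanization (%)
le  <- 50 + 0.2 * urb + rnorm(200, sd = 7)  # simulated life expectancy
m   <- lm(le ~ urb)
# R^2 = 1 - SS_residual / SS_total
rsq <- 1 - sum(resid(m)^2) / sum((le - mean(le))^2)
# Matches the value reported by summary(m)$r.squared
```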
Notation
We now have the following equation:
\[
\hat{Y} = bX + a
\]
This can also be written as:
\[
\hat{y} = \hat{\alpha} + \hat{\beta}_1 x
\]
\[
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
\]
Note that the error term appears in the population model, \(y = \beta_0 + \beta_1 x + \epsilon\), not in the fitted (predicted) value \(\hat{y}\).
Where:
\(a = \alpha = \beta_0\) (intercept)
\(b = \beta_1\) (slope)
\(\epsilon\) = error term
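The notation maps directly onto R objects: coef() returns \(\hat{\beta}_0\) and \(\hat{\beta}_1\), and plugging them into the equation reproduces fitted(). A minimal sketch with simulated data (illustrative variables, not merged_data2):

```r
set.seed(2)
x <- rnorm(100)
y <- 3 + 0.5 * x + rnorm(100)
m <- lm(y ~ x)
# y-hat computed by hand from the equation: b0 + b1 * x
yhat_manual <- coef(m)[[1]] + coef(m)[[2]] * x
# Identical to the fitted values R stores in the model
```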
Conclusion
Correlation quantifies the strength and direction of a linear relationship between two variables.
Regression models this relationship to make predictions.
The slope (\(\beta_1\)) tells us the expected change in the outcome for a one-unit change in the predictor.
The intercept (\(\beta_0\)) gives the expected value when the predictor is zero.
\(R^2\) measures how well the model explains variation in the outcome.
Correlation and regression do not imply causation.
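Correlation and regression are tied together by a simple identity: in a one-predictor model, the slope equals the correlation rescaled by the ratio of standard deviations, \(\hat{\beta}_1 = r \cdot s_y / s_x\). A minimal sketch with simulated data (illustrative variables only):

```r
set.seed(42)
x <- rnorm(200)
y <- 2 + 0.5 * x + rnorm(200)
b1 <- coef(lm(y ~ x))[["x"]]     # regression slope
# Slope = correlation * (sd of y / sd of x)
all.equal(b1, cor(x, y) * sd(y) / sd(x))
```

This is also why the standardized slope (after scale()) is exactly the correlation: standardizing sets both standard deviations to 1.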