Interaction Effects
What if we interacted urbanization with some of our other variables?
                      (1)
Intercept             48.981***
                      (1.044)
EU                    33.095***
                      (6.611)
Urbanization          0.233***
                      (0.019)
EU * Urbanization     -0.456***
                      (0.097)
Num.Obs.              214
R2                    0.464
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
The coefficient -0.456 is not the intercept for EU countries, but rather the difference in slope for EU countries compared to the others.
The intercept for EU countries is the overall intercept plus the EU coefficient: 48.981 + 33.095 = 82.076.
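We can verify this arithmetic directly in R. A minimal sketch, assuming the model in the table was fit with an EU dummy called eu in a data frame called merged_data2 (both names are hypothetical, since the estimation code is not shown on the slide):

# Hypothetical re-estimation of the interaction model from the table above
m <- lm(life_expectancy ~ eu * urb_mean, data = merged_data2)  # eu is a 0/1 dummy
b <- coef(m)
b["(Intercept)"] + b["eu"]  # intercept for EU countries: 48.981 + 33.095 = 82.076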
Interaction Effects
Similarly, the coefficient on EU * Urbanization (-0.456) is not the slope of urbanization for the EU, but rather the offset in slope for the EU.
Therefore, the slope of urbanization for EU countries is: Urbanization + EU * Urbanization = 0.233 + (-0.456) = -0.223.
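Continuing the hypothetical sketch from above, the EU slope is just the sum of two coefficients:

b["urb_mean"] + b["eu:urb_mean"]  # slope for EU countries: 0.233 - 0.456 = -0.223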
Interaction Effects
Since the slope of urbanization for the rest of the world is 0.233, on average a one-unit increase in urbanization in a country outside the EU is associated with an increase in life expectancy of 0.233 years.
In the EU, by contrast, a one-unit increase in urbanization is associated with a decrease in life expectancy of 0.223 years.
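One way to see the two slopes side by side is with predicted values. A sketch, again reusing the hypothetical model m from above; the urbanization levels 50 and 51 are arbitrary illustrations:

# Predicted life expectancy at urbanization 50 vs. 51, inside and outside the EU
nd <- expand.grid(eu = c(0, 1), urb_mean = c(50, 51))
nd$pred <- predict(m, newdata = nd)
nd  # within each group, the difference between the two rows is that group's slope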
setwd("/Users/bgpopescu/Library/CloudStorage/Dropbox/john_cabot/teaching/research_workshop/lecture10/data/")gdp <-read.csv(file ='./gdp-per-capita-maddison-2020.csv')gdp2 <- gdp %>%group_by(Code) %>%summarize(gdp =mean(GDP.per.capita))# Removing continentsgdp3 <-subset(gdp2, gdp2$Code !="")merged_data3<-left_join(merged_data2, gdp3, by =c("Code"="Code"))merged_data3<-na.omit(merged_data3)#Taking the logmerged_data3$log_gdp<-log(merged_data3$gdp)
Regression Example
Interpretation
This is how we create a professional-looking table.
library("modelsummary")model<-lm(life_expectancy~urb_mean+log_gdp, data=merged_data3)models<-list("DV: Life Expectancy"= model)cm <-c('urb_mean'='Urbanization','log_gdp'='GDP',"(Intercept)"="Intercept")modelsummary(models, stars =TRUE, coef_map = cm, gof_map =c("nobs", "r.squared"))
                      DV: Life Expectancy
Urbanization          0.054+
                      (0.029)
GDP                   5.947***
                      (0.699)
Intercept             7.060
                      (4.880)
Num.Obs.              164
R2                    0.608
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
Regression Example
Interpretation
This is how we create a professional-looking plot.
library(broom)    # for tidy()
library(ggplot2)

# Step 1: To make readable graphs, we standardize our coefficients.
# Scaling a variable means subtracting the mean of the original variable
# from each raw value and then dividing by its standard deviation.
x5_b <- lm(scale(life_expectancy) ~ scale(urb_mean) + scale(log_gdp),
           data = merged_data3)

# Step 2: Tidying the coefficients
results1_b <- tidy(x5_b)
cm <- c('scale(urb_mean)' = 'Urbanization',
        'scale(log_gdp)' = 'GDP',
        '(Intercept)' = 'Intercept')
# Ensure the levels of 'term' match the names of cm
results1_b$term <- factor(results1_b$term, levels = names(cm))

graph_results1_b <- ggplot(results1_b, aes(x = estimate, y = term)) +
  geom_point(position = position_dodge(width = 0.4), size = 4) +
  geom_errorbar(aes(xmin = estimate - 1.96 * std.error,
                    xmax = estimate + 1.96 * std.error),
                size = 1, width = 0,
                position = position_dodge(width = 0.4)) +
  geom_vline(xintercept = 0, color = "black", linetype = "dashed") +
  scale_y_discrete(labels = cm) +
  theme_bw() +
  theme(axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16),
        legend.position = c(1, 0),
        legend.box.background = element_rect(fill = 'white'),
        legend.background = element_blank(),
        legend.text = element_text(size = 16))
graph_results1_b
Regression Example
Interpretation
We can now interpret the coefficients:
Urbanization: each one-unit increase in urbanization is associated with a 0.054-year increase in life expectancy, holding everything else constant (significant only at the 10% level).
Log GDP: each one-unit increase in log GDP is associated with a 5.947-year increase in life expectancy, holding everything else constant.
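A quick way to see what "holding everything else constant" means is to move one regressor at a time through predict(), using the model object fitted above; the values 50 and 9 are arbitrary illustrations:

base <- data.frame(urb_mean = 50, log_gdp = 9)
plus_one <- transform(base, urb_mean = urb_mean + 1)  # urbanization +1, GDP fixed
predict(model, newdata = plus_one) - predict(model, newdata = base)  # 0.054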
Note that one of our independent variables, GDP per capita, was logged.
The right interpretation is: a 1% increase in GDP per capita is associated with an increase in life expectancy of approximately 5.947/100 = 0.059 years.
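To see why dividing by 100 works: a 1% increase multiplies GDP by 1.01, so log GDP rises by log(1.01) ≈ 0.01, and the predicted change in life expectancy is the coefficient times that amount:

5.947 * log(1.01)  # 0.0592, essentially the same as 5.947 / 100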
Multivariate Regressions
R Sq. and Adj. R Sq.
The output R provides for a multiple regression looks exactly the same as for a bivariate regression.
We just need to be conscious that we are now talking about more than one predictor variable.
In a multivariate regression, we care about the adjusted \(R^2\).
The reason is that the value of \(R^2\) never decreases, no matter how many variables we add to the regression model.
Even if we add redundant variables, \(R^2\) does not go down.
The Adjusted \(R^2\), by contrast, penalizes redundant variables: adding them lowers the Adjusted \(R^2\), which can even become negative. That usually happens when the model fits the data worse than a model with no predictors, meaning the extra variables add noise rather than explanatory power.
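For reference, the formula behind this behavior, with \(n\) observations and \(k\) predictors:
\[
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
\]
Each added predictor raises \(k\), so a variable must improve \(R^2\) by enough to offset the larger penalty, or the Adjusted \(R^2\) falls.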
Conclusion: What You Should Remember
Regression on a binary predictor estimates group differences: \[
\beta_1 = \text{mean}(y \mid x = 1) - \text{mean}(y \mid x = 0)
\]
Always check group means and sample sizes first.
Interaction terms allow effects to vary across groups
(e.g. slope/intercept differences for EU vs. non-EU).
Multivariate regression controls for other factors:
coefficients show the effect of one variable holding others constant.
Use Adjusted \(R^2\) to assess model fit when adding predictors.