Lab 12: Statistics

Analysis of Diff-in-Diff Data

Author

Bogdan G. Popescu

1 Introduction

In this hypothetical situation (the data are made up), the World Bank is planning to launch a program designed to reduce the risk of malaria in Malawi by providing insecticide-treated bed nets. It provided the nets only to city B from 2017 to 2020. The World Bank selected 24 individuals (over 3 years) from city B and wants to investigate whether receiving such nets has any effect on people’s risk of malaria.

In this document, we want to estimate \(\beta_3\) from:

\[ Y_{it} = \beta_0 + \beta_1 \text{Group}_i + \beta_2 \text{Time}_t +\beta_3 (\text{Group}_i \times \text{Time}_t) + \epsilon_{it} \]

where:
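\(\text{Group}_i\) = 1 if unit \(i\) is in the treatment group, and 0 otherwise
\(\text{Time}_t\) = 1 if period \(t\) falls after the intervention, and 0 otherwise
\(\beta_0\) - mean outcome of the control group in the pre-treatment period
\(\beta_1\) - difference between the treatment and control groups in the pre-treatment period
\(\beta_2\) - change in the control group's outcome from the pre-treatment to the post-treatment period
\(\beta_3\) - the difference in differences: the additional change in the treatment group beyond the change in the control group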

In our case, this is equivalent to:

\[ \text{Malaria Risk}_{it} = \beta_0 + \beta_1 \text{City B}_i + \beta_2 \text{Post-2017}_t +\beta_3 (\text{City B}_i \times \text{Post-2017}_t) + \epsilon_{it} \]
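To see why \(\beta_3\) is the difference in differences, note that the model implies one mean for each combination of city and period, so that:

\[ \beta_3 = (\bar{Y}_{\text{B, post}} - \bar{Y}_{\text{B, pre}}) - (\bar{Y}_{\text{control, post}} - \bar{Y}_{\text{control, pre}}) \]

The following is a minimal sketch in R that checks this identity. It uses simulated data, not the lab data. The names `sim`, `city_b`, and `post` are hypothetical, and the simulated effect of -10 is an arbitrary value chosen for illustration.

```r
# Simulated data (an assumption for illustration, not the lab data)
set.seed(123)
n <- 200
sim <- data.frame(
  city_b = rep(c(0, 1), each = n / 2),   # 1 = treated city
  post   = rep(c(0, 1), times = n / 2)   # 1 = period after the intervention
)
sim$malaria_risk <- 50 + 5 * sim$city_b - 2 * sim$post -
  10 * sim$city_b * sim$post + rnorm(n, sd = 3)

# Regression version: beta_3 is the coefficient on the interaction term
fit <- lm(malaria_risk ~ city_b * post, data = sim)
beta_3 <- unname(coef(fit)["city_b:post"])

# Manual version: double difference of the four cell means
cell_means <- with(sim, tapply(malaria_risk, list(city_b, post), mean))
did_manual <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])

print(c(regression = beta_3, manual = did_manual))
```

Because the model has one parameter per cell, the two numbers agree up to floating-point error.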

Preliminaries: Create a folder within the relevant week called “lab”. Within it, create another folder called “data” and another one, called “graphs”.
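The folders can also be created from R. The following is a minimal sketch, assuming a hypothetical base path `week_dir`. It points to a temporary directory here so that the example is safe to run, and it should be replaced with the folder for the relevant week.

```r
# Hypothetical base path: replace with the folder for the relevant week
week_dir <- file.path(tempdir(), "week12")

# Create lab/data and lab/graphs (recursive = TRUE also creates "lab")
for (sub in c("data", "graphs")) {
  dir.create(file.path(week_dir, "lab", sub),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.dirs(file.path(week_dir, "lab"), full.names = FALSE)
```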

library("broom")
library("dplyr")
library("ggplot2")
library("ggrepel")
library("modelsummary")
library("ggdag")
library("purrr")
library("stringr")
library("ggpubr")

# For Maps
library("sf")
library("stars")
library("rnaturalearth")
library("rnaturalearthdata")
library("ggspatial")
library("gganimate")
library("lemon")


# Clearing the workspace and setting the
# working directory
rm(list = ls())
setwd("/Users/bgpopescu/Dropbox/john_cabot/teaching/stats/week12/lab/")

First, we draw a causal model that describes the effect of the insecticide-treated bed nets on participants' risk of malaria, given the time and the group. See the DAG below, together with the code that produces it.

Step 1 Code
malaria_dag <- dagify(malaria_risk ~ net + year + city,
                     net ~ year + city,
                     exposure = "net",
                     outcome = "malaria_risk",
                     labels = c(malaria_risk = "Malaria Risk",
                                net = "Mosquito Net",
                                year = "Year",
                                city = "City"),
                     coords = list(x = c(malaria_risk = 3, net = 1, year = 2, city = 2),
                                   y = c(malaria_risk = 2, net = 2, year = 3, city = 1)))

bigger_dag <-data.frame(tidy_dagitty(malaria_dag))
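# Every node starts as a confounder; the outcome and the intervention are then relabelled
bigger_dag$type<-"Confounder"
bigger_dag$type[bigger_dag$name=="malaria_risk"]<-"Outcome"
bigger_dag$type[bigger_dag$name=="net"]<-"Intervention"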
min_lon_x<-min(bigger_dag$x)
max_lon_x<-max(bigger_dag$x)
min_lat_y<-min(bigger_dag$y)
max_lat_y<-max(bigger_dag$y)
Step 2 Code
col = c("Outcome"="green3",
        "Intervention"="red",
        "Confounder"="grey60")

order_col<-c("Outcome", "Intervention", "Confounder")


example_dag<-ggplot(data = bigger_dag, aes(x = x, y = y, xend = xend, yend = yend, color=type)) +
  geom_dag_point() +
  geom_dag_edges() +
  coord_sf(xlim = c(min_lon_x-1, max_lon_x+1), ylim = c(min_lat_y-1, max_lat_y+1))+
  scale_colour_manual(values = col,
                      name = "Group",
                      breaks = order_col)+
  geom_label_repel(data = subset(bigger_dag, !duplicated(bigger_dag$label)), 
                   aes(label = label), fill = alpha(c("white"),0.8))+
  theme_bw()+
  labs(x = "", y="")

# We now save the graph in a folder called "graphs"
ggsave(filename = "./graphs/example_dag1.jpg", plot = example_dag,
       height = 20, width = 25,
       units = "cm", dpi = 200)

example_dag
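
The DAG also indicates which variables to adjust for. As a hedged sketch (this check is not part of the original lab), the `dagitty` package, which `ggdag` builds on, can list the minimal adjustment sets for the effect of the net on malaria risk. The DAG is declared again here in `dagitty` syntax so that the block runs on its own. The object names `dag_check` and `adj_sets` are hypothetical.

```r
library("dagitty")

# Same arrows as malaria_dag, written in dagitty syntax
dag_check <- dagitty("dag {
  net -> malaria_risk
  year -> net
  year -> malaria_risk
  city -> net
  city -> malaria_risk
}")

# Minimal sets of variables that close all back-door paths from net to malaria_risk
adj_sets <- adjustmentSets(dag_check, exposure = "net", outcome = "malaria_risk")
print(adj_sets)
```

Year and city each point into both the net and the outcome, so both belong to the adjustment set. This matches the regression above, which includes a group term and a time term.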