```
library("broom")
library("dplyr")
library("ggplot2")
library("ggrepel")
library("modelsummary")
library("ggdag")
library("purrr")
library("stringr")
library("ggpubr")
# For Maps
library("sf")
library("stars")
library("rnaturalearth")
library("rnaturalearthdata")
library("ggspatial")
library("gganimate")
library("lemon")
# Clearing the workspace (all objects in memory) and setting the
# working directory
rm(list = ls())
setwd("/Users/bgpopescu/Dropbox/john_cabot/teaching/stats/week12/lab/")
```

# Lab 12: Statistics

Analysis of Diff-in-Diff Data

# 1 Introduction

In this hypothetical scenario (the data are made up), the World Bank is launching a program designed to reduce the risk of malaria in Malawi by providing insecticide-treated bed nets. The nets were provided only to city B, from 2017 to 2020. The World Bank selected 24 individuals (over 3 years) from city B and wants to investigate whether receiving such nets has any effect on people’s risk of malaria.

In this document, we want to calculate \(\beta_3\) from:

\[ Y_{it} = \beta_0 + \beta_1 \text{Group}_i + \beta_2 \text{Time}_t +\beta_3 (\text{Group}_i \times \text{Time}_t) + \epsilon_{it} \]

where:

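\(\text{Group}_i\) = 1 if individual \(i\) belongs to the treatment group, and 0 otherwise

\(\text{Time}_t\) = 1 if period \(t\) falls after the intervention, and 0 otherwise

\(\beta_0\) - the mean outcome of the control group in the pre-treatment period

\(\beta_1\) - the difference in outcome between the treatment and control groups in the pre-treatment period

\(\beta_2\) - the change in outcome over time within the control group

\(\beta_3\) - the difference-in-differences: the additional change in the treatment group after the intervention, over and above the change in the control group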

In our case, this is equivalent to:

\[ \text{Malaria Risk}_{it} = \beta_0 + \beta_1 \text{City B}_i + \beta_2 \text{Post-2017}_t +\beta_3 (\text{City B}_i \times \text{Post-2017}_t) + \epsilon_{it} \]
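To make the meaning of \(\beta_3\) concrete, here is a minimal sketch in base R. It relies on toy data invented for illustration only (not the lab's dataset): the variable names `city_b`, `post`, and `malaria_risk` and the "true" coefficients are assumptions. Under the model above, the four group-by-period means are \(\beta_0\), \(\beta_0 + \beta_1\), \(\beta_0 + \beta_2\), and \(\beta_0 + \beta_1 + \beta_2 + \beta_3\), so the difference between the two before/after differences should match the interaction coefficient from `lm()`.

```r
# Toy data, invented purely for illustration (not the lab's dataset)
set.seed(12)
n_per_cell <- 50
toy <- expand.grid(id = seq_len(n_per_cell), city_b = c(0, 1), post = c(0, 1))

# Hypothetical "true" coefficients chosen for this sketch
b0 <- 40; b1 <- 5; b2 <- -2; b3 <- -10
toy$malaria_risk <- b0 + b1 * toy$city_b + b2 * toy$post +
  b3 * toy$city_b * toy$post + rnorm(nrow(toy), sd = 3)

# Route 1: difference of the two before/after differences in cell means
cell_means <- tapply(toy$malaria_risk,
                     list(city_b = toy$city_b, post = toy$post),
                     mean)
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])

# Route 2: coefficient on the interaction term in the regression
fit <- lm(malaria_risk ~ city_b * post, data = toy)
did_by_lm <- unname(coef(fit)["city_b:post"])

# The two routes should agree up to floating-point error
c(by_hand = did_by_hand, by_lm = did_by_lm)
```

Because the model has one parameter for each of the four group-by-period cells, the regression reproduces the cell means exactly. The regression route is still useful because it also provides standard errors.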

**Preliminaries**: Create a folder within the relevant week called “lab”. Within it, create another folder called “data” and another one, called “graphs”.
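The folders can also be created from R. The sketch below is one possible way to do it; `lab_dir` is a hypothetical location (a temporary directory here, so that the example is safe to run) and should be replaced with the path to your own week folder.

```r
# Hypothetical location: replace lab_dir with the path to your own "lab" folder
lab_dir <- file.path(tempdir(), "week12", "lab")

# Create "data" and "graphs" inside it; recursive = TRUE also creates
# any missing parent folders, and showWarnings = FALSE keeps reruns quiet
for (sub in c("data", "graphs")) {
  dir.create(file.path(lab_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist inside lab_dir
list.dirs(lab_dir, full.names = FALSE, recursive = FALSE)
```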

First, we want to draw a causal model that explains the effect of the insecticide-treated bed nets on participants' risk of malaria, given the time and the group. See the DAG below (together with the code to produce it).

## Step 1 Code

```
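# Build the DAG: malaria risk depends on the net, the year, and the city;
# receiving a net depends on the year and the city
malaria_dag <- dagify(malaria_risk ~ net + year + city,
                      net ~ year + city,
                      exposure = "net",
                      outcome = "malaria_risk",
                      labels = c(malaria_risk = "Malaria Risk",
                                 net = "Mosquito Net",
                                 year = "Year",
                                 city = "City",
                                 pre_malaria_risk = "Pre Malaria Risk"),
                      coords = list(x = c(malaria_risk = 3, net = 1, year = 2, city = 2),
                                    y = c(malaria_risk = 2, net = 2, year = 3, city = 1)))
# Convert to a data frame so that each node can be tagged for plotting
bigger_dag <- data.frame(tidy_dagitty(malaria_dag))
# Assumption: every node other than the outcome and the intervention is
# treated as a confounder, matching the "Confounder" entry in the Step 2 palette
bigger_dag$type <- "Confounder"
bigger_dag$type[bigger_dag$name == "malaria_risk"] <- "Outcome"
bigger_dag$type[bigger_dag$name == "net"] <- "Intervention"
# Range of the node coordinates, used to set the plot limits
min_lon_x <- min(bigger_dag$x)
max_lon_x <- max(bigger_dag$x)
min_lat_y <- min(bigger_dag$y)
max_lat_y <- max(bigger_dag$y)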
```

## Step 2 Code

```
# Colour for each node type, and the order in which they appear in the legend
col <- c("Outcome" = "green3",
         "Intervention" = "red",
         "Confounder" = "grey60")
order_col <- c("Outcome", "Intervention", "Confounder")

example_dag <- ggplot(data = bigger_dag,
                      aes(x = x, y = y, xend = xend, yend = yend, color = type)) +
  geom_dag_point() +
  geom_dag_edges() +
  coord_sf(xlim = c(min_lon_x - 1, max_lon_x + 1),
           ylim = c(min_lat_y - 1, max_lat_y + 1)) +
  scale_colour_manual(values = col,
                      name = "Group",
                      breaks = order_col) +
  # Label each node once (the DAG data frame has one row per edge)
  geom_label_repel(data = subset(bigger_dag, !duplicated(bigger_dag$label)),
                   aes(label = label), fill = alpha(c("white"), 0.8)) +
  theme_bw() +
  labs(x = "", y = "")
# We now save the graph in a folder called "graphs"
ggsave(filename = "./graphs/example_dag1.jpg", plot = example_dag,
       height = 20, width = 25,
       units = "cm", dpi = 200)
example_dag
```