How can we identify causal effects using observational data?
Some of the ways to conduct causal analysis with observational data, we need to run
differences in differences
regression discontinuity design
RDD
RDD are about arbitrary rules and thresholds determining access to policy programs
subjects are in the program if they have scores above a specific threshold
subject are not in the program if they have scores below a specific threshold
Or the other way around
Running or forcing variable - index that measures eligibility for the program (e.g. scores) Cutoff / cutpoint / threshold - the number that the determines access to the program
RDD
RDD
Student ID
Exam Score
Scholarship Awarded?
First-Year GPA
101
83
No
3.2
102
84
No
3.3
103
85
Yes
3.6
104
86
Yes
3.5
105
87
Yes
3.7
106
84.5
No
3.4
107
85.1
Yes
3.6
RDD
Hypothetical tutoring program
Students take an entrance exam
Those who score 70 or lower get a free tutor for the year
Students then take an exit exam at the end of the year
The way to test for manipulation at the cutoff is the McCrary density test.
This is an example of Manipulation
Show the code
set.seed(123) # for reproducibilitynum_students <-1500# a bit more to have room for thinningtutoring2 <-tibble(id =1:num_students,entrance_exam =rbeta(num_students, shape1 =7, shape2 =2),exit_exam =rbeta(num_students, shape1 =5, shape2 =3)) |>mutate(entrance_exam = entrance_exam *100) |># 🔥 Add manipulation: fewer people just above the cutofffilter(!(entrance_exam >70& entrance_exam <75&runif(n()) <0.5)) |># randomly drop 50% just above cutoffmutate(# Assignmenttutoring_assignment = entrance_exam <=70,# Fuzzy compliancetutoring_fuzzy =case_when( tutoring_assignment &runif(n()) <0.85~TRUE,!tutoring_assignment &runif(n()) <0.15~TRUE,TRUE~FALSE ),# Outcomeexit_exam = exit_exam *40+10* tutoring_fuzzy + entrance_exam /2,# Labelstutoring_text =factor(tutoring_assignment, levels =c(FALSE, TRUE), labels =c("No tutor", "Tutor")),tutoring_fuzzy_text =factor(tutoring_fuzzy, levels =c(FALSE, TRUE), labels =c("No tutor", "Tutor")),entrance_centered = entrance_exam -70 )library(rddensity)mtest <-rddensity(X = tutoring2$entrance_exam, c=70)man<-rdplotdensity(mtest,X = tutoring2$entrance_exam)$Estplot+geom_vline(xintercept=70, linetype="dashed") +labs(x="CD4 Count", y="Density")
Noncompliance
The logic for noncompliance means that people on the margin of the cutoff might end up in/out of the program.
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_fuzzy_text, fill = entrance_exam <=70)) +geom_vline(xintercept =70, linewidth =2, color ="#FFDC00") +geom_point(size =3, pch =21, color ="white", alpha =0.4,position =position_jitter(width =0, height =0.15, seed =1234)) +labs(x ="Entrance exam score", y =NULL) +guides(fill ="none") +scale_fill_manual(values =c("black", "red"), name =NULL) +theme_bw(base_size =28)
Compliance
The logic for noncompliance means that people on the margin of the cutoff might end up in/out of the program.
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_text, fill = tutoring_text)) +geom_vline(xintercept =70, linewidth =2, color ="#FFDC00") +geom_point(size =3, pch =21, color ="white", alpha =0.4,position =position_jitter(width =0, height =0.15, seed =1234)) +labs(x ="Entrance exam score", y =NULL) +guides(fill ="none") +scale_fill_manual(values =c("black", "red"), name =NULL) +theme_bw(base_size =28)
Noncompliance
Addressing it
The way to address noncompliance is to use fuzzy discontinuities:
Use an instrument for which side of the cutoff people should be on
Thus, the effect will only be for compliers near the cutoff
Compliance
Person Type
If Assigned to Treatment
If Assigned to Control
Complier
Takes treatment
Does not take treatment
Always-Taker
Takes treatment
Takes treatment
Never-Taker
Does not take treatment
Does not take treatment
Defier
Does not take treatment
Takes treatment
Example
Noncompliance
Addressing it
So, we can use an instrument = above/below cutoff to address non-compliance
The instrument has to meet three important criteria
Relevance
The cutoff changes the likelihood of receiving the treatment.
→ People below the cutoff are more likely to get tutoring.
Exclusion
The cutoff affects the outcome only through the treatment.
→ Crossing the threshold doesn’t directly change the outcome except by changing tutoring status.
Exogeneity
Unobserved confounders between the tutoring program and exit exam scores are unrelated to the cutoff.
Noncompliance
Addressing it
Noncompliance
Addressing it
The way to deal with noncompliance is to use a Fuzzy RD
We can do it in the following way:
rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c =70, fuzzy = tutoring$tutoring_fuzzy)
Fuzzy RD estimates using local polynomial regression.
Number of Obs. 1000
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 246 754
Eff. Number of Obs. 187 350
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 13.056 13.056
BW bias (b) 19.608 19.608
rho (h/b) 0.666 0.666
Unique Obs. 246 754
First-stage estimates.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional -0.680 0.072 -9.382 0.000 [-0.822 , -0.538]
Robust - - -8.029 0.000 [-0.869 , -0.528]
=============================================================================
Treatment effect estimates.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 11.955 2.030 5.889 0.000 [7.976 , 15.934]
Robust - - 4.940 0.000 [7.450 , 17.250]
=============================================================================
Noncompliance
Addressing it
Show the code
# Uniform# Run rdrobust with all resultsrd1 <-rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c =70,fuzzy = tutoring$tutoring_fuzzy,kernel ="Uniform")# Create a named list of models (each method with its estimate and se)models <-list("Conventional"= rd1$Estimate[1],"Bias-Corrected"= rd1$Estimate[2],"Robust Bias-Corrected"= rd1$Estimate[3])ses <-list("Conventional"= rd1$se[1],"Bias-Corrected"= rd1$se[2],"Robust Bias-Corrected"= rd1$se[3])# Build a modelsummary-style data framelibrary(tibble)results1 <- tibble::tibble(Statistic =c("Estimate", "Std. Error", "Kernel", "Obs.", "Bandwidth"),`Conventional`=c(sprintf("%.3f", models$Conventional),sprintf("%.3f", ses$Conventional), rd1$kernel, # keep as-is (a string)sum(rd1$N),sprintf("%.3f",rd1$bws[1]) ),`Bias-Corrected`=c(sprintf("%.3f", models$`Bias-Corrected`),sprintf("%.3f", ses$`Bias-Corrected`), rd1$kernel,sum(rd1$N),sprintf("%.3f",rd1$bws[2]) ),`Robust Bias-Corrected`=c(sprintf("%.3f", models$`Robust Bias-Corrected`),sprintf("%.3f", ses$`Robust Bias-Corrected`), rd1$kernel,sum(rd1$N),sprintf("%.3f",rd1$bws[2]) ))#Triangular# Run rdrobust with all resultsrd2 <-rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c =70,fuzzy = tutoring$tutoring_fuzzy,kernel ="Triangular")# Create a named list of models (each method with its estimate and se)models <-list("Conventional"= rd2$Estimate[1],"Bias-Corrected"= rd2$Estimate[2],"Robust Bias-Corrected"= rd2$Estimate[3])ses <-list("Conventional"= rd2$se[1],"Bias-Corrected"= rd2$se[2],"Robust Bias-Corrected"= rd2$se[3])# Build a modelsummary-style data framelibrary(tibble)results2 <- tibble::tibble(Statistic =c("Estimate", "Std. Error", "Kernel", "Obs.", "Bandwidth"),`Conventional`=c(sprintf("%.3f", models$Conventional),sprintf("%.3f", ses$Conventional), rd2$kernel, # keep as-is (a string)sum(rd2$N),sprintf("%.3f",rd2$bws[1]) ),`Bias-Corrected`=c(sprintf("%.3f", models$`Bias-Corrected`),sprintf("%.3f", ses$`Bias-Corrected`), rd2$kernel,sum(rd2$N),sprintf("%.3f",rd2$bws[2]) ),`Robust Bias-Corrected`=c(sprintf("%.3f", models$`Robust Bias-Corrected`),sprintf("%.3f", ses$`Robust Bias-Corrected`), rd2$kernel,sum(rd2$N),sprintf("%.3f",rd2$bws[2]) ))#Epanechnikov# Run rdrobust with all resultsrd3 <-rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c =70,fuzzy = tutoring$tutoring_fuzzy,kernel ="Epanechnikov")# Create a named list of models (each method with its estimate and se)models <-list("Conventional"= rd3$Estimate[1],"Bias-Corrected"= rd3$Estimate[2],"Robust Bias-Corrected"= rd3$Estimate[3])ses <-list("Conventional"= rd3$se[1],"Bias-Corrected"= rd3$se[2],"Robust Bias-Corrected"= rd3$se[3])# Build a modelsummary-style data framelibrary(tibble)results3 <- tibble::tibble(Statistic =c("Estimate", "Std. Error", "Kernel", "Obs.", "Bandwidth"),`Conventional`=c(sprintf("%.3f", models$Conventional),sprintf("%.3f", ses$Conventional), rd3$kernel, # keep as-is (a string)sum(rd1$N),sprintf("%.3f",rd3$bws[1]) ),`Bias-Corrected`=c(sprintf("%.3f", models$`Bias-Corrected`),sprintf("%.3f", ses$`Bias-Corrected`), rd3$kernel,sum(rd1$N),sprintf("%.3f",rd3$bws[2]) ),`Robust Bias-Corrected`=c(sprintf("%.3f", models$`Robust Bias-Corrected`),sprintf("%.3f", ses$`Robust Bias-Corrected`), rd3$kernel,sum(rd3$N),sprintf("%.3f",rd3$bws[2]) ))# Create a separator rowseparator_fs <- tibble::tibble(Statistic ="<b>First stage</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")separator_ss <- tibble::tibble(Statistic ="<b>Treatment effect estimates</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")separator_uniform <- tibble::tibble(Statistic ="<b>Uniform Kernel</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")separator_triangular <- tibble::tibble(Statistic ="<b>Triangular Kernel</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")separator_epanechnikov <- tibble::tibble(Statistic ="<b>Epanechnikov Kernel</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")# First-stage estimatesfirst_stage_rd1 <- tibble::tibble(Statistic =c("Estimate", "Std. Error"),`Conventional`=c(sprintf("%.3f", rd1$tau_T[1]),sprintf("%.3f", rd1$se_T[1])),`Bias-Corrected`=c(sprintf("%.3f", rd1$tau_T[2]),sprintf("%.3f", rd1$se_T[2])),`Robust Bias-Corrected`=c(sprintf("%.3f", rd1$tau_T[3]),sprintf("%.3f", rd1$se_T[3])))# First-stage estimatesfirst_stage_rd2 <- tibble::tibble(Statistic =c("Estimate", "Std. Error"),`Conventional`=c(sprintf("%.3f", rd2$tau_T[1]),sprintf("%.3f", rd2$se_T[1])),`Bias-Corrected`=c(sprintf("%.3f", rd2$tau_T[2]),sprintf("%.3f", rd2$se_T[2])),`Robust Bias-Corrected`=c(sprintf("%.3f", rd2$tau_T[3]),sprintf("%.3f", rd2$se_T[3])))# First-stage estimatesfirst_stage_rd3 <- tibble::tibble(Statistic =c("Estimate", "Std. Error"),`Conventional`=c(sprintf("%.3f", rd3$tau_T[1]),sprintf("%.3f", rd3$se_T[1])),`Bias-Corrected`=c(sprintf("%.3f", rd3$tau_T[2]),sprintf("%.3f", rd3$se_T[2])),`Robust Bias-Corrected`=c(sprintf("%.3f", rd3$tau_T[3]),sprintf("%.3f", rd3$se_T[3])))results_final1 <-rbind( separator_uniform, separator_fs, first_stage_rd1, separator_ss, results1)results_final2 <-rbind(separator_triangular, separator_fs, first_stage_rd2, separator_ss, results2)results_final3 <-rbind(separator_epanechnikov, separator_fs, first_stage_rd3, separator_ss, results3)# Show using modelsummary custom tablemodelsummary::datasummary_df(results_final1, output ="markdown")
Statistic
Conventional
Bias-Corrected
Robust Bias-Corrected
Uniform Kernel
First stage
Estimate
-0.633
-0.625
-0.625
Std. Error
0.092
0.092
0.110
Treatment effect estimates
Estimate
18.112
18.512
4.144
Std. Error
4.144
4.144
4.984
Kernel
Uniform
Uniform
Uniform
Obs.
1000
1000
1000
Bandwidth
6.123
10.533
10.533
Noncompliance
Addressing it
Show the code
# Show using modelsummary custom tablemodelsummary::datasummary_df(results_final2, output ="markdown")
Statistic
Conventional
Bias-Corrected
Robust Bias-Corrected
Triangular Kernel
First stage
Estimate
-0.680
-0.699
-0.699
Std. Error
0.072
0.072
0.087
Treatment effect estimates
Estimate
16.644
16.644
2.899
Std. Error
2.899
2.899
3.546
Kernel
Triangular
Triangular
Triangular
Obs.
1000
1000
1000
Bandwidth
13.090
19.608
19.608
Noncompliance
Addressing it
Show the code
# Show using modelsummary custom tablemodelsummary::datasummary_df(results_final3, output ="markdown")
Statistic
Conventional
Bias-Corrected
Robust Bias-Corrected
Epanechnikov Kernel
First stage
Estimate
-0.660
-0.672
-0.672
Std. Error
0.077
0.077
0.092
Treatment effect estimates
Estimate
17.159
17.440
3.214
Std. Error
3.214
3.214
3.902
Kernel
Epanechnikov
Epanechnikov
Epanechnikov
Obs.
1000
1000
1000
Bandwidth
10.779
16.937
16.937
Summary: Regression Discontinuity Design
Goal:
Estimate local causal effect of treatment at a known cutoff using a continuous running variable.
Key Assumptions:
- Continuity: Potential outcomes \(Y(0)\) and \(Y(1)\) change smoothly at the cutoff.
- No Precise Manipulation: Units cannot perfectly sort just above or below the cutoff.
Diagnostics & Robustness:
- McCrary Test: Check for manipulation in the running variable’s density.
- Covariate Balance: No jumps in pre-treatment covariates at cutoff.
- Placebo Tests: No effects at fake cutoffs.
Interpretation:
- Estimate is Local Average Treatment Effect (LATE) at the cutoff.
- RDD is valid only near the cutoff—not population-wide.