How can we identify causal effects using observational data?
Some ways to estimate causal effects with observational data include:
differences in differences
regression discontinuity design
RDD
RDD leverages deterministic rules/thresholds that create as-if random assignment near a cutoff.
subjects are in the program if they have scores below a specific threshold
subject are not in the program if they have scores above a specific threshold
Or the other way around
Running or forcing variable - index that measures eligibility for the program (e.g. scores) Cutoff / cutpoint / threshold - the number that determines access
Sharp vs. Fuzzy RD
Sharp RD: Everyone follows the rule exactly (below 70 = gets the Tutoring Program). Fuzzy RD: Some people don’t follow it perfectly (someone below 70 might not apply).
RDD
Key assumption: Potential outcomes (Y(0), Y(1)) are continuous in the running variable at the cutoff.
In other words: People just above and just below are basically alike.
If the program didn’t exist, their average outcomes would change smoothly with the score—no sudden jump right at the threshold.
RDD
Student ID
Entrance Score
Tutor Assigned?
Exit Exam (illustrative)
201
68.0
Yes
74.0
202
69.5
Yes
74.8
203
70.0
Yes
75.0
204
70.2
No
65.1
205
71.0
No
65.5
206
72.3
No
66.2
RDD
Hypothetical tutoring program
Students take an entrance exam
Those who score 70 or lower get a free tutor for the year
Students then take an exit exam at the end of the year
RDD
Hypothetical tutoring program
Show the code
# Fake program dataset.seed(1234) # makes the random numbers repeatable (so results are the same each time)num_students <-1000tutoring <-tibble(id =1:num_students, # create 1000 students, each with a unique id# entrance exam scores: drawn from a beta distribution (skewed toward higher scores)entrance_exam =rbeta(num_students, shape1 =7, shape2 =2),# exit exam base scores: drawn from another beta distributionexit_exam =rbeta(num_students, shape1 =5, shape2 =3)) |>mutate(# scale entrance exam from 0–1 to 0–100 to make it look like a real exam scoreentrance_exam = entrance_exam *100,# assign tutoring: students with entrance exam ≤ 70 get a tutortutoring = entrance_exam <=70 ) |>mutate(# create final exit exam scores:# start with a base (exit_exam * 40),# add +10 points if the student had tutoring,# and add a benefit from prior knowledge (half of entrance score)exit_exam = exit_exam *40+10* tutoring + entrance_exam /2 ) |>mutate(# introduce some “fuzziness” (real life isn’t perfect!)# for students with entrance exam between 60 and 80,# randomly decide whether they got tutoring or nottutoring_fuzzy =ifelse( entrance_exam >60& entrance_exam <80,sample(c(TRUE, FALSE), n(), replace =TRUE), tutoring ) ) |>mutate(# create nice text labels for charts and tablestutoring_text =factor( tutoring,levels =c(FALSE, TRUE),labels =c("No tutor", "Tutor") ),tutoring_fuzzy_text =factor( tutoring_fuzzy,levels =c(FALSE, TRUE),labels =c("No tutor", "Tutor") ) ) |>mutate(# center the entrance exam around 70 (the cutoff)# so 0 = exactly at the tutoring ruleentrance_centered = entrance_exam -70 )
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_text, fill = tutoring_text)) +geom_vline(xintercept =70, linewidth =2, color ="#FFDC00") +geom_point(size =3, pch =21, color ="white", alpha =0.4,position =position_jitter(width =0, height =0.15, seed =1234)) +labs(x ="Entrance exam score", y =NULL) +guides(fill ="none") +scale_fill_manual(values =c("black", "red"), name =NULL) +theme_bw(base_size =28)
Intuition
Hypothetical tutoring program
The people right before and right after the threshold are very similar
The idea is that we can measure the effect of the program by examining students just around the threshold
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_text, fill = tutoring_text)) +geom_vline(xintercept =70, linewidth =2, color ="#FFDC00") +annotate(geom ="rect", fill ="grey50", alpha =0.25, ymin =-Inf, ymax =Inf,xmin =70-5, xmax =70+5) +annotate(geom ="rect", fill ="grey50", alpha =0.5, ymin =-Inf, ymax =Inf,xmin =70-2, xmax =70+2) +geom_point(size =3, pch =21, color ="white", alpha =0.4,position =position_jitter(width =0, height =0.15, seed =1234)) +labs(x ="Entrance exam score", y =NULL) +guides(fill ="none") +scale_fill_manual(values =c("black", "red"), name =NULL) +theme_bw(base_size =28)
Intuition
Hypothetical tutoring program
The people right before and right after the threshold are very similar
The idea is that we can measure the effect of the program by examining students just around the threshold.
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_text, fill = tutoring_text)) +geom_vline(xintercept =70, linewidth =2, color ="#FFDC00") +annotate(geom ="rect", fill ="grey50", alpha =0.25, ymin =-Inf, ymax =Inf,xmin =70-5, xmax =70+5) +annotate(geom ="rect", fill ="grey50", alpha =0.5, ymin =-Inf, ymax =Inf,xmin =70-2, xmax =70+2) +geom_point(size =3, pch =21, color ="white", alpha =0.4,position =position_jitter(width =0, height =0.15, seed =1234)) +labs(x ="Entrance exam score", y =NULL) +guides(fill ="none") +scale_fill_manual(values =c("black", "red"), name =NULL) +theme_bw(base_size =28)+coord_cartesian(xlim =c(70-5, 70+5))
Intuition
Hypothetical tutoring program
Relationship between entrance and exit exam scores, by tutoring status.
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +geom_point(size =3, pch =21, color ="white", alpha =0.4)+geom_vline(xintercept =70, size =2, color ="#FFDC00") +labs(x ="Entrance exam score", y ="Exit exam score") +#guides(fill = "none") +guides(fill =guide_legend(reverse =TRUE)) +scale_fill_manual(values =c("black", "red"), name =NULL) +theme_bw(base_size =28)+coord_cartesian(xlim =c(30, 100),ylim =c(40, 90))
Intuition
Hypothetical tutoring program
Relationship between entrance and exit exam scores, by tutoring status.
Relationship between entrance and exit exam scores, by tutoring status.
Show the code
library("broom")tutoring <- tutoring |>mutate(tutoring =relevel(factor(tutoring), ref ="TRUE")) # "TRUE" = treated is referencerdd_tutoring <-lm(exit_exam ~ entrance_centered + tutoring, data = tutoring) |>tidy()effect_control <-filter(rdd_tutoring, term =="(Intercept)")$estimatelate <-filter(rdd_tutoring, term =="tutoringFALSE")$estimateeffect_treatment <- effect_control + lateggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +geom_point(size =3, pch =21, color ="white", alpha =0.4) +geom_vline(xintercept =70, size =2, color ="#FFDC00") +labs(x ="Entrance exam score", y ="Exit exam score") +#guides(fill = "none") +guides(fill =guide_legend(reverse =TRUE)) +geom_smooth(data =filter(tutoring, entrance_exam <=70),method ="lm",color="red",#aes(color = tutoring_text),size =2, show.legend =FALSE) +geom_smooth(data =filter(tutoring, entrance_exam >70),method ="lm", color="black",#aes(color = tutoring_text),size =2, show.legend =FALSE) +scale_fill_manual(values =c("black", "red"), name =NULL) +scale_color_manual(values =c("black", "red")) +theme_bw(base_size =28)+annotate(geom ="segment", x =70, xend =70, y = effect_control, yend = effect_treatment,size =4, color ="red") +annotate(geom ="label", x =75, y = effect_treatment - (late /2),label ="δ\n(causal\n effect)",size=5,color ="red", fill =alpha(c("white"),0.8))+coord_cartesian(xlim =c(30, 100), ylim =c(40, 90))
Intuition
Hypothetical tutoring program
The \(\delta\) measures the gap in outcome for people on both sides of the cutpoint.
Sharp RD, \(\delta\) - local average treatment effect (ATE) at the cutoff. Fuzzy RD, \(\delta\) - local average treatment effect for compliers (LATE) at the cutoff.
Intuition
Hypothetical tutoring program
The size of the gap depends on how you fit the functions on each side of the cutoff
The type of lines you choose can change the estimate of \(\delta\)
parametric vs. non-parametric
bandwidths
kernels
Intuition
Hypothetical tutoring program
Relationship between entrance and exit exam scores, by tutoring status.
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +geom_point(size =3, pch =21, color ="white", alpha =0.4) +geom_vline(xintercept =70, size =2, color ="#FFDC00") +labs(x ="Entrance exam score", y ="Exit exam score") +#guides(fill = "none") +guides(fill =guide_legend(reverse =TRUE)) +geom_smooth(data =filter(tutoring, entrance_exam <=70),method ="lm",color="red",#aes(color = tutoring_text),size =2, show.legend =FALSE) +geom_smooth(data =filter(tutoring, entrance_exam >70),method ="lm", color="black",#aes(color = tutoring_text),size =2, show.legend =FALSE) +scale_fill_manual(values =c("black", "red"), name =NULL) +scale_color_manual(values =c("black", "red")) +theme_bw(base_size =28)+annotate(geom ="segment", x =70, xend =70, y = effect_control, yend = effect_treatment,size =4, color ="red") +annotate(geom ="label", x =75, y = effect_treatment - (late /2),label ="δ\n(causal\n effect)",size=5,color ="red", fill =alpha(c("white"),0.8))+coord_cartesian(xlim =c(30, 100), ylim =c(40, 90))
Intuition
Hypothetical tutoring program
Relationship between entrance and exit exam scores, by tutoring status.
Show the code
program_data <- tutoring %>%mutate(entrance_centered = entrance_exam -70)model1 <-lm(exit_exam ~ entrance_centered + tutoring,data = program_data)x<-tidy(model1)late<-round(x$estimate[x$term=="tutoringFALSE"],2)const<-round(x$estimate[x$term=="(Intercept)"],2)ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +geom_point(size =3, pch =21, color ="white", alpha =0.4) +geom_vline(xintercept =70, size =2, color ="#FFDC00") +labs(x ="Entrance exam score", y ="Exit exam score") +#guides(fill = "none") +guides(fill =guide_legend(reverse =TRUE)) +geom_smooth(data =filter(tutoring, entrance_exam <=70),method ="lm",color="red",#aes(color = tutoring_text),size =2, show.legend =FALSE) +geom_smooth(data =filter(tutoring, entrance_exam >70),method ="lm", color="black",#aes(color = tutoring_text),size =2, show.legend =FALSE) +scale_fill_manual(values =c("black", "red"), name =NULL) +scale_color_manual(values =c("black", "red")) +theme_bw(base_size =28)+annotate(geom ="segment", x =70, xend =70, y = effect_control, yend = effect_treatment,size =4, color ="red") +annotate(geom ="label", x =75, y = const + (late /2),label =paste("δ=", late),size=5,color ="red", fill =alpha(c("white"),0.8))+coord_cartesian(xlim =c(30, 100), ylim =c(40, 90))
Intuition
Hypothetical tutoring program
Relationship between entrance and exit exam scores, by tutoring status.
Show the code
# 1. Fit cubic modelsfit_control <-lm(exit_exam ~poly(entrance_exam, 3, raw =TRUE),data =filter(tutoring, entrance_exam <70))fit_treat <-lm(exit_exam ~poly(entrance_exam, 3, raw =TRUE),data =filter(tutoring, entrance_exam >=70))# 2. Get tidy versionstidy_control <-tidy(fit_control)tidy_treat <-tidy(fit_treat)cutoff <-70effect_control <-predict(fit_control, newdata =data.frame(entrance_exam = cutoff))effect_treatment <-predict(fit_treat, newdata =data.frame(entrance_exam = cutoff))delta <-round(effect_treatment-effect_control, 2)# Create prediction data for plottingx_vals <-tibble(entrance_exam =seq(30, 100, length.out =300))preds <- x_vals %>%mutate(exit_exam =ifelse( entrance_exam <70,predict(fit_control, newdata = x_vals),predict(fit_treat, newdata = x_vals) ) )ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +geom_point(size =3, pch =21, color ="white", alpha =0.4) +geom_line(data = preds, aes(x = entrance_exam, y = exit_exam),inherit.aes =FALSE, color ="black", size =1.5) +geom_vline(xintercept =70, color ="#FFDC00", size =2) +theme_bw(base_size =28)+scale_fill_manual(values =c("Tutor"="red", "No tutor"="black"), name =NULL,guide =guide_legend(reverse =TRUE))+labs(x ="Entrance exam score", y ="Exit exam score")+annotate(geom ="label", x =75, y = const + (late /2),label =paste("δ=", delta),size=5,color ="red", fill =alpha(c("white"),0.8))+coord_cartesian(xlim =c(30, 100), ylim =c(40, 90))+guides(fill =guide_legend(reverse =TRUE))
Intuition
Hypothetical tutoring program
Relationship between entrance and exit exam scores, by tutoring status.
Show the code
library(dplyr)# Add sine-transformed variabletutoring <- tutoring |>mutate(sin_term =sin(0.2* entrance_exam))# Control group: entrance_exam < 70fit_trig_control <-nls( exit_exam ~ a + b * entrance_exam + c *sin(d * entrance_exam),data =filter(tutoring, entrance_exam <70),start =list(a =55, b =0.4, c =15, d =0.4))# Treatment group: entrance_exam >= 70fit_trig_treat <-nls( exit_exam ~ a + b * entrance_exam + c *sin(d * entrance_exam),data =filter(tutoring, entrance_exam >=70),start =list(a =55, b =0.4, c =15, d =0.4))x_vals <-tibble(entrance_exam =seq(30, 100, length.out =300))preds_trig <- x_vals |>mutate(exit_exam =ifelse( entrance_exam <70,predict(fit_trig_control, newdata = x_vals),predict(fit_trig_treat, newdata = x_vals) ) )cutoff <-70# Create a new data frame for x = 70cutoff_df <-data.frame(entrance_exam = cutoff)# Predicted outcome from each modeleffect_control <-predict(fit_trig_control, newdata = cutoff_df)effect_treatment <-predict(fit_trig_treat, newdata = cutoff_df)# Compute the treatment effect (jump at cutoff)delta <-round(effect_treatment-effect_control,2)ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, fill = tutoring_text)) +geom_point(size =3, pch =21, color ="white", alpha =0.4) +geom_line(data = preds_trig, aes(x = entrance_exam, y = exit_exam),inherit.aes =FALSE, color ="black", size =1.5) +geom_vline(xintercept =70, color ="#FFDC00", size =2) +theme_bw(base_size =28)+scale_fill_manual(values =c("Tutor"="red", "No tutor"="black"), name =NULL,guide =guide_legend(reverse =TRUE))+labs(x ="Entrance exam score", y ="Exit exam score")+annotate(geom ="label", x =75, y = const + (late /2),label =paste("δ=", delta),size=5,color ="red", fill =alpha(c("white"),0.8))+coord_cartesian(xlim =c(30, 100), ylim =c(40, 90))+guides(fill =guide_legend(reverse =TRUE))
Lines and Fit
Parametric line should fit the data pretty well
Non-parametric lines use the data to find the best line, often with windows and moving averages.
For example, one common method is LOESS (Locally Estimated Scatterplot Smoothing)
Measuring the Gap Parametrically
The gap can be easily measured by centering the running variable around the threshold:
redefining the variable so that the cutoff equals zero.
One key assumption of RDD is that units just above and just below the cutoff are similar in all aspects except treatment assignment. There is no precise manipulation at the cutoff.
This means pre-treatment covariates should be balanced across the threshold.
Why It Matters
Helps validate the continuity assumption
Suggests that selection bias is unlikely
Strengthens causal interpretation of \(\delta\)
Covariate Balance
How to Test It
Run the same RDD logic, but use pre-treatment covariates as outcomes.
If there’s no jump, the assumption holds.
Caveat 1
Manipulation at the threshold
People might be aware of the cutoff so they might modify their behavior to get in/out of program
For example, students could try to get a slightly lower grade to benefit from the tutoring program.
This means that people at the cutoff are not representative of the control and treatment group.
Caveat 1
Manipulation Test
The way to test for manipulation at the cutoff is the McCrary density test.
This is an example of Non-manipulation
Show the code
library(dplyr)set.seed(123) # Set random seed for reproducibilitynum_students <-1000# Number of simulated students# Create a tibble (data frame) with simulated student datatutoring <-tibble(id =1:num_students, # Unique ID for each student# Simulate entrance exam scores using a Beta distribution (scaled 0–100)entrance_exam =rbeta(num_students, shape1 =7, shape2 =2),# Simulate base ability for exit exam (before adding treatment effects)exit_exam_base =rbeta(num_students, shape1 =5, shape2 =3)) |>mutate(entrance_exam = entrance_exam *100, # Convert entrance exam scores to 0–100 scale# Deterministic assignment rule:# Students with entrance_exam ≤ 70 are assigned to receive tutoringtutoring_assignment = entrance_exam <=70,# Fuzzy compliance: not everyone follows their assignment perfectlytutoring_fuzzy =case_when(# 85% of assigned students actually take tutoring tutoring_assignment &runif(n()) <0.85~TRUE,# 15% of unassigned students take tutoring anyway!tutoring_assignment &runif(n()) <0.15~TRUE,# Everyone else does not take tutoringTRUE~FALSE ),# Construct the true exit exam score assuming tutoring has a positive effect# - Base score scaled to 0–40# - +10 points for being assigned tutoring# - +0.5 points per entrance exam pointexit_exam = exit_exam_base *40+10* tutoring_assignment + entrance_exam /2,# Same as above but using actual (fuzzy) tutoring statusexit_exam_fuzzy = exit_exam_base *40+10* tutoring_fuzzy + entrance_exam /2,# Label variables for clarity (for use in plots/tables)tutoring_text =factor(tutoring_assignment, levels =c(FALSE, TRUE), labels =c("No tutor", "Tutor")),tutoring_fuzzy_text =factor(tutoring_fuzzy, levels =c(FALSE, TRUE), labels =c("No tutor", "Tutor")),# Center entrance exam scores around 70 for regression interpretationentrance_centered = entrance_exam -70 )
[1] 1
Caveat 1
Manipulation Test
The way to test for manipulation at the cutoff is the McCrary density test.
This is an example of Manipulation
Show the code
set.seed(123) # for reproducibilitynum_students <-1500# a bit more to have room for thinningtutoring2 <-tibble(id =1:num_students,entrance_exam =rbeta(num_students, shape1 =7, shape2 =2),exit_exam =rbeta(num_students, shape1 =5, shape2 =3)) |>mutate(entrance_exam = entrance_exam *100) |># 🔥 Add manipulation: fewer people just above the cutofffilter(!(entrance_exam >70& entrance_exam <75&runif(n()) <0.5)) |># randomly drop 50% just above cutoffmutate(# Assignmenttutoring_assignment = entrance_exam <=70,# Fuzzy compliancetutoring_fuzzy =case_when( tutoring_assignment &runif(n()) <0.85~TRUE,!tutoring_assignment &runif(n()) <0.15~TRUE,TRUE~FALSE ),# Outcomeexit_exam = exit_exam *40+10* tutoring_fuzzy + entrance_exam /2,# Labelstutoring_text =factor(tutoring_assignment, levels =c(FALSE, TRUE), labels =c("No tutor", "Tutor")),tutoring_fuzzy_text =factor(tutoring_fuzzy, levels =c(FALSE, TRUE), labels =c("No tutor", "Tutor")),entrance_centered = entrance_exam -70 )library(rddensity)mtest <-rddensity(X = tutoring2$entrance_exam, c=70)man<-rdplotdensity(mtest,X = tutoring2$entrance_exam)$Estplot+geom_vline(xintercept=70, linetype="dashed") +labs(x="Entrance exam score", y="Density")
Noncompliance
The logic for noncompliance means that people on the margin of the cutoff might end up in/out of the program.
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_fuzzy_text, fill = entrance_exam <=70)) +geom_vline(xintercept =70, linewidth =2, color ="#FFDC00") +geom_point(size =3, pch =21, color ="white", alpha =0.4,position =position_jitter(width =0, height =0.15, seed =1234)) +labs(x ="Entrance exam score", y =NULL) +guides(fill ="none") +scale_fill_manual(values =c("black", "red"), name =NULL) +theme_bw(base_size =28)
Compliance
The logic for noncompliance means that people on the margin of the cutoff might end up in/out of the program.
Show the code
ggplot(tutoring, aes(x = entrance_exam, y = tutoring_text, fill = tutoring_text)) +geom_vline(xintercept =70, linewidth =2, color ="#FFDC00") +geom_point(size =3, pch =21, color ="white", alpha =0.4,position =position_jitter(width =0, height =0.15, seed =1234)) +labs(x ="Entrance exam score", y =NULL) +guides(fill ="none") +scale_fill_manual(values =c("black", "red"), name =NULL) +theme_bw(base_size =28)
Noncompliance
Addressing it
The way to address noncompliance is to use fuzzy discontinuities:
Use an instrument for which side of the cutoff people should be on
Thus, the effect will only be for compliers near the cutoff
Compliance
Person Type
If Assigned to Treatment
If Assigned to Control
Complier
Takes treatment
Does not take treatment
Always-Taker
Takes treatment
Takes treatment
Never-Taker
Does not take treatment
Does not take treatment
Defier
Does not take treatment
Takes treatment
Example
Note: We don’t include defiers because they would be students who do the exact opposite of the rule—refusing tutoring when they’re offered it and taking it when they’re not. In real life (and in our model), we assume people don’t usually behave that way.
Noncompliance
Addressing it
So, we can use an instrument = above/below cutoff to address non-compliance
The instrument has to meet three important criteria
Relevance
The cutoff changes the likelihood of receiving the treatment.
→ People below the cutoff are more likely to get tutoring.
Exclusion
The cutoff affects the outcome only through the treatment.
→ Crossing the threshold doesn’t directly change the outcome except by changing tutoring status.
Exogeneity
Unobserved confounders between the tutoring program and exit exam scores are unrelated to the cutoff.
Noncompliance
Addressing it
Noncompliance
Addressing it
The way to deal with noncompliance is to use a Fuzzy RD
We can do it in the following way:
rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c =70, fuzzy = tutoring$tutoring_fuzzy)
Fuzzy RD estimates using local polynomial regression.
Number of Obs. 1000
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 246 754
Eff. Number of Obs. 187 350
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 13.056 13.056
BW bias (b) 19.608 19.608
rho (h/b) 0.666 0.666
Unique Obs. 246 754
First-stage estimates.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional -0.680 0.072 -9.382 0.000 [-0.822 , -0.538]
Robust - - -8.029 0.000 [-0.869 , -0.528]
=============================================================================
Treatment effect estimates.
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 11.955 2.030 5.889 0.000 [7.976 , 15.934]
Robust - - 4.940 0.000 [7.450 , 17.250]
=============================================================================
Noncompliance
Addressing it
Uniform Kernel
Show the code
# Uniform# Run rdrobust with all resultsrd1 <-rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c =70,fuzzy = tutoring$tutoring_fuzzy,kernel ="Uniform")# Create a named list of models (each method with its estimate and se)models <-list("Conventional"= rd1$Estimate[1],"Bias-Corrected"= rd1$Estimate[2],"Robust Bias-Corrected"= rd1$Estimate[3])ses <-list("Conventional"= rd1$se[1],"Bias-Corrected"= rd1$se[2],"Robust Bias-Corrected"= rd1$se[3])# Build a modelsummary-style data framelibrary(tibble)results1 <- tibble::tibble(Statistic =c("Estimate", "Std. Error", "Kernel", "Obs.", "Bandwidth"),`Conventional`=c(sprintf("%.3f", models$Conventional),sprintf("%.3f", ses$Conventional), rd1$kernel, # keep as-is (a string)sum(rd1$N),sprintf("%.3f",rd1$bws[1]) ),`Bias-Corrected`=c(sprintf("%.3f", models$`Bias-Corrected`),sprintf("%.3f", ses$`Bias-Corrected`), rd1$kernel,sum(rd1$N),sprintf("%.3f",rd1$bws[2]) ),`Robust Bias-Corrected`=c(sprintf("%.3f", models$`Robust Bias-Corrected`),sprintf("%.3f", ses$`Robust Bias-Corrected`), rd1$kernel,sum(rd1$N),sprintf("%.3f",rd1$bws[2]) ))#Triangular# Run rdrobust with all resultsrd2 <-rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c =70,fuzzy = tutoring$tutoring_fuzzy,kernel ="Triangular")# Create a named list of models (each method with its estimate and se)models <-list("Conventional"= rd2$Estimate[1],"Bias-Corrected"= rd2$Estimate[2],"Robust Bias-Corrected"= rd2$Estimate[3])ses <-list("Conventional"= rd2$se[1],"Bias-Corrected"= rd2$se[2],"Robust Bias-Corrected"= rd2$se[3])# Build a modelsummary-style data framelibrary(tibble)results2 <- tibble::tibble(Statistic =c("Estimate", "Std. Error", "Kernel", "Obs.", "Bandwidth"),`Conventional`=c(sprintf("%.3f", models$Conventional),sprintf("%.3f", ses$Conventional), rd2$kernel, # keep as-is (a string)sum(rd2$N),sprintf("%.3f",rd2$bws[1]) ),`Bias-Corrected`=c(sprintf("%.3f", models$`Bias-Corrected`),sprintf("%.3f", ses$`Bias-Corrected`), rd2$kernel,sum(rd2$N),sprintf("%.3f",rd2$bws[2]) ),`Robust Bias-Corrected`=c(sprintf("%.3f", models$`Robust Bias-Corrected`),sprintf("%.3f", ses$`Robust Bias-Corrected`), rd2$kernel,sum(rd2$N),sprintf("%.3f",rd2$bws[2]) ))#Epanechnikov# Run rdrobust with all resultsrd3 <-rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c =70,fuzzy = tutoring$tutoring_fuzzy,kernel ="Epanechnikov")# Create a named list of models (each method with its estimate and se)models <-list("Conventional"= rd3$Estimate[1],"Bias-Corrected"= rd3$Estimate[2],"Robust Bias-Corrected"= rd3$Estimate[3])ses <-list("Conventional"= rd3$se[1],"Bias-Corrected"= rd3$se[2],"Robust Bias-Corrected"= rd3$se[3])# Build a modelsummary-style data framelibrary(tibble)results3 <- tibble::tibble(Statistic =c("Estimate", "Std. Error", "Kernel", "Obs.", "Bandwidth"),`Conventional`=c(sprintf("%.3f", models$Conventional),sprintf("%.3f", ses$Conventional), rd3$kernel, # keep as-is (a string)sum(rd1$N),sprintf("%.3f",rd3$bws[1]) ),`Bias-Corrected`=c(sprintf("%.3f", models$`Bias-Corrected`),sprintf("%.3f", ses$`Bias-Corrected`), rd3$kernel,sum(rd1$N),sprintf("%.3f",rd3$bws[2]) ),`Robust Bias-Corrected`=c(sprintf("%.3f", models$`Robust Bias-Corrected`),sprintf("%.3f", ses$`Robust Bias-Corrected`), rd3$kernel,sum(rd3$N),sprintf("%.3f",rd3$bws[2]) ))# Create a separator rowseparator_fs <- tibble::tibble(Statistic ="<b>First stage</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")separator_ss <- tibble::tibble(Statistic ="<b>Treatment effect estimates</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")separator_uniform <- tibble::tibble(Statistic ="<b>Uniform Kernel</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")separator_triangular <- tibble::tibble(Statistic ="<b>Triangular Kernel</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")separator_epanechnikov <- tibble::tibble(Statistic ="<b>Epanechnikov Kernel</b>",Conventional ="", `Bias-Corrected`="", `Robust Bias-Corrected`="")# First-stage estimatesfirst_stage_rd1 <- tibble::tibble(Statistic =c("Estimate", "Std. Error"),`Conventional`=c(sprintf("%.3f", rd1$tau_T[1]),sprintf("%.3f", rd1$se_T[1])),`Bias-Corrected`=c(sprintf("%.3f", rd1$tau_T[2]),sprintf("%.3f", rd1$se_T[2])),`Robust Bias-Corrected`=c(sprintf("%.3f", rd1$tau_T[3]),sprintf("%.3f", rd1$se_T[3])))# First-stage estimatesfirst_stage_rd2 <- tibble::tibble(Statistic =c("Estimate", "Std. Error"),`Conventional`=c(sprintf("%.3f", rd2$tau_T[1]),sprintf("%.3f", rd2$se_T[1])),`Bias-Corrected`=c(sprintf("%.3f", rd2$tau_T[2]),sprintf("%.3f", rd2$se_T[2])),`Robust Bias-Corrected`=c(sprintf("%.3f", rd2$tau_T[3]),sprintf("%.3f", rd2$se_T[3])))# First-stage estimatesfirst_stage_rd3 <- tibble::tibble(Statistic =c("Estimate", "Std. Error"),`Conventional`=c(sprintf("%.3f", rd3$tau_T[1]),sprintf("%.3f", rd3$se_T[1])),`Bias-Corrected`=c(sprintf("%.3f", rd3$tau_T[2]),sprintf("%.3f", rd3$se_T[2])),`Robust Bias-Corrected`=c(sprintf("%.3f", rd3$tau_T[3]),sprintf("%.3f", rd3$se_T[3])))results_final1 <-rbind( separator_uniform, separator_fs, first_stage_rd1, separator_ss, results1)results_final2 <-rbind(separator_triangular, separator_fs, first_stage_rd2, separator_ss, results2)results_final3 <-rbind(separator_epanechnikov, separator_fs, first_stage_rd3, separator_ss, results3)# Show using modelsummary custom tablemodelsummary::datasummary_df(results_final1, output ="markdown")
Statistic
Conventional
Bias-Corrected
Robust Bias-Corrected
Uniform Kernel
First stage
Estimate
-0.633
-0.625
-0.625
Std. Error
0.092
0.092
0.110
Treatment effect estimates
Estimate
18.112
18.512
4.144
Std. Error
4.144
4.144
4.984
Kernel
Uniform
Uniform
Uniform
Obs.
1000
1000
1000
Bandwidth
6.123
10.533
10.533
Noncompliance
Addressing it
Triangular Kernel
Show the code
# Show using modelsummary custom tablemodelsummary::datasummary_df(results_final2, output ="markdown")
Statistic
Conventional
Bias-Corrected
Robust Bias-Corrected
Triangular Kernel
First stage
Estimate
-0.680
-0.699
-0.699
Std. Error
0.072
0.072
0.087
Treatment effect estimates
Estimate
16.644
16.644
2.899
Std. Error
2.899
2.899
3.546
Kernel
Triangular
Triangular
Triangular
Obs.
1000
1000
1000
Bandwidth
13.090
19.608
19.608
Noncompliance
Addressing it
Epanechnikov Kernel
Show the code
# Show using modelsummary custom tablemodelsummary::datasummary_df(results_final3, output ="markdown")
Statistic
Conventional
Bias-Corrected
Robust Bias-Corrected
Epanechnikov Kernel
First stage
Estimate
-0.660
-0.672
-0.672
Std. Error
0.077
0.077
0.092
Treatment effect estimates
Estimate
17.159
17.440
3.214
Std. Error
3.214
3.214
3.902
Kernel
Epanechnikov
Epanechnikov
Epanechnikov
Obs.
1000
1000
1000
Bandwidth
10.779
16.937
16.937
Summary: Regression Discontinuity Design
Goal:
Estimate local causal effect of treatment at a known cutoff using a continuous running variable.
Key Assumptions:
- Continuity: Potential outcomes \(Y(0)\) and \(Y(1)\) change smoothly at the cutoff.
- No Precise Manipulation: Units cannot perfectly sort just above or below the cutoff.
Diagnostics & Robustness:
- McCrary Test: Check for manipulation in the running variable’s density.
- Covariate Balance: No jumps in pre-treatment covariates at cutoff.
- Placebo Tests: No effects at fake cutoffs.
Interpretation:
- Estimate is Local Average Treatment Effect (LATE) at the cutoff.
- RDD is valid only near the cutoff—not population-wide.