The Experimental Selection Correction Estimator

$$ \newcommand{\indep}{\mathrel{\perp\mkern-10mu\perp}} \newcommand{\P}{\mathbb{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\1}[1]{\mathbf{1}\{#1\}} $$

TL;DR

Athey, Chetty, and Imbens develop a new method to estimate treatment effects on primary outcomes (like graduation rates) by combining experimental data where treatment is randomized but only secondary outcomes are observed (like test scores) with observational data where both primary and secondary outcomes are measured but treatment is not randomized. Their Experimental Selection Correction (ESC) estimator uses differences in secondary outcomes between the two samples to correct for selection bias, relying on a new “latent unconfoundedness” assumption that requires the same unobserved confounders to affect both primary and secondary outcomes. Applied to class size effects, the method reveals that a 25% reduction in third grade class size increases high school graduation rates by 0.7 percentage points, while standard observational methods yield implausible negative estimates.

What is this paper about?

This paper addresses a common challenge in causal inference where researchers have access to two complementary but incomplete datasets: experimental data with randomized treatment and secondary outcomes (like test scores) but missing primary outcomes of interest (like graduation rates), and observational data with both primary and secondary outcomes but non-random treatment assignment plagued by selection bias. The motivating example involves estimating how third grade class size affects high school graduation when Project STAR provides experimental evidence on test score effects but lacks graduation data, while NYC school district records contain graduation information but suffer from confounded class size variation. The authors aim to leverage the internal validity of experiments to purge selection bias from observational estimates without relying on the strong surrogacy assumptions that typically require treatment to affect primary outcomes only through secondary outcomes.

What do the authors do?

The authors develop both the theoretical foundations and practical implementation of the ESC estimator through identification results and multiple estimation approaches. Theoretically, they prove that the average treatment effect on the primary outcome is point-identified under three key assumptions: random assignment in the experimental sample; conditional external validity, which requires treatment effects to generalize across samples conditional on covariates; and their novel latent unconfoundedness condition, which requires that the unobserved confounders affecting the primary outcome are the same as those affecting the secondary outcome. They show this approach strictly weakens standard surrogacy assumptions by permitting direct effects of treatment on primary outcomes and allowing for unobserved confounding in the observational data. For estimation, they present four equivalent approaches: control function, imputation, weighting, and influence function methods. The control function approach is the most straightforward: estimate the experimental treatment effect on the secondary outcome, compute residuals in the observational sample as the difference between actual and predicted secondary outcomes, then regress the primary outcome on treatment while controlling for these residuals. They validate the method empirically by applying it to class size effects, showing that ESC estimates on holdout outcomes (test scores in grades 4-8) closely match experimental benchmarks and capture well-known fadeout patterns, while standard OLS estimates remain severely biased even after controlling for rich demographic covariates.
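To make the three-step control function procedure concrete, here is a minimal numerical sketch on simulated data. It assumes a stylized setting where a single latent confounder `U` drives both outcomes and the secondary outcome reveals it exactly (an idealized case in which latent unconfoundedness holds); variable names are illustrative, not the paper's notation, and this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
tau_S, tau_Y = 1.0, 0.5  # true effects on the secondary / primary outcome

# Experimental sample: treatment randomized, only the secondary outcome S observed.
U_e = rng.normal(size=n)              # latent confounder (harmless under randomization)
W_e = rng.integers(0, 2, size=n)      # randomized treatment
S_e = tau_S * W_e + U_e               # secondary outcome (e.g., test score)

# Observational sample: treatment depends on the latent confounder (selection bias),
# but both S and the primary outcome Y are observed.
U_o = rng.normal(size=n)
W_o = (U_o + rng.normal(size=n) > 0).astype(float)
S_o = tau_S * W_o + U_o
Y_o = tau_Y * W_o + U_o + rng.normal(size=n)   # primary outcome (e.g., graduation)

# Step 1: experimental estimate of E[S | W].
mu0, mu1 = S_e[W_e == 0].mean(), S_e[W_e == 1].mean()

# Step 2: control-function residual in the observational sample.
rho = S_o - np.where(W_o == 1, mu1, mu0)

# Step 3: regress Y on treatment and the residual; the W coefficient is the ESC estimate.
X = np.column_stack([np.ones(n), W_o, rho])
esc_hat = np.linalg.lstsq(X, Y_o, rcond=None)[0][1]

# For contrast: the naive observational difference in means is badly biased.
naive = Y_o[W_o == 1].mean() - Y_o[W_o == 0].mean()
print(f"true effect {tau_Y:.2f} | ESC {esc_hat:.3f} | naive {naive:.3f}")
```

In this simulation the ESC estimate lands near the true effect of 0.5 while the naive contrast is pushed far above it by selection on `U`, mirroring the bias pattern the paper documents in the class size application.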

Why is this important?

This method addresses a pervasive limitation in policy evaluation where long-term outcomes are too costly or time-consuming to measure in experiments but are routinely captured in administrative observational data that suffer from selection bias. The ESC estimator provides a principled way to harness experimental internal validity for correcting observational estimates without requiring the restrictive surrogacy assumption that treatment operates exclusively through the secondary outcome—an assumption frequently violated in practice (as evidenced by the education literature showing that early interventions affect long-term outcomes through pathways beyond test scores, such as non-cognitive skills). The latent unconfoundedness assumption, while novel and untestable in isolation, is substantially weaker than assuming either surrogacy or unconfoundedness in the observational sample, making it applicable in settings where conventional methods fail. The paper also formalizes a common empirical heuristic: when observational and experimental estimates align on secondary outcomes, researchers often proceed to estimate effects on primary outcomes using the observational data, and the authors clarify precisely when this practice is justified. Methodologically, the connection to control function methods and missing data theory provides a familiar statistical framework, while the demonstration that standard covariate adjustment fails where ESC succeeds highlights the method’s ability to address selection on dimensions typically unobserved in administrative data.

Who should care?

Applied researchers conducting policy evaluations in education, labor economics, health, and other fields where experiments measure short-term proxies but administrative data track long-term outcomes should pay close attention to this method. It is particularly relevant for analysts working with combined experimental and observational datasets who currently rely on surrogate approaches but suspect violations of the exclusion restriction (that treatment affects primary outcomes only through secondary outcomes). Econometricians and statisticians interested in causal inference methodology will find value in the identification results, especially the latent unconfoundedness framework and its connections to control function methods, missing data assumptions, and semiparametric efficiency. Policy analysts and government agencies that commission experiments but need evidence on outcomes with long observation lags (like earnings, mortality, or recidivism) can use this approach to accelerate evidence generation. Education researchers studying intervention effects will find the class size application instructive, as it provides rare causal estimates of elementary school inputs on high school completion and demonstrates how to validate identifying assumptions using holdout outcomes.

Do we have code?

Yes, replication code is available at the GitHub repository: https://github.com/OpportunityInsights/Experimental-Selection-Correction-Replication-Code.git. The paper also provides straightforward Stata code in Appendix C for implementing the control function version of the ESC estimator in three steps (estimate experimental treatment effect on secondary outcome, construct residuals in observational sample, regress primary outcome on treatment controlling for residuals), though proper inference requires bootstrapping rather than conventional standard errors. The appendix additionally provides R code and details alternative estimation approaches including imputation, weighting, and influence function methods for researchers who prefer different implementations.
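Because the residual in step 3 is built from a first-step estimate, conventional regression standard errors understate uncertainty; the bootstrap mentioned above must resample both samples and repeat all three steps. A self-contained sketch under the same stylized simulated setting as before (illustrative names, not the replication code):

```python
import numpy as np

rng = np.random.default_rng(1)

def esc_estimate(W_e, S_e, W_o, S_o, Y_o):
    """Control-function ESC estimate of the treatment effect on Y."""
    # Step 1: experimental means of the secondary outcome by treatment arm.
    mu0, mu1 = S_e[W_e == 0].mean(), S_e[W_e == 1].mean()
    # Step 2: residual secondary outcome in the observational sample.
    rho = S_o - np.where(W_o == 1, mu1, mu0)
    # Step 3: regress Y on treatment and the residual; return the W coefficient.
    X = np.column_stack([np.ones(len(Y_o)), W_o, rho])
    return np.linalg.lstsq(X, Y_o, rcond=None)[0][1]

# Simulated data: a latent confounder U drives both S and Y (true effect on Y is 0.5).
n = 20_000
U_e = rng.normal(size=n); W_e = rng.integers(0, 2, size=n)
S_e = 1.0 * W_e + U_e
U_o = rng.normal(size=n)
W_o = (U_o + rng.normal(size=n) > 0).astype(float)
S_o = 1.0 * W_o + U_o
Y_o = 0.5 * W_o + U_o + rng.normal(size=n)

point = esc_estimate(W_e, S_e, W_o, S_o, Y_o)

# Bootstrap: resample the two samples independently and re-run all three steps,
# so the standard error reflects first-step estimation noise as well.
draws = []
for _ in range(200):
    i = rng.integers(0, n, n)   # experimental resample
    j = rng.integers(0, n, n)   # observational resample
    draws.append(esc_estimate(W_e[i], S_e[i], W_o[j], S_o[j], Y_o[j]))
se = np.std(draws, ddof=1)
print(f"ESC = {point:.3f}, bootstrap SE = {se:.3f}")
```

Resampling the experimental and observational samples separately keeps the two data sources independent within each bootstrap replication, which matches the two-sample structure of the estimator.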


In summary, this paper introduces a method that strategically combines experimental and observational data to estimate treatment effects on primary outcomes that cannot be measured in experiments. By using experimental estimates on secondary outcomes to construct a selection correction term, the ESC estimator removes biases in observational data under the novel latent unconfoundedness assumption that the same confounders affect both primary and secondary outcomes. This assumption is substantially weaker than surrogacy (which prohibits direct treatment effects on primary outcomes) and weaker than standard unconfoundedness (which requires no unmeasured confounding). The application to class size effects demonstrates the method’s ability to recover experimental benchmarks on holdout test scores while revealing that class size reductions meaningfully increase graduation rates—a finding obscured by severe selection bias in standard observational estimates that persists even after rich covariate adjustment. The method opens new possibilities for leveraging experiments to accelerate policy learning about long-term outcomes routinely captured in administrative data.

Reference

Athey, S., Chetty, R., & Imbens, G. (2025). The experimental selection correction estimator: Using experiments to remove biases in observational estimates (Working Paper No. 33817; Working Paper Series). National Bureau of Economic Research. https://doi.org/10.3386/w33817

Chen Xing
Founder & Data Scientist

Enjoy Life & Enjoy Work!
