Notes on Synthetic Control

$$ \newcommand{\indep}{\mathrel{\perp\mkern-10mu\perp}} \newcommand{\P}{\mathbb{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\1}[1]{\mathbf{1}\\{#1\\}} $$

Motivation

Abadie et al. (2010) motivate the synthetic control method with a model that generalizes the difference-in-differences (fixed-effects) model commonly applied in the empirical social science literature by allowing the effect of unobserved confounding characteristics to vary over time.

When the parallel trends assumption fails to hold

DiD provides a simple estimator of the ATT, provided that non-anticipation and parallel trends hold. However, the parallel trends assumption often fails in practice. Synthetic control (SC) extends DiD-type methods to settings where the parallel trends assumption does not hold. Specifically, SC methods seek to mitigate the bias caused by failures of parallel trends by carefully reweighting the control units so that their weighted average tracks the treated unit before the intervention. Intuitively, we use SC to “enforce” parallel trends.

Setup and Estimation

  • Data: $(J + 1)$ units across periods $t = 1, \cdots, T$

  • Treated unit: the first unit ($j = 1$) is exposed to the intervention only after period $T_0 \ (1 \le T_0 < T)$

    • Pre-treatment period: $1 \le t \le T_0$

    • Post-treatment period: $T_0 + 1 \le t \le T$

  • Untreated units: units $j = 2, \cdots, J+1$ form a collection of untreated units, also called the “donor pool”

  • For each unit, $j$, and time, $t$, we observe the outcome of interest, $Y_{jt}$

    • Let $Y_{jt}^N$ be the potential response without the intervention

    • Let $Y_{1t}^I$ be the potential response under the intervention, for $t \ge T_0 + 1$

  • We aim to estimate the effect of the intervention on the treated unit: $$ \tau_{1 t}=Y_{1 t}^I - Y_{1 t}^N =Y_{1 t}- \textcolor{red}{Y_{1 t}^N} \text {, } t \geq T_0+1 $$

    • Because unit 1 is exposed to the intervention for $t \ge T_0 + 1$, $Y_{1 t}^I = Y_{1t}$ is observed in the post-treatment period. The key challenge is to estimate the counterfactual $Y_{1 t}^N$.

    • $\tau_{1t}$ is indexed by time $t$, so the effect of the treatment may change over time. This matters because treatment effects may not be instantaneous and may accumulate or dissipate as time passes after the intervention.

  • Let $\boldsymbol{W}=\left(w_2, \ldots, w_{J+1}\right)^{\prime}$ with:

    • $w_j \geq 0$ for $j=2, \ldots, J+1$

    • $w_2+\cdots+w_{J+1}=1$.

    Each value of $\boldsymbol{W}$ represents a potential synthetic control.

  • Let $\boldsymbol{X}_1$ be a $(k \times 1)$ vector of pre-intervention characteristics for the treated unit.

  • Let $\boldsymbol{X}_0$ be a $(k \times J)$ matrix containing the same variables for the $J$ untreated units in the donor pool.

  • The vector $\boldsymbol{W}^*=\left(w_2^*, \ldots, w_{J+1}^*\right)^{\prime}$ is chosen to minimize $\left\|\boldsymbol{X}_1-\boldsymbol{X}_0 \boldsymbol{W}\right\|$, subject to our weight constraints.
  • For a post-intervention period $t$ (with $t \geq T_0 + 1$), the synthetic control estimator is:

$$ \widehat{\tau}_{1 t}=Y_{1 t}-\sum_{j=2}^{J+1} w_j^* Y_{j t} $$
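  • Typically, the norm is a weighted Euclidean norm with a diagonal matrix $V = \operatorname{diag}(v_1, \ldots, v_k)$: $$ \begin{align} \left\|\boldsymbol{X}_1-\boldsymbol{X}_0 \boldsymbol{W}\right\| &= \left\|\boldsymbol{X}_1-\boldsymbol{X}_0 \boldsymbol{W}\right\|_V \\ &=\sqrt{\left(\boldsymbol{X}_1-\boldsymbol{X}_0 \boldsymbol{W}\right)^{\prime} V\left(\boldsymbol{X}_1-\boldsymbol{X}_0 \boldsymbol{W}\right)} \\ &= \left(\sum_{h=1}^k v_h\left(X_{h1}-w_2 X_{h2}-\cdots-w_{J+1} X_{h,J+1}\right)^2\right)^{1 / 2} \end{align} $$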

    • The positive constants $v_1, \ldots, v_k$ reflect the predictive power of each of the $k$ predictors on $Y_{1 t}^N$.
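    • $v_1, \ldots, v_k$ can be chosen by the analyst or by data-driven methods, e.g., by choosing $V$ so that the resulting synthetic control minimizes the mean squared prediction error of the outcome over the pre-treatment periods (Abadie et al. 2010).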
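The weight problem above is a small constrained optimization. Below is a minimal sketch in Python, assuming NumPy and SciPy are available and that $v_1, \ldots, v_k$ have already been fixed by the analyst (no data-driven choice of $V$). The function names are hypothetical, and SciPy's generic SLSQP solver stands in for whatever optimizer a dedicated package such as Synth uses. It evaluates $\|\boldsymbol{X}_1-\boldsymbol{X}_0 \boldsymbol{W}\|_V$ in both the matrix and the summation form, finds $\boldsymbol{W}^*$ on the simplex, and plugs it into $\widehat{\tau}_{1t}$.

```python
import numpy as np
from scipy.optimize import minimize


def v_norm(X1, X0, w, v):
    """||X1 - X0 W||_V with V = diag(v_1, ..., v_k), matrix form."""
    diff = X1 - X0 @ w
    return float(np.sqrt(diff @ np.diag(v) @ diff))


def v_norm_sum(X1, X0, w, v):
    """Same norm as (sum_h v_h (X_h1 - sum_j w_j X_hj)^2)^(1/2)."""
    return float(np.sqrt(np.sum(v * (X1 - X0 @ w) ** 2)))


def fit_sc_weights(X1, X0, v):
    """W* = argmin ||X1 - X0 W||_V  s.t.  w_j >= 0 and sum_j w_j = 1.

    X1: (k,) treated-unit predictors; X0: (k, J) donor predictors;
    v: (k,) positive predictor weights chosen by the analyst (assumption).
    """
    J = X0.shape[1]

    def objective(w):
        # Squared V-norm: same minimizer as the norm, smoother for the solver.
        d = X1 - X0 @ w
        return d @ (v * d)

    res = minimize(
        objective,
        np.full(J, 1.0 / J),                 # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,             # w_j >= 0 (sum-to-one implies <= 1)
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12, "maxiter": 1000},
    )
    return res.x


def sc_effects(Y1_post, Y0_post, w):
    """tau_hat_1t = Y_1t - sum_j w*_j Y_jt for each post-treatment period.

    Y1_post: (T - T0,) treated outcomes; Y0_post: (T - T0, J) donor outcomes.
    """
    return Y1_post - Y0_post @ w


# Toy usage with made-up numbers: X1 equals 0.3 * donor 1 + 0.7 * donor 2.
X0 = np.array([[1.0, 3.0, 10.0],
               [2.0, 0.0, 5.0]])
X1 = 0.3 * X0[:, 0] + 0.7 * X0[:, 1]
w_star = fit_sc_weights(X1, X0, v=np.array([1.0, 1.0]))
print(w_star)  # approximately [0.3, 0.7, 0.0]
```

In this toy case $\boldsymbol{X}_1$ lies exactly in the convex hull of the donors and the representation is unique, so the solver should recover the weights that generated it.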

Theory behind SC

Assumption 1 (Linear factor model for counterfactuals).
$$ Y_{i t}(0)= \textcolor{red}{ \mu_i^{\prime} \lambda_t } + \delta_t+X_i^{\prime} \beta+\epsilon_{i t}, \tag{1} $$

where $Y_{it}(0)$ denotes the untreated potential outcome (the same quantity as $Y_{it}^N$ above), and

  • $\mu_i$ is a vector of unobserved confounders

  • $\lambda_t$ is the corresponding vector of time-varying coefficients (common factors)

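  • $X_i$ is a vector of observed covariates with coefficients $\beta$

  • $\delta_t$ is a time effect common to all units

  • $\epsilon_{it}$ is an unobserved transitory shock with mean zero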

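Equation (1), known as the interactive fixed effects model (essentially a latent factor model), generalizes the usual two-way fixed-effects model for DiD, in which $\textcolor{red}{ \mu_i^{\prime} \lambda_t }$ is replaced by a unit fixed effect $\alpha_i$. Indeed, if $\lambda_t = \lambda$ does not vary over time, then $\mu_i^{\prime} \lambda$ is a time-invariant unit effect and (1) reduces to the DiD model; letting $\lambda_t$ vary allows the effect of the unobserved confounders $\mu_i$ to change over time.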
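To make the contrast with DiD concrete, here is a minimal simulation sketch in Python with NumPy. All sizes, the seed, and the helper names `simulate_y0` and `treated_minus_donors` are illustrative assumptions. It draws $Y_{it}(0)$ from equation (1) without noise and tracks the gap between the treated unit and the simple average of the donors. When $\lambda_t$ is held constant, $\mu_i'\lambda_t$ acts like a unit fixed effect and the gap is flat over time; when $\lambda_t$ varies, the gap drifts, which is exactly a failure of parallel trends.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J, F, p = 20, 5, 2, 3          # periods, donors, factors, covariates (arbitrary)
N = J + 1                         # row 0 plays the role of the treated unit j = 1


def simulate_y0(lam, mu, delta, X, beta, sigma=0.0):
    """Y_it(0) = mu_i' lambda_t + delta_t + X_i' beta + eps_it, as an (N, T) array."""
    eps = sigma * rng.standard_normal((mu.shape[0], lam.shape[0]))
    return mu @ lam.T + delta[None, :] + (X @ beta)[:, None] + eps


def treated_minus_donors(Y):
    """Gap between the treated unit and the unweighted donor average, per period."""
    return Y[0] - Y[1:].mean(axis=0)


mu = rng.normal(size=(N, F))                       # unobserved confounders mu_i
delta = rng.normal(size=T)                         # common time effects delta_t
X = rng.normal(size=(N, p))                        # observed covariates X_i
beta = rng.normal(size=p)

lam_const = np.tile(rng.normal(size=F), (T, 1))    # lambda_t = lambda for every t
lam_vary = rng.normal(size=(T, F)).cumsum(axis=0)  # time-varying lambda_t

gap_const = treated_minus_donors(simulate_y0(lam_const, mu, delta, X, beta))
gap_vary = treated_minus_donors(simulate_y0(lam_vary, mu, delta, X, beta))

# Constant lambda: mu_i' lambda behaves like alpha_i, so the gap is flat (up to rounding).
print(gap_const.max() - gap_const.min())
# Time-varying lambda: the gap drifts over time, i.e. parallel trends fails.
print(gap_vary.max() - gap_vary.min())
```

The common time effect $\delta_t$ cancels in both gaps; only the interaction $\mu_i'\lambda_t$ can make the gap move.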

Notice that the assumptions on the data-generating process involve $Y_{i t}(0)$ but not $Y_{i t}(1)$. Since $Y_{1t}(1) = Y_{1t}$ is observed for $t > T_0$, estimating $\tau_{1t}$ requires no assumptions on the process that generates $Y_{i t}(1)$.

The key idea of synthetic control is to estimate the unobserved $\textcolor{red}{Y_{1 t}(0)}$ by a convex combination of the observed outcomes of the control units. Intuitively, the goal is to create a weighted average of control units that “looks like” the treated unit in terms of its past outcomes.

Let $W=\left(w_2, \ldots, w_{J+1}\right)^{\prime}$ with $w_j \geq 0$ and $\sum_{j=2}^{J+1} w_j=1$. Each choice of $W$ represents a potential synthetic control.

Assumption 2 (Key assumption).
There exist weights $W^*$ such that the pre-treatment covariates and outcomes of the treated unit are exactly balanced: $$ \sum_{j=2}^{J+1} w_j^* X_j=X_1, $$ $$ \sum_{j=2}^{J+1} w_j^* Y_{j 1}=Y_{11}, \ \cdots \ , \ \sum_{j=2}^{J+1} w_j^* Y_{j T_0}=Y_{1 T_0} $$
  • Assuming factor model (1) and fairly standard conditions, one can show that $$Y_{1 t}(0)-\sum_{j=2}^{J+1} w_j^* Y_{j t} \approx 0$$ when the number of pre-treatment periods $T_0$ is large relative to the scale of the transitory shocks $\epsilon_{it}$

  • An approximately unbiased estimator of $\tau_{1 t}$ is

$$ \hat{\tau}_{1 t}=Y_{1 t}-\sum_{j=2}^{J+1} w_j^* Y_{j t}, \quad t=T_0+1, \ldots, T $$
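A small simulation illustrates Assumption 2 and the estimator together. This is a sketch under stated assumptions, not the method of any particular paper or package: the data come from model (1) with the covariate term $X_i'\beta$ dropped, the treated unit's loadings $\mu_1$ are built as a convex combination of the donors' loadings (so balancing weights exist by construction), the effect is a constant $\tau = 3$, the noise scale $0.05$ is arbitrary, and weights are fitted on pre-treatment outcomes only using SciPy's SLSQP. The helper `simplex_weights` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T, T0, J, F = 40, 30, 8, 2       # periods, last pre-period, donors, factors (arbitrary)
tau_true = 3.0                   # constant effect, chosen for illustration

# Treated loadings are a convex combination of donor loadings, so weights
# satisfying Assumption 2 (up to noise) exist by construction.
mu_donors = rng.normal(size=(J, F))
w_true = rng.dirichlet(np.ones(J))
mu_treated = w_true @ mu_donors
lam = rng.normal(size=(T, F)).cumsum(axis=0)   # time-varying factors lambda_t
delta = rng.normal(size=T)                     # common time effects delta_t

mu = np.vstack([mu_treated, mu_donors])        # row 0 = treated unit (j = 1)
Y0 = mu @ lam.T + delta + 0.05 * rng.standard_normal((J + 1, T))  # model (1), no X_i
Y = Y0.copy()
Y[0, T0:] += tau_true                          # observed treated outcome after T0


def simplex_weights(target, donors):
    """argmin_w ||target - donors @ w||^2  s.t.  w >= 0, sum(w) = 1."""
    n_donors = donors.shape[1]
    res = minimize(lambda w: np.sum((target - donors @ w) ** 2),
                   np.full(n_donors, 1.0 / n_donors), method="SLSQP",
                   bounds=[(0.0, 1.0)] * n_donors,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   options={"ftol": 1e-12, "maxiter": 1000})
    return res.x


# Balance pre-treatment outcomes t = 1..T0, then apply the weights after T0.
w_hat = simplex_weights(Y[0, :T0], Y[1:, :T0].T)
tau_hat = Y[0, T0:] - Y[1:, T0:].T @ w_hat     # tau_hat_1t for t = T0+1..T
print(tau_hat.mean())                          # should be close to tau_true
```

The fitted weights need not equal `w_true`: any convex combination that reproduces $\mu_1$ matches the factor structure in every period, which is why good pre-treatment balance carries over to the post-treatment counterfactual.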

How to find $W^{*}$?

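  • Exact balance as in Assumption 2 rarely holds in practice, so we generalize the weight-selection criterion of the synthetic control method to balance a vector of pre-treatment variables as closely as possible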

  • Pre-treatment covariates: $\mathbf{Z}_i=\left(\mathbf{Y}_i^{\top}, \mathbf{X}_i^{\top}\right)^{\top}$

    • lagged outcomes: $\mathbf{Y}_i=\left(Y_{i 1}, Y_{i 2}, \ldots, Y_{i, T_0}\right)^{\top}$

    • lagged covariates: $\mathbf{X}_i=\left(\mathbf{X}_{i 1}^{\top}, \mathbf{X}_{i 2}^{\top}, \ldots, \mathbf{X}_{i, T_0}^{\top}\right)^{\top}$

  • $\mathbf{Z}_i$ may also contain only a subset, or functions, of these variables

  • Balance both the lagged outcomes and the pre-treatment covariates: $$ \begin{aligned} \hat{\mathbf{w}}= & \underset{\mathbf{w}}{\operatorname{argmin}}\left(\mathbf{Z}_1-\sum_{i=2}^{J+1} w_i \mathbf{Z}_i\right)^{\top} \widehat{\Sigma}^{-1}\left(\mathbf{Z}_1-\sum_{i=2}^{J+1} w_i \mathbf{Z}_i\right) \\ & \text { subject to } \sum_{i=2}^{J+1} w_i=1, \text { and } w_i \geq 0 \text { for all } i=2, \ldots, J+1 \end{aligned}$$ where $\widehat{\Sigma}$ is an estimate of the covariance matrix of $\mathbf{Z}_i$
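A minimal sketch of this criterion in Python (NumPy and SciPy assumed; `build_Z` and `fit_weights_mahalanobis` are hypothetical names). The text does not say how $\widehat{\Sigma}$ is estimated, so this sketch assumes the sample covariance of $\mathbf{Z}_i$ across all $J+1$ units. That matrix is singular whenever the dimension of $\mathbf{Z}_i$ is at least the number of units, so the Moore-Penrose pseudo-inverse replaces $\widehat{\Sigma}^{-1}$; this is a substitution, not part of the original formula.

```python
import numpy as np
from scipy.optimize import minimize


def build_Z(Y_pre, X_pre):
    """Stack lagged outcomes and lagged covariates: Z_i = (Y_i', X_i')'.

    Y_pre: (N, T0) pre-treatment outcomes; X_pre: (N, T0, p) pre-treatment
    covariates. Row 0 is the treated unit; rows 1..J are donors.
    """
    N = Y_pre.shape[0]
    # Row-major reshape keeps the order (X_i1', X_i2', ..., X_iT0')'.
    return np.hstack([Y_pre, X_pre.reshape(N, -1)])


def fit_weights_mahalanobis(Z):
    """argmin_w (Z_1 - sum_i w_i Z_i)' Sigma^+ (Z_1 - sum_i w_i Z_i) on the simplex."""
    z1, Zd = Z[0], Z[1:]                   # treated row, donor rows (J x d)
    Sigma = np.cov(Z, rowvar=False)        # d x d covariance across units (assumption)
    S_inv = np.linalg.pinv(Sigma)          # pseudo-inverse in place of Sigma^{-1}
    J = Zd.shape[0]

    def objective(w):
        r = z1 - w @ Zd                    # imbalance Z_1 - sum_i w_i Z_i
        return r @ S_inv @ r

    res = minimize(objective, np.full(J, 1.0 / J), method="SLSQP",
                   bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   options={"ftol": 1e-12, "maxiter": 1000})
    return res.x
```

Because the weights sum to one, the imbalance $\mathbf{Z}_1-\sum_i w_i \mathbf{Z}_i$ lies in the span of the centered $\mathbf{Z}_i$, which is exactly where the pseudo-inverse acts, so an exact balance still gives a zero objective.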

Limitations and recommendations for SC

  • Exclude from the donor pool any unit that may be affected by the treatment, including through indirect effects (spillovers)

  • Exclude units that experienced large shocks unrelated to the treatment during the study period

  • To avoid interpolation bias, include only donor units whose characteristics are similar to those of the treated unit

  • Avoid overfitting: a donor pool with many units relative to the number of pre-treatment periods can fit noise rather than the underlying factors

  • SC requires a sufficiently long pre-treatment period

  • Credibility depends on the ability to match the pre-treatment covariates and outcomes of the treated unit

  • SC is not recommended if the pre-treatment fit is poor or if only a few pre-treatment periods are available
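One way to quantify pre-treatment fit is the root mean squared prediction error (RMSPE) between the treated unit and its synthetic control over $t \le T_0$. The sketch below (Python with NumPy; `rmspe` and `fit_report` are hypothetical helper names) computes the pre- and post-treatment RMSPE and their ratio. Abadie et al. (2010) compare the post/pre ratio of mean squared prediction errors across placebo units; the RMSPE ratio here is its square root. There is no universal cutoff for a “poor” fit, so the pre-treatment RMSPE is best judged against the scale of the outcome itself.

```python
import numpy as np


def rmspe(y_treated, y_donors, w):
    """Root mean squared gap between the treated unit and its synthetic control.

    y_treated: (n,) outcomes of unit 1; y_donors: (n, J) donor outcomes; w: (J,).
    """
    gap = y_treated - y_donors @ w
    return float(np.sqrt(np.mean(gap ** 2)))


def fit_report(Y, w, T0):
    """Pre-/post-treatment RMSPE and their ratio; row 0 of Y (N x T) is the treated unit."""
    pre = rmspe(Y[0, :T0], Y[1:, :T0].T, w)
    post = rmspe(Y[0, T0:], Y[1:, T0:].T, w)
    ratio = post / pre if pre > 0 else np.inf   # perfect pre-fit gives an infinite ratio
    return {"pre_rmspe": pre, "post_rmspe": post, "ratio": ratio}
```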

References

  • Abadie, Alberto, Alexis Diamond, and Jens Hainmueller (2010), “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program,” Journal of the American Statistical Association, 105 (490), 493–505.

  • Abadie, Alberto (2021), “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects,” Journal of Economic Literature, 59 (2), 391–425.

  • R synthetic control tutorial: Synth R package

Chen Xing
Founder & Data Scientist

Enjoy Life & Enjoy Work!
