Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data
TL;DR
This paper proves that horizontal regression (time series methods such as unconfoundedness) and vertical regression (cross-sectional methods such as synthetic controls) can produce algebraically identical point estimates for several common estimators, including minimum-norm OLS, PCR, and ridge regression. However, even when point estimates match, the assumed source of randomness (time series, cross-sectional, or both) leads to different estimands and different measures of uncertainty. The key message: researchers must carefully consider where the randomness in their data comes from, since this choice directly affects the validity of inference even though it does not affect point estimates for symmetric methods.
What is this paper about?
Panel data analysis typically uses one of two broad approaches to estimate treatment effects: horizontal regression exploits time series patterns (like unconfoundedness methods that use lagged outcomes), while vertical regression exploits cross-sectional patterns (like synthetic controls that construct counterfactuals from control units). These methods are conventionally viewed as fundamentally different approaches suited to different data configurations. The paper investigates two central questions: (1) When do these seemingly different methods actually produce identical point estimates? and (2) How does the assumed source of randomness affect statistical inference? Using canonical examples like Abadie and Gardeazabal’s study of terrorism in the Basque Country, the authors show that the same point estimate can support vastly different confidence intervals depending on whether randomness is attributed to time patterns, unit patterns, or both.
What do the authors do?
The authors classify regression formulations into symmetric and asymmetric classes based on whether horizontal and vertical approaches yield identical point estimates. For the symmetric class (OLS with minimum $\ell_2$-norm, PCR, ridge regression), they prove algebraic equivalence holds without any assumptions on the data generating process or dimensions. For the asymmetric class (lasso, elastic net, simplex regression), they prove the methods diverge. They then analyze inference under two frameworks, model-based (where potential outcomes are random) and design-based (where treatment assignment is random), considering three sources of randomness: (i) the time series only, (ii) the cross-section only, or (iii) both simultaneously. Through asymptotic theory, they derive distinct estimands and variances for each source of randomness. The paper includes Monte Carlo simulations calibrated to three classic studies (Basque terrorism, California Proposition 99, West Germany reunification) and applies their framework to these empirical cases, constructing separate confidence intervals for each randomness source.
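The symmetric-class equivalence is easy to verify numerically. The sketch below (an illustration, not the authors' replication code) sets up a toy panel with one treated unit and one post-treatment period, then computes the treated unit's counterfactual prediction both ways: horizontally (regress the post period on pre periods across control units) and vertically (regress the treated unit on control units across pre periods). For minimum-norm OLS the two coincide because the pseudoinverse satisfies pinv(A).T == pinv(A.T); for ridge with the same penalty, equality follows from the push-through identity (AᵀA + λI)⁻¹Aᵀ = Aᵀ(AAᵀ + λI)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy panel: N units x T periods; unit N is treated, period T is post-treatment.
N, T = 12, 30
Y = rng.normal(size=(N, T))

Y0 = Y[:-1, :-1]      # control units, pre-treatment periods: (N-1) x (T-1)
y_post = Y[:-1, -1]   # control units, post-treatment period
y_treat = Y[-1, :-1]  # treated unit, pre-treatment periods

# Minimum l2-norm OLS via the pseudoinverse.
# Horizontal: beta = pinv(Y0) @ y_post, prediction y_treat @ beta.
# Vertical:   omega = pinv(Y0.T) @ y_treat, prediction y_post @ omega.
horizontal_ols = y_treat @ (np.linalg.pinv(Y0) @ y_post)
vertical_ols = y_post @ (np.linalg.pinv(Y0.T) @ y_treat)
assert np.isclose(horizontal_ols, vertical_ols)

# Ridge regression with the same penalty lam in both directions.
lam = 1.0
n, p = Y0.shape
horizontal_ridge = y_treat @ np.linalg.solve(Y0.T @ Y0 + lam * np.eye(p),
                                             Y0.T @ y_post)
vertical_ridge = y_post @ np.linalg.solve(Y0 @ Y0.T + lam * np.eye(n),
                                          Y0 @ y_treat)
assert np.isclose(horizontal_ridge, vertical_ridge)

print("min-norm OLS:", horizontal_ols, "ridge:", horizontal_ridge)
```

Note that the equivalence holds for this arbitrary random panel, with no restriction on whether the panel is "wide" or "tall", which is exactly the no-assumptions flavor of the paper's symmetric-class result. For asymmetric estimators such as lasso, swapping the roles of units and periods generally changes the prediction.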
Why is this important?
The equivalence of point estimates across methods is surprising and practically useful—it shows that OLS and ridge regression are not “invalid” in certain dimensional regimes as commonly believed, but rather that different solutions exist and implicit regularization (via minimum norm) produces the same answer as the cross-sectional approach. More critically, the paper reveals that inference depends fundamentally on assumptions about the source of randomness, not just the estimation method. A confidence interval calibrated for one estimand (say, the time series effect) can dramatically under- or over-cover when the true estimand is different (say, the cross-sectional or doubly robust effect). This matters because in observational panel studies, the source of randomness is never definitively known—it’s an assumption researchers make, often implicitly. The simulations show coverage probabilities can deviate substantially from nominal levels (e.g., 93% vs 75% vs 67% for a nominal 95% interval), meaning researchers can be badly misled about precision and statistical significance if they choose the wrong framework. This elevates what seems like a technical modeling choice into a first-order identification concern.
Who should care?
Applied economists and policy evaluators using difference-in-differences, synthetic controls, or any panel-based causal inference methods should care, especially when working with observational data where parallel trends or other assumptions are uncertain. Econometricians developing new panel methods need to account for how randomness assumptions affect both estimation and inference. Anyone using modern DiD estimators (Callaway-Sant’Anna, SDID, augmented synthetic controls) should pay attention, since the paper shows even doubly robust methods inherit these issues.
Do we have code?
Yes. The replication package is available at https://doi.org/10.5281/zenodo.8423395 and was verified by Econometrica. The authors implemented their analysis in Python (using scikit-learn for regularized regression) and mention that simplex regression was implemented using publicly available code. However, the confidence intervals constructed in Section 4 are primarily conceptual devices for illustration rather than recommended for general use, as they rely on stylized homoskedastic assumptions and asymptotic approximations that may not hold in real applications.
In summary, this paper fundamentally reframes how we should think about panel data methods: horizontal and vertical regressions are not just different—they can be algebraically identical for point estimation (symmetric class) yet still lead to completely different inference depending on randomness assumptions. The authors don’t provide a definitive solution for which framework to use, but they make clear that this choice matters enormously and is currently under-examined in applied work. The contribution is both technical (proving equivalence results, deriving variance formulas) and conceptual (emphasizing that uncertainty quantification depends critically on where we believe randomness originates). Future work needs principled frameworks for testing or checking these randomness assumptions, since the paper shows that getting them wrong leads to systematically incorrect inference.
Reference
Shen, Dennis, Peng Ding, Jasjeet Sekhon, and Bin Yu (2023), “Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data,” Econometrica, 91 (6), 2125–54.