Notes on Propensity Score Methods
Introduction
Here are my notes on propensity scores, mainly from Prof. Ding’s textbook (2024).
The traditional propensity score analysis workflow is shown in the image below, which I will not cover in detail. Instead, I will summarize the key theorems and results from Ding’s textbook.

I will also provide some connections with Riesz Representer (RR).
- Why connect with the Riesz Representer (RR)? The connection provides a powerful generalization of the foundational Rosenbaum-Rubin (1983) result.
- Rosenbaum and Rubin showed that conditioning on the propensity score is sufficient for removing confounding bias when estimating causal effects. The Riesz representer extends this principle: it suffices to regress on the Riesz representer to obtain unbiased estimates of the average treatment effect.
- The key insight is that the Riesz representer, like the propensity score, serves as a sufficient statistic: it captures all the confounding information necessary for unbiased estimation of the target causal parameter.
Setting & Notation
- Binary treatment Z ∈ {0, 1}
- Potential outcomes Y(1), Y(0); observed outcome Y = Z Y(1) + (1 − Z) Y(0)
- Propensity score: e(X) = P(Z = 1 | X), where X represents covariates
Two approaches to learning causal relationships:
- Outcome process (via outcome regression)
- Treatment assignment mechanism (via propensity score)
The following summarizes the key theorems and results related to propensity scores from Prof. Ding’s textbook.
1. The propensity score as a dimension reduction tool
- Covariates X can be high dimensional, but the propensity score e(X) is a 1-dimensional scalar
- We can therefore view the propensity score as a dimension reduction tool
2. Propensity score stratification
- Idea: Discretize the estimated propensity score ê(X) into subclasses by its quantiles; estimate the ATE within each subclass and then average, weighting by the block size
- Advantage: The propensity score stratification estimator only requires the correct ordering of the estimated propensity scores rather than their exact values, which makes it relatively robust compared with other methods
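As a concrete sketch of the stratification idea (a minimal implementation with names of my own choosing, assuming the estimated propensity scores are supplied as an array):

```python
import numpy as np

def stratified_ate(y, z, e_hat, n_strata=5):
    """Propensity-score stratification estimator of the ATE.

    Discretizes e_hat into n_strata subclasses by its quantiles,
    estimates the ATE within each subclass by a difference in means,
    and averages the subclass estimates weighted by block size.
    """
    cuts = np.quantile(e_hat, np.linspace(0, 1, n_strata + 1))[1:-1]
    labels = np.digitize(e_hat, cuts)  # stratum index 0 .. n_strata-1
    n, ate = len(y), 0.0
    for k in range(n_strata):
        idx = labels == k
        treated, control = idx & (z == 1), idx & (z == 0)
        if treated.sum() == 0 or control.sum() == 0:
            continue  # in practice: merge or re-cut strata where one arm is empty
        ate += idx.sum() / n * (y[treated].mean() - y[control].mean())
    return ate
```

Note that only the ordering of ê enters through the quantile cuts, which is exactly why the method is robust to monotone miscalibration of the propensity score.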
3. Propensity score weighting
Under unconfoundedness and overlap, the ATE is identified by the IPW (Horvitz–Thompson) transform of the outcome:

τ = E[ Z Y / e(X) − (1 − Z) Y / (1 − e(X)) ]

- Connection with the Riesz Representer (RR)

Remark 1 (RR in the case of the ATE). In the case of the ATE, the Riesz Representer α(Z, X) has the same form as the Horvitz–Thompson transform above:

α(Z, X) = Z / e(X) − (1 − Z) / (1 − e(X))
3.1 Estimation
- The sample version of IPW is called the Horvitz–Thompson (HT) estimator:

  τ̂_HT = (1/n) Σ_{i=1}^n [ Z_i Y_i / ê(X_i) − (1 − Z_i) Y_i / (1 − ê(X_i)) ]

- The HT estimator has many problems
- Problem: lack of invariance, i.e. if we replace Y_i by Y_i + c, the estimate changes because the inverse-probability weights do not sum exactly to n in finite samples. This is not reasonable.
- Solution: normalize the weights within each arm, which gives the Hajek estimator:

  τ̂_Hajek = [ Σ_i Z_i Y_i / ê(X_i) ] / [ Σ_i Z_i / ê(X_i) ] − [ Σ_i (1 − Z_i) Y_i / (1 − ê(X_i)) ] / [ Σ_i (1 − Z_i) / (1 − ê(X_i)) ]

- The Hajek estimator is invariant to location transformations of the outcome
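The invariance contrast is easy to verify numerically. A minimal sketch (function names are my own; the true propensity score is plugged in for simplicity):

```python
import numpy as np

def ht_estimator(y, z, e_hat):
    """Horvitz-Thompson (sample IPW) estimator of the ATE."""
    return np.mean(z * y / e_hat - (1 - z) * y / (1 - e_hat))

def hajek_estimator(y, z, e_hat):
    """Hajek estimator: the same weights, but normalized within each arm."""
    w1, w0 = z / e_hat, (1 - z) / (1 - e_hat)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```

Shifting every outcome by a constant c changes the HT estimate (its realized weights do not sum exactly to n) but leaves the Hajek estimate unchanged, since the constant cancels in each normalized mean.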
3.2 Strong overlap condition
Many asymptotic analyses require a strong overlap condition: η ≤ e(X) ≤ 1 − η for some constant η > 0. In practice, this motivates trimming units with extreme estimated propensity scores:
- Crump et al. (2009) suggested keeping units with 0.1 ≤ ê(X) ≤ 0.9
- Kurth et al. (2005) suggested keeping units with 0.05 ≤ ê(X) ≤ 0.95
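Trimming is mechanical; a sketch using the 0.1/0.9 rule of thumb as the default (function name is my own):

```python
import numpy as np

def trim_by_overlap(y, z, e_hat, lo=0.1, hi=0.9):
    """Drop units whose estimated propensity score falls outside [lo, hi]."""
    keep = (e_hat >= lo) & (e_hat <= hi)
    return y[keep], z[keep], e_hat[keep]
```

Keep in mind that trimming changes the estimand: after trimming we estimate the ATE for the trimmed subpopulation, not the original one.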
4. Balancing property

- Conditional on e(X), the treatment and the covariates are independent: Z ⫫ X | e(X)
- Within the same level of the propensity score, the covariate distributions are balanced across the treatment and control groups
- Useful implication: we can check whether the propensity score model is specified well enough to ensure covariate balance in the data
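One common way to operationalize this balance check is the standardized mean difference (SMD) of each covariate, computed with and without the IPW weights; a minimal sketch (function name is my own):

```python
import numpy as np

def weighted_smd(x, z, w):
    """Standardized mean difference of covariate x between arms under weights w."""
    m1 = np.average(x[z == 1], weights=w[z == 1])
    m0 = np.average(x[z == 0], weights=w[z == 0])
    pooled_sd = np.sqrt((x[z == 1].var() + x[z == 0].var()) / 2)
    return (m1 - m0) / pooled_sd
```

If the propensity score model is adequate, the weighted SMDs should be close to zero; a common rule of thumb flags covariates with |SMD| > 0.1.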
4.1 Propensity score is a balancing score
Theorem (Rosenbaum and Rubin, 1983). The propensity score is a balancing score:

Z ⫫ X | e(X)

- This is relevant in subgroup analysis
- The conditional independence above ensures that unconfoundedness holds given the propensity score within each level of the subgroup variable. Therefore, we can perform the same analysis based on the propensity score within each level, yielding estimates for the subgroup effects
5. Doubly Robust or AIPW
The following is summarized from Prof. Wager's lecture notes (2024). The AIPW (augmented IPW) estimator combines the outcome regressions μ̂_1(X), μ̂_0(X) (fitted models for E[Y | Z = 1, X] and E[Y | Z = 0, X]) with the propensity score:

τ̂_AIPW = (1/n) Σ_{i=1}^n [ μ̂_1(X_i) − μ̂_0(X_i) + Z_i (Y_i − μ̂_1(X_i)) / ê(X_i) − (1 − Z_i)(Y_i − μ̂_0(X_i)) / (1 − ê(X_i)) ]

It is consistent if either the outcome model or the propensity score model is correctly specified, hence "doubly robust".
- Check my previous post: Intuition for Doubly Robust Estimator
- AIPW provides a natural starting point for understanding Double Machine Learning (DML)
- Key insight of the RR in the DML framework: leverage the Riesz Representer, a "generalized version of the propensity score", to correct the bias
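A minimal sketch of the AIPW estimator (the fitted values mu1_hat, mu0_hat and ê are passed in as arrays; the cross-fitting that DML adds on top is omitted here):

```python
import numpy as np

def aipw_ate(y, z, e_hat, mu1_hat, mu0_hat):
    """AIPW / doubly robust estimator of the ATE: outcome regression
    plus an IPW correction of its residuals."""
    t1 = mu1_hat + z * (y - mu1_hat) / e_hat
    t0 = mu0_hat + (1 - z) * (y - mu0_hat) / (1 - e_hat)
    return np.mean(t1 - t0)
```

Setting the μ̂'s to zero recovers the HT estimator; conversely, with correct μ̂'s a misspecified propensity score still yields a consistent estimate, which is the double robustness property.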
6. Other Estimands related to IPW
More generally, Li et al. (2018a) gave a unified discussion of the causal estimands in observational studies.

Summary table of common estimands (Li et al. 2018a), where each target population corresponds to a tilting function h(x) and estimand τ_h = E[h(X) τ(X)] / E[h(X)]:

| Target population | h(x) | Estimand |
| --- | --- | --- |
| combined | 1 | ATE |
| treated | e(x) | ATT |
| control | 1 − e(x) | ATC |
| overlap | e(x)(1 − e(x)) | ATO |

- This table provides a good way to understand and remember the IPW estimator for the ATT
- How to remember τ_T (the ATT)? Apply IPW to the "pseudo outcome" e(X) Y, then divide by P(Z = 1)
- When the parameter of interest is the ATT, the Riesz Representer becomes α(Z, X) = [ Z − e(X)(1 − Z) / (1 − e(X)) ] / P(Z = 1)
- Use it to better understand IPW for the ATT
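For the ATT, treated units keep weight 1 and control units are reweighted by ê(X)/(1 − ê(X)) to mimic the treated covariate distribution. A sketch with Hajek-style normalization (function name is my own):

```python
import numpy as np

def att_ipw(y, z, e_hat):
    """Hajek-style IPW estimator of the ATT: control units are
    reweighted by e_hat / (1 - e_hat) to mimic the treated population."""
    w0 = (1 - z) * e_hat / (1 - e_hat)
    return y[z == 1].mean() - np.sum(w0 * y) / np.sum(w0)
```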
7. Propensity Score in Regression
PS as a covariate
Based on the above theorem, we can use the propensity score as a covariate in regression. In particular, the coefficient of Z in the OLS fit of Y on (1, Z, e(X)) converges to an overlap-weighted average of treatment effects, τ_O = E[e(X)(1 − e(X)) τ(X)] / E[e(X)(1 − e(X))].
PS as a weight
There is a convenient way to obtain τ̂_Hajek: run WLS of Y on (1, Z) with weights w_i = Z_i / ê(X_i) + (1 − Z_i) / (1 − ê(X_i)); the coefficient of Z is numerically identical to τ̂_Hajek.
- Need to use the bootstrap for standard errors
- Why does the WLS give a consistent estimator for τ?
  - In an RCT with a constant propensity score, we can simply use the coefficient of Z in the OLS fit of Y on (1, Z) to estimate τ
  - In observational studies, we need to deal with the selection bias. The key idea is:
    - If we weight the treated units by 1 / ê(X_i) and the control units by 1 / (1 − ê(X_i)), then both the treated and control groups can represent the whole population
    - Thus, by weighting, we effectively have a pseudo-randomized experiment
- Remark 2 (IPCW). Inverse Probability of Censoring Weighting (IPCW) follows the same idea: it adjusts for censoring bias by reweighting observations based on their probability of being uncensored.
- Consequently, the difference between the weighted means is consistent for τ. The numerical equivalence of τ̂_Hajek and WLS is not only a fun numerical fact in itself but is also useful for motivating more complex estimators with covariate adjustment.
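The numerical equivalence is easy to check: a WLS fit of Y on (1, Z) with the IPW weights reproduces the Hajek estimator exactly. A sketch using a hand-rolled weighted least squares (function names are my own):

```python
import numpy as np

def hajek(y, z, e_hat):
    """Hajek (normalized IPW) estimator of the ATE."""
    w1, w0 = z / e_hat, (1 - z) / (1 - e_hat)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def wls_slope(y, z, w):
    """Coefficient of z in the WLS fit of y on (1, z) with weights w."""
    X = np.column_stack([np.ones_like(y), z])
    sw = np.sqrt(w)  # WLS = OLS on sqrt(w)-scaled design and response
    beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
    return beta[1]
```

The agreement is exact up to floating point, not just asymptotic, because with a binary regressor the WLS fit returns the difference of the weighted group means. For inference, use the bootstrap rather than the default WLS standard errors, since the weights are estimated.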
References
- Ding, P. (2024). A First Course in Causal Inference. CRC Press.
- Wager, S. (2024). Causal Inference: A Statistical Learning Approach. https://web.stanford.edu/~swager/causal_inf_book.pdf