Notes on Doubly Robust Censoring Unbiased Transformation

$$ \newcommand{\indep}{\mathrel{\perp\mkern-10mu\perp}} \newcommand{\P}{\mathbb{P}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\1}[1]{\mathbf{1}\{#1\}} $$

Predicting outcomes with right-censored survival data forces a choice: do we model the outcome distribution, or the censoring mechanism? Classical transformations require us to commit to one. Rubin and van der Laan (2007) tell us we don’t have to. Their doubly robust censoring unbiased transformation fuses both approaches, remaining valid as long as at least one of the two nuisance models is correctly specified. This post walks through the setup, the classical transformations, and how the doubly robust version combines them — drawing the analogy to AIPW along the way.

1. Setup

We observe an i.i.d. sample $\{O_i\}_{i=1}^n$ where each observation is

$$O = \bigl(W,\; \Delta = \mathbf{1}(Y \le C),\; \tilde{Y} = Y \wedge C\bigr)$$
  • $W$: covariates
  • $Y$: true (possibly unobserved) survival time
  • $C$: random censoring time
  • $\tilde{Y} = \min(Y, C)$ is what we actually record
  • $\Delta = 1$ if the event is observed ($Y \le C$), and $\Delta = 0$ if the outcome is right-censored ($Y > C$)

Our goal is to estimate the regression function

$$m(w) = \mathbb{E}[Y \mid W = w]$$

Two nuisance functions appear throughout:

  • $\bar{F}(\cdot \mid W)$: conditional survival function of the response $Y$ given $W$
  • $\bar{G}(\cdot \mid W)$: conditional survival function of the censoring time $C$ given $W$

We maintain the standard assumption of conditionally independent censoring, $Y \indep C \mid W$, together with positivity: $\bar{G}(\cdot \mid W)$ must stay bounded away from zero over the time range of interest, so that the inverse weights appearing below are well defined.
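As a concrete picture of the observed-data structure, here is a minimal Python sketch. The exponential rates are illustrative assumptions, not part of the setup:

```python
import random

def observe(w, y, c):
    """Map the latent pair (Y, C) to the observed record O = (W, Delta, Y_tilde)."""
    delta = 1 if y <= c else 0   # event indicator Delta = 1{Y <= C}
    y_tilde = min(y, c)          # follow-up time Y_tilde = Y ^ C
    return (w, delta, y_tilde)

# illustrative latent draws: Y and C exponential, independent given W
random.seed(0)
sample = []
for _ in range(5):
    w = random.random()
    y = random.expovariate(1.0 + w)   # survival time, rate depends on W
    c = random.expovariate(1.0)       # censoring time
    sample.append(observe(w, y, c))
```

Only the triples returned by `observe` ever reach the analyst; the latent `y` and `c` are discarded.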


2. The Challenge: Unidentifiability

When the censoring time $C$ is bounded — say, by the end of a fixed study period — the true response $Y$ can exceed the maximal follow-up time with positive probability. Beyond that endpoint the data carry no information about the tail of the survival distribution, so $m(W) = \mathbb{E}[Y \mid W]$ is unidentifiable in general.

The fix: truncate the response at a known horizon $\tau$,

$$Y \longmapsto Y \wedge \tau = \min(Y, \tau)$$

and estimate $w \mapsto \mathbb{E}[Y \wedge \tau \mid W = w]$ instead.

The Surrogate-Response Strategy

The general approach to prediction with right-censored data is:

  1. Replace the possibly unavailable responses $\{Y_i\}_{i=1}^n$ with surrogate values $\{Y^*(O_i)\}_{i=1}^n$ using an imputation map $Y^*(\cdot)$ built from observed data.

  2. Plug the imputed dataset $\{W_i, Y^*(O_i)\}_{i=1}^n$ into any standard regression algorithm.

The imputation map $Y^*(\cdot)$ is called a censoring unbiased transformation (Fan and Gijbels 1996) if it satisfies

$$\mathbb{E}[Y^*(O) \mid W] = \mathbb{E}[Y \mid W] = m(W)$$

That is, the surrogate is an unbiased proxy for the true (latent) response, conditional on covariates.


3. Two Classical Transformations

A. The Buckley–James Transformation (depends on $\bar{F}$)

The Buckley–James transformation imputes a censored observation with its conditional mean given that it exceeds the censoring time:

$$Y^*(O) = \Delta Y + (1-\Delta) Q_{\bar{F}}(W, C)$$

where $\Delta = 1$ if $Y$ is observed and $\Delta = 0$ if right-censored, and

$$Q_{\bar{F}}(w, y) = \mathbb{E}[Y \mid W = w,\; Y > y] = \frac{1}{\bar{F}(y \mid W=w)} \int_y^{+\infty} u \; dF(u \mid W=w)$$

Intuitively: if we observe the event, we keep $Y$; if censored at $C$, we impute with the expected remaining survival time above $C$.

This transformation requires correctly estimating $\bar{F}(\cdot \mid W)$, i.e., the conditional distribution of the survival time.
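Under an assumed exponential working model for $Y \mid W$ (a hypothetical choice, used here only because it makes $Q_{\bar{F}}$ explicit), memorylessness gives $Q_{\bar{F}}(w, c) = c + 1/\lambda(w)$, and the transformation is a few lines of Python:

```python
def q_exp(w, c, rate):
    """E[Y | W=w, Y > c] under an assumed Exp(rate(w)) working model for Y | W:
    by memorylessness, E[Y | Y > c] = c + 1/rate(w)."""
    return c + 1.0 / rate(w)

def buckley_james(w, delta, y_tilde, rate):
    """Y*(O) = Delta * Y + (1 - Delta) * Q_F(W, C)."""
    if delta == 1:
        return y_tilde               # event observed: keep Y as-is
    return q_exp(w, y_tilde, rate)   # censored at C = y_tilde: impute
```

For an uncensored row the surrogate is just the recorded time; for a censored row it is the censoring time plus the working model's mean residual life.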


B. The IPCW Transformation (depends on $\bar{G}$)

Inverse probability of censoring weighting (IPCW) up-weights the observed events to compensate for the censored ones:

$$Y^*(O) = \frac{Y \Delta}{\bar{G}(Y \mid W)}$$

This requires correctly estimating $\bar{G}(\cdot \mid W)$, the conditional survival function of the censoring time. It is the survival-analysis analogue of IPW in the causal inference literature.
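A sketch under an assumed exponential censoring model $\bar{G}(t \mid w) = e^{-\mu t}$; the model choice is hypothetical, and any estimator of $\bar{G}$ could be plugged in instead:

```python
import math

def ipcw_transform(delta, y_tilde, surv_g):
    """Y*(O) = Y * Delta / G_bar(Y | W): censored rows contribute zero,
    observed rows are up-weighted by the inverse censoring survival."""
    if delta == 0:
        return 0.0
    return y_tilde / surv_g(y_tilde)

mu = 0.5                                  # assumed censoring hazard
surv_g = lambda t: math.exp(-mu * t)      # G_bar(t) = P(C > t)
```

Note that every censored observation is mapped to zero; unbiasedness is recovered only through the weights on the observed events, which is why IPCW is sensitive to small values of $\bar{G}$.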


4. The Doubly Robust Censoring Unbiased Transformation

The two classical transformations each stake everything on one nuisance model. The doubly robust approach combines them:

$$ Y^*(O) = \underbrace{\frac{Y\Delta}{\bar{G}(Y \mid W)}}_{\text{1st term}} + \underbrace{\frac{Q_{\bar{F}}(W,C)\,(1-\Delta)}{\bar{G}(C \mid W)}}_{\text{2nd term}} - \underbrace{\int_{-\infty}^{\tilde{Y}} \frac{Q_{\bar{F}}(W,c)}{\bar{G}^2(c \mid W)}\, dG(c \mid W)}_{\text{3rd term (correction)}} $$

where $\tilde{Y} = Y \wedge C = \min(Y,C)$ and $Q_{\bar{F}}(w, c) = \mathbb{E}[Y \mid W=w,\; Y > c]$.

Theorem 1.
$\mathbb{E}[Y^*(O) \mid W] = \mathbb{E}[Y \mid W]$ whenever either $\bar{F}(\cdot \mid W)$ or $\bar{G}(\cdot \mid W)$ is correctly specified.
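A direct transcription of the three-term formula, with the correction integral evaluated numerically. The nuisance functions `q`, `surv_g`, and `dens_g` are placeholders for whatever working models one fits, and censoring support is assumed to be $[0, \infty)$:

```python
import math

def dr_transform(w, delta, y_tilde, q, surv_g, dens_g, n_grid=2000):
    """Doubly robust CUT:
      Y*(O) = Delta * Y / G_bar(Y|W)
            + (1 - Delta) * Q(W, C) / G_bar(C|W)
            - int_0^{Y_tilde} Q(W, c) / G_bar(c|W)^2 dG(c|W).
    The integral is approximated by the trapezoidal rule on [0, Y_tilde]."""
    first = delta * y_tilde / surv_g(y_tilde, w)
    second = (1 - delta) * q(w, y_tilde) / surv_g(y_tilde, w)
    h = y_tilde / n_grid
    vals = [q(w, i * h) * dens_g(i * h, w) / surv_g(i * h, w) ** 2
            for i in range(n_grid + 1)]
    third = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return first + second - third

# example nuisance models for a quick check: Exp(1) censoring, constant Q
q_const = lambda w, c: 1.0
surv_g = lambda t, w: math.exp(-t)
dens_g = lambda t, w: math.exp(-t)
```

In practice one would replace the trapezoidal sum with the closed form implied by the fitted censoring model whenever it is available.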

5. Intuition: An AIPW in Disguise

The three-term structure has a clean interpretation. Recall that $Q_{\bar{F}}(W,C) = \mathbb{E}[Y \mid W, Y > C]$ is the outcome regression for censored individuals.

| Term | Role |
| --- | --- |
| 1st: $Y\Delta \,/\, \bar{G}(Y\!\mid\! W)$ | For observed outcomes — apply IPCW, the analogue of inverse propensity score weighting |
| 2nd: $Q_{\bar{F}}(W,C)(1-\Delta) \,/\, \bar{G}(C\!\mid\! W)$ | For censored outcomes — impute with the outcome regression $Q_{\bar{F}}$, then apply an IPCW-style weight |
| 3rd: $-\int Q_{\bar{F}} / \bar{G}^2 \; dG$ | Bias correction term |

The first two terms together look exactly like IPW + imputation — the two ingredients of AIPW in the standard (binary treatment) setting. The third term is the survival-analysis counterpart of the augmentation correction in AIPW: it cancels the bias introduced by whichever nuisance model is misspecified, provided the other one is correct.
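To see Theorem 1 at work, here is a small Monte Carlo sketch: the censoring model is correct (a known exponential $\bar{G}$, for which the correction integral has a closed form), while the outcome regression is deliberately misspecified. All distributional choices are assumptions made up for the demonstration:

```python
import math
import random

random.seed(1)
lam, mu = 1.0, 0.5      # truth: Y ~ Exp(1), so E[Y] = 1, and C ~ Exp(0.5)
b_wrong = 5.0           # misspecified outcome model: Q(c) = c + 5
                        # (the correct Q under Exp(1) would be c + 1)

def dr_star(delta, y_tilde):
    """Doubly robust surrogate with correct G_bar but wrong Q."""
    surv_g = math.exp(-mu * y_tilde)
    first = delta * y_tilde / surv_g
    second = (1 - delta) * (y_tilde + b_wrong) / surv_g
    # closed form of int_0^{T} Q(c) / G_bar(c)^2 dG(c) for Exp(mu) censoring
    t = y_tilde
    third = ((t + b_wrong) * math.exp(mu * t)
             - math.exp(mu * t) / mu - b_wrong + 1.0 / mu)
    return first + second - third

n = 200_000
total = 0.0
for _ in range(n):
    y = random.expovariate(lam)
    c = random.expovariate(mu)
    total += dr_star(1 if y <= c else 0, min(y, c))
print(total / n)   # hovers near E[Y] = 1 despite the wrong outcome model
```

Plugging the same wrong `b_wrong` into the plain Buckley–James imputation instead would shift the mean by a constant; the augmentation term is what restores unbiasedness here.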


Takeaway

The doubly robust censoring unbiased transformation is a drop-in replacement for the classical Buckley–James or IPCW transformations. Once $Y^*(O_i)$ is computed for each observation, any off-the-shelf regression algorithm can be applied to the pairs $\{W_i, Y^*(O_i)\}$.
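Schematically, the final step is ordinary regression on the imputed pairs. A minimal sketch with synthetic surrogates, where the linear model and noise level are made up purely to illustrate the plug-in step:

```python
import random

random.seed(2)

# stand-in for the imputed pairs {(W_i, Y*(O_i))}: here Y* is faked as a
# noisy linear function of W purely to illustrate the plug-in step
pairs = [(i / 50, 2.0 * (i / 50) + 1.0 + random.gauss(0.0, 0.1))
         for i in range(50)]

# any off-the-shelf regressor works; simple least squares as a stand-in
n = len(pairs)
sw = sum(w for w, _ in pairs)
sy = sum(y for _, y in pairs)
sww = sum(w * w for w, _ in pairs)
swy = sum(w * y for w, y in pairs)
slope = (n * swy - sw * sy) / (n * sww - sw * sw)
intercept = (sy - slope * sw) / n
```

Nothing in the fit needs to know about censoring: all of that complexity was absorbed into the construction of $Y^*$.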

Reference

Rubin, Daniel and Mark J. van der Laan (2007), “A Doubly Robust Censoring Unbiased Transformation,” The International Journal of Biostatistics, 3 (1).

Fan, Jianqing and Irène Gijbels (1996), Local Polynomial Modelling and Its Applications, Chapman & Hall.

Chen Xing