Reminder: please respond to survey II.
No more HW5; more time to work on individual projects.
Final lab next Friday; will cover causal inference.
Propensity scores: definition and properties
Estimation
PS stratification
PS matching
PS regression
The propensity score (PS) is defined as the conditional probability of receiving a treatment given pre-treatment covariates X.
That is,
e(X) = \mathbb{Pr}[W = 1 \mid X] = \mathbb{E}[W \mid X],
where X = (X_1, \ldots, X_p) is the vector of p covariates/predictors.
The propensity score is a probability, analogous to a summary statistic of the covariates.
The propensity score has several properties that make it attractive within our causal inference framework.
Property 1. The propensity score e(X) balances the distribution of all X between the treatment groups:
W \perp X \mid e(X).
Equivalently,
\mathbb{Pr}[W_i = 1 \mid X_i, e(X_i)] = \mathbb{Pr}[W_i = 1 \mid e(X_i)].
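To see why Property 1 holds, use iterated expectations: conditioning on X fixes e(X), so
\mathbb{Pr}[W = 1 \mid X, e(X)] = \mathbb{Pr}[W = 1 \mid X] = e(X),
while
\mathbb{Pr}[W = 1 \mid e(X)] = \mathbb{E}\big[\mathbb{Pr}[W = 1 \mid X] \mid e(X)\big] = \mathbb{E}[e(X) \mid e(X)] = e(X).
Both probabilities equal e(X), which is exactly the equivalent statement above.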
The propensity score is not the only balancing score. Generally, a balancing score b(X) is a function of the covariates such that
W \perp X \mid b(X).
Rosenbaum and Rubin (1983) show that all balancing scores are a function of e(X).
If a subclass of units or a matched treated-control pair is homogeneous in e(X), then the treated and control units within it have the same distribution of X.
The balancing property is a statement about the distribution of X, NOT about the assignment mechanism or the potential outcomes.
Property 2. If W is unconfounded given X, then W is unconfounded given e(X). That is, if
\{Y_i(0), Y_i(1)\} \perp W_i \mid X_i
holds, then
\{Y_i(0), Y_i(1)\} \perp W_i \mid e(X_i)
also holds.
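Property 2 follows by the same iterated-expectations argument: under unconfoundedness given X,
\mathbb{Pr}[W_i = 1 \mid Y_i(0), Y_i(1), e(X_i)] = \mathbb{E}\big[\mathbb{Pr}[W_i = 1 \mid Y_i(0), Y_i(1), X_i] \mid Y_i(0), Y_i(1), e(X_i)\big] = \mathbb{E}[e(X_i) \mid Y_i(0), Y_i(1), e(X_i)] = e(X_i),
which is free of the potential outcomes, so treatment is unconfounded given e(X_i).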
Given a vector of covariates that ensure unconfoundedness, adjustment for differences in propensity scores removes all biases associated with differences in the covariates.
e(X) can be viewed as a summary score of the observed covariates.
This is great because causal inference can then be drawn through stratification, matching, regression, etc., using the scalar e(X) instead of the high-dimensional covariates.
The propensity score balances the observed covariates, but does not generally balance unobserved covariates.
In most observational studies, the propensity score e(X) is unknown and thus needs to be estimated.
Propensity score analysis (in observational studies) typically involves two stages:
Stage 1. Estimate the propensity score, e.g., by logistic regression or machine learning methods.
Stage 2. Given the estimated propensity score, estimate the causal effects through stratification, matching, weighting, or regression.
The main purpose of estimating the propensity score is to ensure overlap and balance of covariates between treatment groups, not to find a "perfect fit" for the propensity score.
As long as the important covariates are balanced, overfitting the model is not a concern; underfitting, however, can be a problem.
Essentially any balancing score (not necessarily propensity score) would be good enough for practical use.
A standard procedure for estimating propensity scores includes:
fit an initial model;
discard outliers (units with very large or very small propensity scores);
check covariate balance; and
re-fit if necessary.
Step 1. Estimate the propensity score using a logistic regression:
W_i \mid X_i \sim \textrm{Bernoulli}(\pi_i); \ \ \ \ \log\left(\dfrac{\pi_i}{1-\pi_i}\right) = X_i\boldsymbol{\beta}.
Include all covariates in this initial model, or do a stepwise selection on the covariates and interactions, to get an initial estimate of the propensity scores. That is,
\hat{e}^0(X_i) = \dfrac{e^{X_i\hat{\boldsymbol{\beta}}}}{1 + e^{X_i\hat{\boldsymbol{\beta}}}}.
Can also use machine learning methods.
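As a concrete illustration, a minimal R sketch of Step 1, assuming a data frame `df` with a binary treatment `W` and covariates `X1`, `X2` (all hypothetical names):

```r
# Step 1: fit the propensity score model by logistic regression
ps_fit <- glm(W ~ X1 + X2, data = df, family = binomial())

# Initial estimated propensity scores \hat{e}^0(X_i): the fitted probabilities
df$e_hat <- predict(ps_fit, type = "response")
```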
Step 2. Check overlap of propensity score between treatment groups. If necessary, discard the observations with non-overlapping propensity scores.
Step 3. Assess balance given by initial model in Step 1.
Step 4. If one or more covariates are seriously unbalanced, include some of their higher order terms and/or interactions to re-fit the propensity score model and repeat Steps 1-3, until most covariates are balanced.
Note: There are situations where some important covariates will still not be completely balanced after repeated trials. Then they should be taken into account in Stage 2 (outcome stage) of propensity score analysis.
In practice, balance checking in the PS estimation stage can be done via sub-classification/stratification, matching or weighting.
The workflow is the same: fit initial model, check balance (sub-classification, matching or weighting), refit.
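A sketch of the overlap and balance checks (Steps 2-3), continuing with the hypothetical `df`, `W`, and `e_hat` from the sketch above:

```r
# Step 2: visual overlap check of the estimated propensity scores
hist(df$e_hat[df$W == 1], col = rgb(1, 0, 0, 0.4), xlim = c(0, 1),
     main = "Propensity score overlap", xlab = "Estimated e(X)")
hist(df$e_hat[df$W == 0], col = rgb(0, 0, 1, 0.4), add = TRUE)

# Step 3: standardized mean difference (SMD) for one covariate;
# a common rule of thumb is |SMD| < 0.1 for acceptable balance
smd <- function(x, w) {
  (mean(x[w == 1]) - mean(x[w == 0])) /
    sqrt((var(x[w == 1]) + var(x[w == 0])) / 2)
}
smd(df$X1, df$W)
```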
Given the estimated propensity score, we can estimate the causal estimands through sub-classification/stratification, weighting or matching.
Let's start with stratification.
Recall the classical result (Cochran, 1968) that stratifying on 5 strata of a single covariate removes about 90% of the bias due to that covariate.
Stratification using the propensity score as the summary score should have approximately the same effect.
Divide the subjects into K strata by the corresponding quantiles of the estimated propensity scores.
ATE: estimate the ATE within each stratum and then average, weighting by block size. That is,
\hat{\tau}^{ATE} = \sum_{k=1}^K \left(\bar{Y}_{k,1} - \bar{Y}_{k,0} \right) \dfrac{N_{k,1}+N_{k,0}}{N},
with N_{k,1} and N_{k,0} being the numbers of treated and control units in class k, respectively.
ATT: weight the within-block ATEs by the proportion of treated units, N_{k,1}/N_1 (in symbols below).
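That is, mirroring the ATE estimator above:
\hat{\tau}^{ATT} = \sum_{k=1}^K \left(\bar{Y}_{k,1} - \bar{Y}_{k,0} \right) \dfrac{N_{k,1}}{N_1}.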
A variance estimator for \hat{\tau}^{ATE} is
\mathbb{Var}\left[\hat{\tau}^{ATE}\right] = \sum_{k=1}^K \left(\mathbb{Var}[\bar{Y}_{k,1}] + \mathbb{Var}[\bar{Y}_{k,0}] \right) \left(\dfrac{N_{k,1}+N_{k,0}}{N}\right)^2,
since the variances of the (independent) within-stratum means add; alternatively, use the bootstrap.
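A minimal sketch of propensity score stratification, assuming `df` also contains an observed outcome `Y` (hypothetical), with `e_hat` estimated as above:

```r
# Divide units into K = 5 strata by quintiles of the estimated PS
K <- 5
df$stratum <- cut(df$e_hat,
                  breaks = quantile(df$e_hat, probs = seq(0, 1, length.out = K + 1)),
                  include.lowest = TRUE, labels = FALSE)

# Within-stratum mean differences, variance terms, and stratum sizes
per_stratum <- sapply(split(df, df$stratum), function(d) {
  c(diff = mean(d$Y[d$W == 1]) - mean(d$Y[d$W == 0]),
    v = var(d$Y[d$W == 1]) / sum(d$W == 1) +
        var(d$Y[d$W == 0]) / sum(d$W == 0),
    n = nrow(d))
})

wts     <- per_stratum["n", ] / nrow(df)      # block-size weights
tau_ate <- sum(per_stratum["diff", ] * wts)   # stratified ATE estimate
var_ate <- sum(per_stratum["v", ] * wts^2)    # plug-in variance estimate
```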
Five blocks are usually not enough; consider a larger number, such as 10.
Stratification is a coarsened version of matching.
Empirical results from real applications: stratification usually does not perform as well as matching or weighting.
Good for cases with extreme outliers (smoothing): less sensitive, but also less efficient.
Can be combined with regression: first estimate causal effects using regression within each block and then average the within-subclass estimates.
In propensity score matching, potential matches are compared using the (estimated) propensity score.
1-to-n nearest neighbor matching is common when the control group is large compared to the treatment group.
In most software packages, the default is actually 1-to-1 nearest neighbor matching.
Pros: robust; produces matched pairs (so you can do within-pair analysis).
Sometimes dimension reduction via the propensity score may be too drastic; recent methods advocate matching on the multivariate covariates directly.
Nonetheless, propensity score matching is what we will focus on for our minimum wage data.
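A minimal sketch of 1-to-1 nearest neighbor propensity score matching using the MatchIt package, again with the hypothetical `df`, `W`, `Y`, `X1`, `X2`:

```r
library(MatchIt)

# 1-to-1 nearest neighbor matching on the propensity score
# (MatchIt estimates the score by logistic regression by default)
m_out <- matchit(W ~ X1 + X2, data = df, method = "nearest", ratio = 1)
summary(m_out)                      # covariate balance before/after matching

# ATT estimate: mean outcome difference in the matched sample
matched <- match.data(m_out)
with(matched, mean(Y[W == 1]) - mean(Y[W == 0]))
```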
Remember the key propensity score property:
\{Y_i(0), Y_i(1)\} \perp W_i \mid X_i \ \ \Rightarrow \ \ \{Y_i(0), Y_i(1)\} \perp W_i \mid e(X_i).
Idea: in a regression estimator, adjust for e(X) instead of the whole X; that is, in regression models of Y(w), use e(X) as the single predictor.
Clearly, modeling \mathbb{Pr}(Y(w)|\hat{e}(X)) is simpler than modeling \mathbb{Pr}(Y(w)|X); the dimension reduction effectively leaves more data to estimate the essential parameters.
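A minimal sketch of this single-predictor adjustment in the hypothetical running example, using a linear outcome model for simplicity:

```r
# Outcome regression adjusting only for the estimated propensity score
fit_ps_only <- lm(Y ~ W + e_hat, data = df)
coef(fit_ps_only)["W"]   # treatment effect estimate under this model
```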
However,
we lose interpretation of the effects of individual covariates, e.g. age, sex; and
reduction to the one-dimensional propensity score may be too drastic.
Idea: instead of using the estimated \hat{e}(X) as the single predictor, use it as an additional predictor in the model. That is, \mathbb{Pr}(Y(w)|X,\hat{e}(X)).
It turns out that \mathbb{Pr}(Y(w)|X,\hat{e}(X)) gives both efficiency and robustness.
Also, if we are unable to achieve full balance on some of the predictors, using \mathbb{Pr}(Y(w)|X,\hat{e}(X)) will help further control for those unbalanced predictors.
Empirical evidence (e.g., from simulation studies) supports this.
Why does it work? It is a continuous version of regression after stratification.
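A sketch of this combined adjustment under the same hypothetical setup:

```r
# Outcome regression on the covariates plus the estimated propensity score
fit_both <- lm(Y ~ W + X1 + X2 + e_hat, data = df)
coef(fit_both)["W"]   # treatment effect estimate
```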
Now let's actually see how this works with the minimum wage example from last class.
| Variable | Description |
|---|---|
| NJ.PA | indicator for which state the restaurant is in (1 if NJ, 0 if PA) |
| EmploymentPre | measures employment for each restaurant before the minimum wage raise in NJ |
| EmploymentPost | measures employment for each restaurant after the minimum wage raise in NJ |
| WagePre | measures the hourly wage for each restaurant before the minimum wage raise |
| BurgerKing | indicator for Burger King |
| KFC | indicator for KFC |
| Roys | indicator for Roys |
| Wendys | indicator for Wendys |
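As a preview of Stage 1 on this data, a hedged sketch assuming the data frame is named `minwage` (hypothetical; the in-class R script carries out the actual analysis). One chain indicator (`Wendys`) is omitted as the baseline, since the four indicators sum to one:

```r
# Propensity of being in NJ (the treated state) given pre-treatment covariates;
# Wendys is the omitted baseline chain to avoid collinearity
ps_mw <- glm(NJ.PA ~ EmploymentPre + WagePre + BurgerKing + KFC + Roys,
             data = minwage, family = binomial())
minwage$e_hat <- predict(ps_mw, type = "response")
```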
In-class analysis: move to the R script here.
These slides contain materials adapted from courses taught by Dr. Fan Li.