
Observational studies: propensity scores methods

Dr. Olanrewaju Michael Akande

Nov 7, 2019

1 / 25

Announcements

  • Reminder: please respond to survey II.

  • No more HW5; more time to work on individual projects.

  • Final lab next Friday; will cover causal inference.

Outline

  • Propensity scores: definition and properties

  • Estimation

  • PS stratification

  • PS matching

  • PS regression

2 / 25

Propensity scores: definition and properties

3 / 25

Propensity scores

  • The propensity score (ps) is defined as the conditional probability of receiving a treatment given pre-treatment covariates X.

  • That is,

    e(X) = \mathbb{Pr}[W = 1 | X] = \mathbb{E}[W | X],

    where X = (X_1, \ldots, X_p) is the vector of p covariates/predictors.

  • The propensity score is a probability, analogous to a summary statistic of the covariates.

  • The propensity score has very useful properties that make it desirable within our causal inference framework.

4 / 25

Balancing property of propensity score

  • Property 1. The propensity score e(X) balances the distribution of all X between the treatment groups:

    W \perp X | e(X)

  • Equivalently,

    \mathbb{Pr}[W_i = 1 | X_i, e(X_i)] = \mathbb{Pr}[W_i = 1 | e(X_i)].

  • The propensity score is not the only balancing score. Generally, a balancing score b(X) is a function of the covariates such that:

    W \perp X | b(X)

5 / 25

Remarks on the balancing property

  • Rosenbaum and Rubin (1983) show that the propensity score is the coarsest balancing score: e(X) is a function of every other balancing score b(X).

  • If a subclass of units or a matched treatment-control pair is homogeneous in e(X), then the treated and control units within it have the same distribution of X.

  • The balancing property is a statement about the distribution of X, NOT about the assignment mechanism or the potential outcomes.

6 / 25

Propensity score: unconfoundedness

  • Property 2. If W is unconfounded given X, then W is unconfounded given e(X). That is, if

    Y_i(0), Y_i(1) \perp W_i | X_i

    holds, then

    Y_i(0), Y_i(1) \perp W_i | e(X_i)

    also holds.

  • Given a vector of covariates that ensure unconfoundedness, adjustment for differences in propensity scores removes all biases associated with differences in the covariates.

7 / 25

Propensity score: unconfoundedness

  • e(X) can be viewed as a summary score of the observed covariates.

  • This is great because causal inference can then be drawn through stratification, matching, regression, etc., using the scalar e(X) instead of the high-dimensional covariates.

  • The propensity score balances the observed covariates, but does not generally balance unobserved covariates.

  • In most observational studies, the propensity score e(X) is unknown and thus needs to be estimated.

8 / 25

Causal inference using propensity scores

Propensity score analysis (in observational studies) typically involves two stages:

  • Stage 1. Estimate the propensity score, e.g., via logistic regression or machine learning methods.

  • Stage 2. Given the estimated propensity score, estimate the causal effects through one of these methods:

    • Stratification
    • Matching
    • Regression
    • Weighting (which we will not cover)
    • Mixed combinations of the above
9 / 25

Stage 1: estimating the propensity score

10 / 25

Stage 1: estimating the propensity score

  • The main purpose of estimating the propensity score is to ensure overlap and balance of covariates between treatment groups, not to find a "perfect fit" for the propensity score itself.

  • As long as the important covariates are balanced, model overfitting is not a concern; underfitting can be a problem, however.

  • Essentially any balancing score (not necessarily the propensity score) is good enough for practical use.

11 / 25

Stage 1: estimating the propensity score

  • A standard procedure for estimating propensity scores includes:

    1. an initial fit;

    2. discarding outliers (units with extremely large or small propensity scores);

    3. checking covariate balance; and

    4. re-fitting if necessary.

12 / 25

Stage 1: estimating the propensity score

  • Step 1. Estimate the propensity score using a logistic regression:

    W_i | X_i \sim \textrm{Bernoulli}(\pi_i); \ \ \ \ \log\left(\dfrac{\pi_i}{1-\pi_i}\right) = X_i\boldsymbol{\beta}.

    Include all covariates in this initial model, or do a stepwise selection on the covariates and interactions, to get an initial estimate of the propensity scores. That is,

    \hat{e}^0(X_i) = \dfrac{e^{X_i\hat{\boldsymbol{\beta}}}}{1 + e^{X_i\hat{\boldsymbol{\beta}}}}.

    Machine learning methods can also be used.
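
    A minimal sketch of this step in R, assuming a data frame df with a binary treatment indicator W and covariates X1, X2, X3 (all names hypothetical):

```r
# Fit the initial propensity score model by logistic regression
ps_fit <- glm(W ~ X1 + X2 + X3, family = binomial(), data = df)

# The fitted probabilities are the initial estimates \hat{e}^0(X_i)
df$e_hat <- predict(ps_fit, type = "response")
```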

13 / 25

Stage 1: estimating the propensity score

  • Step 2. Check the overlap of the propensity score between treatment groups. If necessary, discard observations with non-overlapping propensity scores (sketched below).

  • Step 3. Assess the balance given by the initial model in Step 1.

  • Step 4. If one or more covariates are seriously unbalanced, include some of their higher-order terms and/or interactions, re-fit the propensity score model, and repeat Steps 1-3 until most covariates are balanced.

    Note: there are situations where some important covariates will still not be completely balanced after repeated trials. They should then be taken into account in Stage 2 (the outcome stage) of the propensity score analysis.
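
    A minimal sketch of the Step 2 overlap check in R, continuing from the Step 1 sketch (df, W, and e_hat are hypothetical names):

```r
# Trim units outside the common support of e_hat (a simple min/max rule)
lo <- max(min(df$e_hat[df$W == 1]), min(df$e_hat[df$W == 0]))
hi <- min(max(df$e_hat[df$W == 1]), max(df$e_hat[df$W == 0]))
df_trim <- subset(df, e_hat >= lo & e_hat <= hi)

# Visual check: overlaid histograms of e_hat by treatment group
brks <- seq(0, 1, by = 0.05)
hist(df$e_hat[df$W == 1], breaks = brks, col = rgb(1, 0, 0, 0.4),
     main = "Propensity score overlap", xlab = "e_hat")
hist(df$e_hat[df$W == 0], breaks = brks, col = rgb(0, 0, 1, 0.4), add = TRUE)
```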

14 / 25

Stage 1: estimating the propensity score

  • In practice, balance checking in the PS estimation stage can be done via sub-classification/stratification, matching, or weighting.

    • sub-classification/stratification: check the balance of all important covariates within K blocks of \hat{e}^0(X_i) based on its quantiles.
    • matching: check the balance of all important covariates in the matched sample.
    • weighting: check the balance of the weighted covariates between treatment and control groups.

  • The workflow is the same: fit an initial model, check balance (via sub-classification, matching, or weighting), and re-fit; a stratified balance check is sketched below.
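
    A sketch of the stratified balance check in R, reusing df, W, e_hat, and a covariate X1 from the earlier sketches (all hypothetical):

```r
# Form K = 5 quantile blocks of the estimated propensity score
K <- 5
df$block <- cut(df$e_hat,
                breaks = quantile(df$e_hat, probs = seq(0, 1, length.out = K + 1)),
                include.lowest = TRUE)

# Standardized mean difference of a covariate between treatment groups
smd <- function(x, w) {
  (mean(x[w == 1]) - mean(x[w == 0])) /
    sqrt((var(x[w == 1]) + var(x[w == 0])) / 2)
}

# |SMD| well below ~0.1 in every block suggests X1 is adequately balanced
sapply(split(df, df$block), function(d) smd(d$X1, d$W))
```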

15 / 25

Propensity score analysis workflow

16 / 25

Stage 2: estimating the causal effect

17 / 25

Stage 2: stratification

  • Given the estimated propensity score, we can estimate the causal estimands through sub-classification/stratification, weighting or matching.

  • Let's start with stratification.

  • Recall the earlier result that stratifying a single covariate into 5 strata removes roughly 90% of the bias due to that covariate.

  • Stratification using the propensity score as the summary score should have approximately the same effect.

18 / 25

Stage 2: stratification

  • Divide the subjects into K strata by the corresponding quantiles of the estimated propensity scores.

  • ATE: estimate ATE within each stratum and then average by the block size. That is,

    \hat{\tau}^{ATE} = \sum_{k=1}^K \left(\bar{Y}_{k,1} - \bar{Y}_{k,0} \right) \dfrac{N_{k,1}+N_{k,0}}{N},

    with N_{k,1} and N_{k,0} being the numbers of treated and control units in class k, respectively.

  • ATT: weight the within-block ATEs by the proportion of treated units, N_{k,1}/N_1.

  • A variance estimator for \hat{\tau}^{ATE} is

    \mathbb{Var}\left[\hat{\tau}^{ATE}\right] = \sum_{k=1}^K \left(\mathbb{Var}[\bar{Y}_{k,1}] + \mathbb{Var}[\bar{Y}_{k,0}] \right) \left(\dfrac{N_{k,1}+N_{k,0}}{N}\right)^2,

    or use the bootstrap. (Note that the two within-stratum variances add, since the treated and control sample means are independent.)
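
    A sketch of these estimators in R, reusing the quantile blocks from the balance-check sketch (df with hypothetical columns Y, W, block):

```r
# Within each block: ATE, its variance, and group sizes
est <- sapply(split(df, df$block), function(d) {
  n1 <- sum(d$W == 1); n0 <- sum(d$W == 0)
  c(diff = mean(d$Y[d$W == 1]) - mean(d$Y[d$W == 0]),          # within-block ATE
    v    = var(d$Y[d$W == 1]) / n1 + var(d$Y[d$W == 0]) / n0,  # Var of that difference
    n    = n1 + n0,
    n1   = n1)
})

tau_ate <- sum(est["diff", ] * est["n", ] / sum(est["n", ]))   # ATE: block-size weights
var_ate <- sum(est["v", ] * (est["n", ] / sum(est["n", ]))^2)  # variance estimator above
tau_att <- sum(est["diff", ] * est["n1", ] / sum(est["n1", ])) # ATT: N_{k,1}/N_1 weights
```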

19 / 25

Propensity score stratification: Remarks

  • 5 blocks is usually not enough; consider a larger number, such as 10.

  • Stratification is a coarsened version of matching.

  • Empirical results from real applications: stratification usually does not perform as well as matching or weighting.

  • Good for cases with extreme outliers (it smooths them): less sensitive, but also less efficient.

  • Can be combined with regression: first estimate causal effects using regression within each block and then average the within-subclass estimates.

20 / 25

Stage 2: matching

  • In propensity score matching, potential matches are compared using the (estimated) propensity score.

  • 1-to-n nearest neighbor matching is common when the control group is large compared to the treatment group.

  • In most software packages, the default is actually 1-to-1 nearest neighbor matching.

  • Pros: robust, and it produces matched pairs (so you can do within-pair analysis).

  • Sometimes dimension reduction via the propensity score may be too drastic; recent methods advocate matching on the multivariate covariates directly.

  • Nonetheless, propensity score matching is what we will focus on for our minimum wage data.
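
    A minimal matching sketch using the MatchIt package; this is an illustration under assumed names (df, W, X1, X2, Y), not the course's own script:

```r
library(MatchIt)

# 1-to-1 nearest neighbor matching on the propensity score;
# distance = "glm" fits a logistic regression internally (MatchIt >= 4.0)
m_out <- matchit(W ~ X1 + X2, data = df,
                 method = "nearest", distance = "glm", ratio = 1)
summary(m_out)                # covariate balance before vs. after matching

matched <- match.data(m_out)  # extract the matched sample
# Simple ATT estimate: difference in mean outcomes in the matched sample
with(matched, mean(Y[W == 1]) - mean(Y[W == 0]))
```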

21 / 25

Stage 2: regression

  • Remember the key propensity score property:

    \{Y_i(0), Y_i(1)\} \perp W_i | X_i \ \ \Rightarrow \ \ \{Y_i(0), Y_i(1)\} \perp W_i | e(X_i)

  • Idea: in a regression estimator, adjust for e(X) instead of the whole X; that is, in regression models of Y(w), use e(X) as the single predictor (see the sketch after this list).

  • Clearly, modeling \mathbb{Pr}(Y(w)|\hat{e}(X)) is simpler than modeling \mathbb{Pr}(Y(w)|X); the dimension reduction effectively leaves more data for estimating the essential parameters.

  • However,

    • we lose interpretation of the effects of individual covariates, e.g. age, sex; and

    • reduction to the one-dimensional propensity score may be too drastic.
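
    A sketch of this single-predictor regression in R (df, Y, W, e_hat are hypothetical names from the earlier sketches):

```r
# Propensity score regression: the scalar e_hat replaces the full X
fit_ps <- lm(Y ~ W + e_hat, data = df)
coef(fit_ps)["W"]  # estimated treatment effect under this model
```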

22 / 25

Stage 2: regression

  • Idea: instead of using the estimated \hat{e}(X) as the single predictor, use it as an additional predictor in the model. That is, model \mathbb{Pr}(Y(w)|X,\hat{e}(X)).

  • It turns out that \mathbb{Pr}(Y(w)|X,\hat{e}(X)) gives both efficiency and robustness.

  • Also, if we are unable to achieve full balance on some of the predictors, using \mathbb{Pr}(Y(w)|X,\hat{e}(X)) helps further control for those unbalanced predictors.

  • Empirical evidence (e.g., from simulations) supports this.

  • Why does it work? It is a continuous version of regression after stratification.
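
    A sketch of the augmented regression in R (same hypothetical names as before):

```r
# e_hat as an additional predictor alongside the covariates themselves;
# this also helps adjust for covariates left imperfectly balanced in Stage 1
fit_aug <- lm(Y ~ W + X1 + X2 + X3 + e_hat, data = df)
coef(fit_aug)["W"]  # estimated treatment effect
```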

23 / 25

The minimum wage analysis

  • Now let's actually see how this works with the minimum wage example from last class.

    Variable         Description
    ---------------  ------------------------------------------------------------------
    NJ.PA            indicator for which state the restaurant is in (1 if NJ, 0 if PA)
    EmploymentPre    measures employment for each restaurant before the minimum wage raise in NJ
    EmploymentPost   measures employment for each restaurant after the minimum wage raise in NJ
    WagePre          measures the hourly wage for each restaurant before the minimum wage raise
    BurgerKing       indicator for Burger King
    KFC              indicator for KFC
    Roys             indicator for Roys
    Wendys           indicator for Wendys
  • In-class analysis: move to the R script here.

24 / 25

Acknowledgements

These slides contain materials adapted from courses taught by Dr. Fan Li.

25 / 25
