THIS IS AN OPTIONAL HOMEWORK. IF YOU CHOOSE TO SUBMIT IT, THE SCORE WILL REPLACE THE LOWEST SCORE OUT OF THE FOUR PREVIOUS METHODS AND DATA ANALYSIS ASSIGNMENTS, AS LONG AS IT EXCEEDS AT LEAST ONE OF THE SCORES FROM THOSE PREVIOUS ASSIGNMENTS.
This assignment involves causal inference. Note that this is an individual assignment, so you must work alone. You can discuss basic details with classmates, but your final work must be yours alone! Please type your solutions using R Markdown, LaTeX, or any other word processor, but be sure to knit or convert the final output file to “.pdf”. Submissions should be made on Gradescope: go to Assignments \(\rightarrow\) Methods and Data Analysis 5.
ASTHMA PATIENTS IN CALIFORNIA.
The data for this question can be found in the file “Asthma.txt” on Sakai.
The data set is from a study comparing the quality of services provided by two physician groups for asthma patients in California. Specifically, for patient \(i\), let \(Y_i(w)\) be the quality of service as judged by the patient (1 = satisfactory, 0 = not satisfactory) if the patient is served by physician group \(w\), for \(w = 1,2\). The patients who visit the two groups can differ, so a set of covariates is measured. The variables in the data are:
Variable | Description |
---|---|
pg (treatment assignment) | physician group; values = 1 and 2 |
i_age | age (continuous) |
i_sex | sex (binary) |
i_race | race (categorical) |
i_educ | education (categorical) |
i_insu | insurance status (categorical) |
i_drug | drug coverage status (categorical) |
i_seve | severity (categorical) |
com_t | total number of comorbidities (numeric) |
pcs_sd | standard physical comorbidity scale (continuous) |
mcs_sd | standard mental comorbidity scale (continuous) |
i_aqoc (outcome) | satisfaction status of patient (binary) |
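
As a first check, the columns can be loaded and compared against the table above. The sketch below is only illustrative: it assumes the file is whitespace-delimited with a header row, which should be verified against the real “Asthma.txt”. The helper `load_asthma` is a hypothetical name, and the toy rows are invented so that the block runs on its own.

```r
# Toy stand-in for "Asthma.txt": these rows are invented for illustration
# only and are NOT taken from the real data set.
toy_path <- tempfile(fileext = ".txt")
writeLines(c(
  "pg i_age i_sex i_aqoc",
  "1 34 0 1",
  "2 51 1 0",
  "1 47 1 1",
  "2 29 0 1"
), toy_path)

# Hypothetical helper. Assumption: whitespace-delimited file with a header
# row; adjust `header` / `sep` after inspecting the real file.
load_asthma <- function(path) {
  read.table(path, header = TRUE, stringsAsFactors = FALSE)
}

Data <- load_asthma(toy_path)
str(Data)                    # compare each column's type with the table
table(Data$pg, Data$i_aqoc)  # outcome counts within each physician group
```

For the real analysis, `toy_path` would be replaced by the path to the downloaded file.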
Which of the methods do you consider most reliable (or feel most comfortable with) for estimating the causal effect? Why?
- Recode `pg` to a binary variable with values 0 and 1.
- Center `com_t`, `pcs_sd`, and `mcs_sd`, and use the centered versions for all analyses.
- [...] `pg` and `i_aqoc`, and the meaning of both is very clear here.
- Set the reference levels of `i_sex`, `i_educ`, and `i_seve` by doing `Data$i_sex <- relevel(factor(Data$i_sex), ref = 1)`, `Data$i_educ <- relevel(factor(Data$i_educ), ref = 5)`, and `Data$i_seve <- relevel(factor(Data$i_seve), ref = 3)`.
- [...] (`pg = 1`) and not the ATE, because we will discard observations in the control group for which we cannot find matches.

20 points.