You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A minor issue that had me confused for a while. This concerns the R code for section 12.7.
In the calculations of SW^C and SW^{AC}, summaries of SW^C and SW^{AC} do not
match those of Python or Stata, nor does the summary of SW^{AC} matches that
given in the book (p. 159). In R we use all observations; in contrast, Python
sets the SW^C of the censored to 1 by decree (and I think Sata does not use the
censored when computing the SW^{C}).
Does it matter? Not for the final model fit. Since the censored observations
are not used (of course: they have NA for the dependent variable) they are
dropped when fitting the model in R. However, I find it confusing that this is
implicit, not explicit and, moreover, the intermediate summaries of SWs are not
correct.
What would I suggest?
Changing the computation of the SW^C, maybe as Python does or, even better
(?), setting the censored to NA. E.g. nhefs$sw.c[nhefs$cens == 1] <- NA
In the model fit for msm.sw explicitly using only the subset of individuals
that are not censored.
############### SW^C
nhefs$sw.c <- pn.cens / pd.cens
## Does not match Python output (chunk 77). Compare max and mean
summary(nhefs$sw.c)
## The Python code (chunk 76) does the equivalent of
nhefs$sw.c2 <- nhefs$sw.c
nhefs$sw.c2[nhefs$cens == 1] <- 1
## Compare to Python output: matches
summary(nhefs$sw.c2)
## (Though that is irrelevant, as Python will exclude the censored ones in the
## final fit)
## Stata also creates the sw_c variable, but only assigns for those uncensored
## (I am Stata ignorant; guessing the meaning of the expression).
############# SW^{AC}
nhefs$sw <- nhefs$sw.c * nhefs$sw.a
## Summary of this variable does not match the book, nor the stata output nor the
## Python output: max is 12.86, not 4.10 (same max in stata and Python)
summary(nhefs$sw)
## If we use only the uncensored, it matches Stata output and Python output (chunk
## 79)
summary(nhefs$sw[nhefs$cens == 0])
################ Why is the fit not affected?
## Because in this call
msm.sw <- geeglm(
wt82_71 ~ qsmk,
data = nhefs,
weights = sw,
id = seqn,
corstr = "independence"
)
## wt82_71 has NA, precisely those censored. And the summary shows it is fitted
## for 1566.
## Number of observations match Stata: see table of Stata
## output at the bottom of the output
## In Python, the final model is not using the censored ones:
## Chunk 80, expression sw_AC[nhefs_all.censored == 0]
The text was updated successfully, but these errors were encountered:
A minor issue that had me confused for a while. This concerns the R code for section 12.7.
In the calculations of SW^C and SW^{AC}, summaries of SW^C and SW^{AC} do not
match those of Python or Stata, nor does the summary of SW^{AC} matches that
given in the book (p. 159). In R we use all observations; in contrast, Python
sets the SW^C of the censored to 1 by decree (and I think Sata does not use the
censored when computing the SW^{C}).
Does it matter? Not for the final model fit. Since the censored observations
are not used (of course: they have NA for the dependent variable) they are
dropped when fitting the model in R. However, I find it confusing that this is
implicit, not explicit and, moreover, the intermediate summaries of SWs are not
correct.
What would I suggest?
(?), setting the censored to NA. E.g.
nhefs$sw.c[nhefs$cens == 1] <- NA
msm.sw
explicitly using only the subset of individualsthat are not censored.
Details follow for SW^C and SW^{AC}. I am using this for the Python code:
https://github.com/jrfiedler/causal_inference_python_code/blob/master/chapter12.ipynb
The text was updated successfully, but these errors were encountered: