Program 12.7: in R, SW^{AC} summaries do not match those in the book (nor in Stata or Python) #15

rdiaz02 · 2021-05-19T09:09:35Z

A minor issue that had me confused for a while. This concerns the R code for section 12.7.

In the calculations of SW^C and SW^{AC}, summaries of SW^C and SW^{AC} do not
match those of Python or Stata, nor does the summary of SW^{AC} matches that
given in the book (p. 159). In R we use all observations; in contrast, Python
sets the SW^C of the censored to 1 by decree (and I think Sata does not use the
censored when computing the SW^{C}).
Does it matter? Not for the final model fit. Since the censored observations
are not used (of course: they have NA for the dependent variable) they are
dropped when fitting the model in R. However, I find it confusing that this is
implicit, not explicit and, moreover, the intermediate summaries of SWs are not
correct.
What would I suggest?
- Changing the computation of the SW^C, maybe as Python does or, even better
  (?), setting the censored to NA. E.g. nhefs$sw.c[nhefs$cens == 1] <- NA
- In the model fit for msm.sw explicitly using only the subset of individuals
  that are not censored.

Details follow for SW^C and SW^{AC}. I am using this for the Python code:
https://github.com/jrfiedler/causal_inference_python_code/blob/master/chapter12.ipynb

############### SW^C
nhefs$sw.c <- pn.cens / pd.cens
## Does not match Python output (chunk 77). Compare max and mean
summary(nhefs$sw.c)


## The Python code (chunk 76) does the equivalent of
nhefs$sw.c2 <- nhefs$sw.c 
nhefs$sw.c2[nhefs$cens == 1] <- 1

## Compare to Python output: matches
summary(nhefs$sw.c2)
## (Though that is irrelevant, as Python will exclude the censored ones in the
## final fit)

## Stata also creates the sw_c variable, but only assigns for those uncensored 
## (I am Stata ignorant; guessing the meaning of the expression).


############# SW^{AC}

nhefs$sw <- nhefs$sw.c * nhefs$sw.a

## Summary of this variable does not match the book, nor the stata output nor the
##  Python output: max is 12.86, not 4.10 (same max in stata and Python)

summary(nhefs$sw)

## If we use only the uncensored, it matches Stata output and Python output (chunk
## 79)

summary(nhefs$sw[nhefs$cens == 0])


################ Why is the fit not affected?

## Because in this call
msm.sw <- geeglm(
  wt82_71 ~ qsmk,
  data = nhefs,
  weights = sw,
  id = seqn,
  corstr = "independence"
)

## wt82_71 has NA, precisely those censored. And the summary shows it is fitted
## for 1566.


##  Number of observations match Stata: see table of Stata
##  output at the bottom of the output

## In Python, the final model is not using the censored ones:
##     Chunk 80, expression sw_AC[nhefs_all.censored == 0]

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Program 12.7: in R, SW^{AC} summaries do not match those in the book (nor in Stata or Python) #15

Program 12.7: in R, SW^{AC} summaries do not match those in the book (nor in Stata or Python) #15

rdiaz02 commented May 19, 2021

Program 12.7: in R, SW^{AC} summaries do not match those in the book (nor in Stata or Python) #15

Program 12.7: in R, SW^{AC} summaries do not match those in the book (nor in Stata or Python) #15

Comments

rdiaz02 commented May 19, 2021