Skip to content

Commit aeeb763

Browse files
author
Simon Bond
committed
real first commit
0 parents  commit aeeb763

37 files changed

+11722
-0
lines changed

Report v1.Rmd

Lines changed: 202 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,202 @@
1+
---
2+
title: "TACTIC-R Simulation"
3+
author: "Simon Bond"
4+
date: "`r format(Sys.Date(),'%d %b %Y')`"
5+
output:
6+
html_document:
7+
css: js/hideOutput.css
8+
editor_options:
9+
chunk_output_type: console
10+
---
11+
<script src="js/hideOutput.js"></script>
12+
13+
# Creating Data
14+
15+
We define some functions to simulate failure time data, which is discretized to days, 1,2,..14. Plus adding in some censoring at day 14, or earlier with a given dropout rate. There is also a site level random interecept on the log hazard scale.
16+
17+
18+
```{r setup, include=FALSE}
19+
knitr::opts_chunk$set(echo = TRUE, warning=FALSE)
20+
library(survival)
21+
library(magrittr)
22+
library(dplyr)
23+
library(ggplot2)
24+
library(lme4)
25+
library(knitr)
26+
library(kableExtra)
27+
28+
library(doParallel)
29+
library(foreach)
30+
cl<-makeCluster(parallel::detectCores())
31+
registerDoParallel(cl)
32+
getDoParWorkers()
33+
34+
read_chunk("sim_failure.R")
35+
read_chunk("priors.R")
36+
read_chunk("new_data.R")
37+
read_chunk("future_analysis.R")
38+
```
39+
40+
<div class="fold s">
41+
```{r functions}
42+
<<sim_failure>>
43+
df <- sim_failure(n_per_arm = 125,control_surv_rate = 0.7, hr=c(0.7,1.3)) %>%
44+
add_censor(admin_cens_time = 14, drop_out=0.1)
45+
46+
#add in site , Any other covars?
47+
#df$site <- rep(1:25,nrow(df)/25) %>% factor
48+
49+
50+
# surv_split the data
51+
52+
fail_times <- df %>% filter(status==1) %>% select(time) %>% unique() %>% unlist
53+
df_long <- survSplit(Surv(time, status)~rx+site, df, cut=fail_times)
54+
55+
56+
```
57+
</div>
58+
59+
So assume n=125 per arm, 3 arm with HR of 0.7 and 1.3. Site log Haz SD=0.1, and assuming 30% event rate in teh control arm at day 14.
60+
61+
```{r }
62+
df <- sim_failure(n_per_arm = 125,control_surv_rate = 0.7, hr=c(0.7,1.3), site_raneff_sd = 0.1) %>%
63+
add_censor(admin_cens_time = 14, drop_out=0.1)
64+
kable(df) %>% kable_styling() %>% scroll_box(height="200px")
65+
survfit(Surv(time,status)~rx, data=df) %>% plot(lty=1:3, mark.time=TRUE)
66+
legend(0,0.6,lty=1:3, legend=levels(df$rx))
67+
68+
```
69+
70+
# Digression into the Poisson trick
71+
72+
Any Cox Ph model can also be fitted using a GLM with a poisson distribution, if the individual patient history is broken up into individual time segments, and the dependent variable is an indicator as to whether they failed. The likelihoods coincide exactly, as the dependent variable only takes values 0 and 1. The latter can easily be implimented in Bayesian MCMC models/tools, but there is no readily available mapping directly to CoxPH. Alternative Bayes methods use a flexibility parametric smoothing approach to estimate the baseline hazard. But given that there are only 14 days, this is not required, and the implimentation is far more robust to use a GLM/Poisson model via the [rstanarm pacakge](https://mc-stan.org/rstanarm/) .
73+
74+
The match to within computing arithmetic accuracy in the fixed effects model, and when the random effects are added in, with different methods of quadrature, then they are practically identical.
75+
76+
```{r poisson}
77+
fail_times <- df %>% filter(status==1) %>% select(time) %>% unique() %>% unlist
78+
df_long <- survSplit(Surv(time, status)~rx+site, df, cut=fail_times)
79+
glm(status~-1+strata(time)+rx , family=poisson, data=df_long) %>% summary
80+
coxph(Surv(time,status)~rx, data=df, ties = "breslow") %>% summary
81+
82+
glmer(status~-1+strata(time)+rx +(1|site), family=poisson, data=df_long) %>% summary
83+
coxph(Surv(time,status)~rx+frailty.gaussian(site), data=df, ties = "breslow") %>% summary
84+
```
85+
86+
87+
# Bayesian version thereof
88+
89+
First I set some priors. These are defined by providing 5% and 95% quantiles, or 50% and 5/95% for the baseline hazard and hazard ratios, and then converting the quantiles to match up with a normal distribution. Currently the prior for the random effects uses the default option provide, which is likely to be too diffuse.
90+
91+
* baseline hazard: between 0.05 and 0.95 increment in cumulative hazard of failure in each day.
92+
* Vague HR between 0.3 and 3.33. Equivalent to trial with 7 events
93+
* Optimistic HR centred on 0.7, with 95% quantile at 1, equivlaent to 85-event trial.
94+
* Pessimistic HR centred on 1 with 5% quantile at 0.7, equivalent to 85-event trial.
95+
96+
<div class="fold s">
97+
```{r results=FALSE, message=FALSE}
98+
99+
library(rstanarm)
100+
#library(brms)
101+
options(mc.cores = parallel::detectCores())
102+
library(bayesplot)
103+
#theme_set(bayesplot::theme_default())
104+
<<priors>>
105+
prior_list <- list("vague"=vague, "optimistic"=optimistic, "pessimistic"=pessimistic)
106+
effects <- c("rxE1","rxE2")
107+
```
108+
</div>
109+
110+
Now we use the rstan and rstanarm to impliment MCMM regression, in the case of a vague prior first.
111+
112+
```{r bayes_fit, message=FALSE, results=FALSE}
113+
114+
# fits <- lapply(prior_list,
115+
# function(my_prior){
116+
# stan_glmer(status~-1+strata(time)+rx+(1|site), data=df_long, family=poisson,
117+
# chains=2, iter=1000,QR=FALSE, thin=2,
118+
# prior =my_prior)
119+
# }
120+
# )
121+
122+
fits <- foreach(i = 1:length(prior_list),
123+
.inorder = TRUE,
124+
.packages=c("rstanarm","survival"),
125+
.export=c("df_long")
126+
) %dopar% {
127+
stan_glmer(status~-1+strata(time)+rx+(1|site), data=df_long, family=poisson,
128+
chains=2, iter=2000,QR=FALSE, thin=2,
129+
prior =prior_list[[i]])
130+
}
131+
names(fits) <- names(prior_list)
132+
133+
output <- NULL
134+
for( index in 1:length(fits)){
135+
output <- c(output, knit_child("reporting_on_model.Rmd"))
136+
}
137+
```
138+
139+
`r knit_child(text=unlist(output))`
140+
141+
142+
143+
# PRedictive Power Calculation
144+
145+
This consist of a few steps iterated thousands of times to generate future theoretical data and then averaged
146+
147+
* Simulate from the posteriro distribution of the model paraemters
148+
* Use parameters to simulate failure times of future patients
149+
* analyse current data + simulated and estimate treatment effects
150+
151+
To use the Poisson-trick model, we first bootstrap the baseline covariates to generate an upper limit to the sample size needed. All 14 days are included, though, regarldess of the orignal patients failure time. Standard tools then can be used to generate the outcome variable from the predictive posterior distribution. The failure time is then taken to be the first day on which a non-zero outcome is observed. A Kaplan_Meier estimate of the censoring distributionis taken and this is then simulated from an applied on top of the failure times generated . We thus obtain a 1000 replicates of a 2000 patient data set.
152+
153+
For each data set we then use to consider only taken forwards a subset of the arms, and fixing the total sample size. The final analysis is performed using CoxPH (using a Bayesian MCMC version at this point would take a very long time to compute for a simulation ). Each simulation is thus declared to have show a statistically significant difference (one-side alpha=0.025). We average across teh simulatinos to caclulate the predictive power.
154+
155+
<div class="fold s">
156+
```{r power, message=FALSE}
157+
n_new <- 1500
158+
<<new_data>>
159+
# this gets us output : df_new_long, censor_dist, several functions add_*()
160+
<<future_analysis>>
161+
#creates some functions.
162+
163+
power_df <- data.frame(power=numeric(0),
164+
ss=numeric(0),
165+
arms=character(0),
166+
prior=character(0)
167+
)
168+
for( index in 1:length(fits)){
169+
#index <- 1
170+
fit <- fits[[index]]
171+
pred_data <- posterior_predict(fit, newdata = df_new_long)
172+
# all possible combination of arms, with Cx allways taken forwards
173+
arms <- combinations(levels(df$rx)[-1])
174+
arms <- lapply(arms, function(x){c("Cx",x)})
175+
ref_ss <- c(458,687,938,1407)
176+
#ans <- power(arms[[3]], n_total = ref_ss)
177+
power_list <- lapply(arms, power, n_total=ref_ss)
178+
arms_char <- sapply(arms, paste, collapse=", ")
179+
power_df <- rbind(power_df,
180+
data.frame(power=unlist(power_list),
181+
ss=rep(ref_ss, length(arms)),
182+
arms=rep(arms_char, rep(length(ref_ss),length(arms))),
183+
prior=names(fits)[index]
184+
)
185+
)
186+
}
187+
188+
ggplot(power_df, aes(x=ss,y=power, group=prior, colour=prior))+
189+
geom_line()+geom_point()+facet_grid(arms~.)
190+
191+
stopCluster(cl)
192+
```
193+
</div>
194+
195+
# Todo List
196+
197+
* Look into setting the prior for the random effects "better".
198+
* Futher look into options for parallelisation: gpu and [HPC docs](https://docs.hpc.cam.ac.uk/)
199+
* consider nesting of foreach() %:%
200+
* put in some timing metric
201+
* Embed in a simulation with decision rules applied and new data generated
202+
* ONGOING Vary the scenarios of the true data generating process

Report-v1.html

Lines changed: 9116 additions & 0 deletions
Large diffs are not rendered by default.

boundary_list.rds

262 Bytes
Binary file not shown.

computation_log.csv

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
Scenario,model_sim_fit,Job_id,future_inference,Job_id,operating_char
2+
null,done,,done,25977068,done
3+
one_winner,done,,done,25977069,done
4+
medium_events,done,,done,25977070,done
5+
low_events,done,25950865,done,26048226,done
6+
worst_definition,done,25977378,done,26048264,done
7+
good_ugly,done,26309683,in prog,26364531,

create_scenario_df.R

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
scenario_df <- data.frame(scenario=c("one_winner","null"),
2+
n_per_arm = rep(125,2),control_surv_rate = rep(0.7,2), hr1=c(0.7,1), hr2=c(1.3,1),
3+
admin_cens_time = rep(14,2), drop_out=rep(0.1,2), stringsAsFactors = FALSE)
4+
5+
write.csv(scenario_df, file="scenario_df.csv",row.names = FALSE)

dash_board.txt

Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
REMEMBER TO RUN
2+
rclone sync -u ~/Documents/ remote:PCDocuments --backup-dir remote:oldVersions/PCDocuments
3+
4+
AT THE END OF A DAYS work on LOCAL UBUNTI PC
5+
Else any deleted files from the googledrive copy will get put back at the start of the next day.
6+
7+
8+
9+
module load r-3.6.1-gcc-5.4.0-zrytncq
10+
11+
12+
cd ~/Documents/Work ; rsync -trvp --exclude "library/" --exclude "results/" --exclude ".git/" --exclude ".Rproj.user/" --delete --delete-after -e ssh "tactic" [email protected]:
13+
14+
15+
rsync -t -e ssh [email protected]:tactic/* .
16+
17+
#rsync -t -e ssh [email protected]:rds/hpc-work/* results/
18+
scp [email protected]:rds/hpc-work/* results
19+
scp [email protected]:rds/hpc-work/null/inference* results/null
20+
# and need to delete the file first...
21+
22+
#DONE be good to sort out openssh keys or similar, to avoid typing in password all teh tyime
23+
#DONE have set up command line to read a scenario. Want auto creation of result folders.., and #DONE createa copy of the script verbatim.
24+
25+
cd tactic
26+
27+
./slurm_wraper "null" 1
28+
#sbatch slurm_model_sim_fit_skylake.txt
29+
squeue -u sjb277
30+
scancel nnnn
31+
watch -n 30 squeue -u sjb277
32+
33+
# jobs that get 'held'
34+
scontrol release 25027069_671 25027069_760
35+
36+
37+
Stage 2:
38+
39+
Then apply decision rules and future data, analysis
40+
41+
#chmod +x slurm_wrapper_future_inference.sh
42+
./slurm_wrapper_future_inference "null" 1
43+
#sbatch slurm_model_sim_fit_skylake.txt
44+
squeue -u sjb277
45+
scancel nnnn
46+
watch -n 30 squeue -u sjb277
47+
48+
Rscript operating_characteristics.R good_ugly
49+
50+
51+
# jobs that get 'held'
52+
scontrol release 25027069_671 25027069_760
53+
54+
55+
56+
STage 3:
57+
58+
correlate the tools at the first interim with the ultimate inference..
59+
check type 1 error to start with. Generall Ooperating characteristics
60+
61+
got nearly all the scenarios with OCs, now join together and visualise.
62+
63+
64+

decision_rules.R

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
## Decision Rules
2+
3+
posterior probabilities with thresholds
4+
5+
optimum arms at fixed SS
6+
7+
optim arms and SS - find if the ss can dip
8+
9+
add in more interims
10+
11+
add in peto rule
12+
13+
do rules conflict
14+
15+
set a threshold for the number of tests passed?
16+
17+
18+
Tools to act on decsions
19+
- which arms
20+
- SS
21+
- more interims
22+
23+
Generate data
24+
analyse
25+
- p-value, estimate, SE ?
26+

docs/910015.pdf

149 KB
Binary file not shown.

docs/922505a.pdf

63.9 KB
Binary file not shown.

0 commit comments

Comments
 (0)