Meta-Analysis-Cox-PH.Rmd

---
title: "Meta-Analysis: Cox-PH Models"
date: "04/11/2023"
output:
  html_document:
    toc: true
    toc_float: true
    code_download: true
    theme: spacelab
---

This notebook conducts and demonstrates the results of the Random-effects meta-analysis performed on each healthcare system's Cox-proportional hazards (Cox-PH) model results.

Of note, this analysis is performed only on adult patients due to the pediatric cohort's low incidence of both poor health outcomes and neurological diagnoses during COVID-19 hospitalization.

```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(metafor)
library(meta)
library(cowplot)
source("R/forester_custom.R")
source("R/analyze_survival.R")
```

# **Import Data from each Healthcare system**

Each participating healthcare system ran the analysis locally on patient level data using our [customized R package](https://github.com/covidclinical/Phase2.1NeuroRPackage). The output of each local analysis consisted of only the summary results (rather than patient level data). The model summary results from each healthcare system were then aggregated and are imported and then combined using the following code:

```{r message=FALSE, warning=FALSE}
# read in files from results folder 
# this folder contains all of the local healthcare system level analyses
rdas <- list.files(
  path = "results",
  pattern = ".rda",
  full.names = TRUE
)
for (rda in rdas) {
  load(rda)
}

rm(rdas, rda)

# create a list of participating healthcare systems from our study tracking spreadsheet
site_google_url <- "https://docs.google.com/spreadsheets/d/1epcYNd_0jCUMktOHf8mz5v651zy1JALD6PgzobrGWDY/edit?usp=sharing"

# load site parameters
site_params <- googlesheets4::read_sheet(site_google_url, sheet = 1)
site_avails <- googlesheets4::read_sheet(site_google_url, sheet = 2)

# filter the list of sites who ran the analysis
sorted_sites <- site_avails %>%
  filter(!is.na(date_v4_received)) %>%
  pull(siteid) %>%
  paste("results", sep = "_")

# list sites without race
sites_wo_race <- site_params %>%
  filter(!include_race) %>%
  pull(siteid)

# combine all rda files with 'results' in name
results <- mget(ls(pattern = "results"))

## read in pt counts
pt_counts_df <- read.csv('tables/site_pt_counts.csv')
```
# **Pre-processing**

## Define parameters to specify models to evaluate

During the locally run analysis, each healthcare system ran a series of Cox-PH models in order to perform supplementary analysis to evaluate the optimal comorbidity adjustment method and censoring cutoff point.

```{r}
outcomes <- c("time_first_discharge_reg_elix", "deceased_reg_elix")

comorb_adj <- c("lpca", "score", "ind")

censor_cut <- c("30", "60", "90")
```

## Identify sites to include in the analysis

In this analysis, we only included a healthcare system if they had \>= 3 adult patients with a neurological diagnosis.

```{r}
sites_adult <- pt_counts_df %>% 
  filter(population == "Adult") %>% 
  mutate(neuro_sum = n_var_Central + n_var_Peripheral) %>% 
  filter(!neuro_sum < 3) %>% 
  distinct(site) %>% 
  mutate(site = gsub("_results", "", site))
```

# **Random-Effects Meta-Analysis**

## Get Cox-PH model results

This notebook specifies the parameters to load in the Cox-PH results using a `censor_cutoff`='90' (days since initial hospitalization) and the `comorb_method`='ind' which adjusts for patient comorbidity burden by treating each individual comorbidity of the Elixhauser Comorbidity Index as an individual model covariate (binary variable indicating whether or not the patient previously was diagnosed with the respective comorbidity).

```{r}
cox_results_adults <- list()

for (outcome_i in outcomes) {
  cox_results_adults[[outcome_i]] <-
    results %>%
    lapply(get_cox_row, population = "adults", comorb_method = "ind", censor_cutoff = "90", outcome = outcome_i) %>%
    bind_rows() %>%
    mutate(outcome = outcome_i)
}
```

## Random-effects meta-analysis - CNS

Next, we will perform a random-effects meta-analysis to evaluate the risk of mortality or prolonged hospital stay in the CNS group vs the NNC group. The following code will compute the random-effects model for each outcome (mortality and discharge).

Notes:

-   Set `sm = "HR"` when estimate is logHR (which is coef in our model).

    -   Search logHR in [cran](https://cran.r-project.org/web/packages/meta/meta.pdf) to see an example

```{r message=FALSE, warning=FALSE}
## adults 
meta_results_adults_cns <- list()

for (outcome_i in outcomes) {
  meta_results_adults_cns[[outcome_i]] <-
    cox_results_adults[[outcome_i]] %>%
    filter(site %in% sites_adult$site) %>% 
    bind_rows() %>%
    data.frame() %>%
    filter(variable == "neuro_postCentral") %>%
    metagen(
      TE = coef,
      seTE = se.coef.,
      data = .,
      sm = "HR", # hazard ratios
      comb.random = TRUE,
      comb.fixed = FALSE,
      method.tau = "DL", # default tau method
      hakn = FALSE,
      prediction = TRUE,
      studlab = site
    )
}
```

## Random-effects meta-analysis - PNS

Next, we will perform a random-effects meta-analysis to evaluate the risk of mortality or prolonged hospital stay in the PNS group vs the NNC group. The following code will compute the random-effects model for each outcome (mortality and discharge).

```{r message=FALSE, warning=FALSE}
## adults
meta_results_adults_pns <- list()

for (outcome_i in outcomes) {
  meta_results_adults_pns[[outcome_i]] <-
    cox_results_adults[[outcome_i]] %>%
    filter(site %in% sites_adult$site) %>% 
    bind_rows() %>%
    data.frame() %>%
    filter(variable == "neuro_postPeripheral") %>%
    metagen(
      TE = coef,
      seTE = se.coef.,
      data = .,
      sm = "HR", # hazard ratios
      comb.random = TRUE,
      comb.fixed = FALSE,
      method.tau = "DL", # default tau method
      hakn = FALSE,
      prediction = TRUE,
      studlab = site
    )
}
```

## Format meta-analysis results

Here we will format the output of the meta-analysis for both CNS and PNS models

```{r}
# cns
meta_results_adults_cns_df <- format_meta(meta_results_adults_cns)

# pns
meta_results_adults_pns_df <- format_meta(meta_results_adults_pns)
```

## Save model weights to generate the survival curves

Next, we will save the model weights for each healthcare system in order to help us generate the associated survival curves in our downstream analysis.

The benefit of the random-effects model is that it allows us to weight each healthcare system by considering the healthcare system variance ${s^2}$ and the between healthcare system variance $tau^2$.

**4.1.2.1 Estimators of the Between-Study Heterogeneity:** <https://bookdown.org/MathiasHarrer/Doing_Meta_Analysis_in_R/pooling-es.html>

> "inverse of the variance is used to determine the weight of each study"
>
> -   *where "study" in our analysis is the healthcare system*
>
> For random effects, the calculation is performed by calculating an adjusted random-effects weight w∗k for each observation. Where k is the healthcare system:

$w^*_k = \frac{1}{s^2 + tau^2}$

```{r}
# save all cns weights
for (outcome_i in outcomes) {
  
  save_weights(meta_results_df = meta_results_adults_cns_df,
               population = "adults", 
               outcome = outcome_i, 
               comorb_method = "ind", 
               censor_cutoff = "90",
               cns_pns = "CNS")
}


# save all pns weights
for (outcome_i in outcomes) {
  
  save_weights(meta_results_df = meta_results_adults_pns_df,
               population = "adults", 
               outcome = outcome_i, 
               comorb_method = "ind", 
               censor_cutoff = "90",
               cns_pns = "PNS")
}
```

# **Analyze Risk via Forest Plots**

**Prepare data for forest plots**

```{r}
# adults
meta_results_adults_cns_df_forest = prepare_data_forest_plots(meta_results_adults_cns_df)
meta_results_adults_pns_df_forest = prepare_data_forest_plots(meta_results_adults_pns_df)
```

**Tidy up results**

```{r}
meta_results_adults_cns_df_forest_tidy <- meta_results_adults_cns_df_forest %>% 
  distinct(analysis, TE.random, lower.random, upper.random,`p-value`, lower.predict, upper.predict, I2, lower.I2, upper.I2, H, lower.H, upper.H, tau2, lower.tau2, upper.tau2, CI) %>% 
  mutate(neuro_status = "CNS") %>% 
  select(neuro_status, everything())

meta_results_adults_pns_df_forest_tidy <- meta_results_adults_pns_df_forest %>% 
  distinct(analysis, TE.random, lower.random, upper.random,`p-value`, lower.predict, upper.predict, I2, lower.I2, upper.I2, H, lower.H, upper.H, tau2, lower.tau2, upper.tau2, CI) %>% 
  mutate(neuro_status = "PNS") %>% 
  select(neuro_status, everything())

meta_results_tidy <- rbind(meta_results_adults_cns_df_forest_tidy, 
                           meta_results_adults_pns_df_forest_tidy)

write.csv(meta_results_tidy, "tables/Table2_meta_results.csv", row.names = FALSE)
```

Next, we will create a scale function to identify the min and max CI for each healthcare system. This will help us exclude healthcare systems with values of 'Inf' or very large confidence intervals.

```{r}
scale_plot <- function(meta_results_df_forest, site_exclude = NULL) {
  scale <- meta_results_df_forest %>%
    filter(
      !Site %in% site_exclude,
      #!analysis %in% invalid_sites,
      !CI.Low == "Inf",
      !CI.High == "Inf"
    ) %>%
    mutate(
      min_est = min(CI.Low, na.rm = TRUE) - 1,
      max_est = max(CI.High, na.rm = TRUE) + 1,
      min_est = if_else(min_est + 1 >= -1, min(CI.Low, na.rm = TRUE)-0.1, min_est),
      min_est = if_else(min_est > 1, 0.5, min_est),
      max_est = if_else(max_est - 1 < 1, 1.1, max_est)) %>%
    distinct(min_est, max_est)

  return(scale)
}
```

## CNS - Risk of Discharge

```{r fig.width=8, message=FALSE, warning=FALSE}
te.random = meta_results_adults_cns_df_forest %>% 
  filter(analysis == "time_first_discharge_reg_elix") %>% 
  distinct(TE.random) %>% 
  as.numeric() %>% 
  round(.,2)

te.random.ci_lower = meta_results_adults_cns_df_forest %>% 
  filter(analysis == "time_first_discharge_reg_elix") %>% 
  distinct(lower.random) %>% 
  as.numeric() 

te.random.ci_upper = meta_results_adults_cns_df_forest %>% 
  filter(analysis == "time_first_discharge_reg_elix") %>% 
  distinct(upper.random) %>% 
  as.numeric() 

length_stay_first_results <- meta_results_adults_cns_df_forest %>%
  slice(grep(paste("time_first_discharge_reg_elix", collapse = "|"), analysis)) %>% 
  #add_row(.before = 1) %>%
  add_row(.after = nrow(.)) %>% 
  mutate(Site = ifelse(is.na(Site), "Effect size", Site),
         `Hazard Ratio` = ifelse(is.na(`Hazard Ratio`), round(te.random, 2), `Hazard Ratio`),
         `p-value` = ifelse(`p-value` < 0.001, "< .001*", round(`p-value`, 3))) %>% 
  fill(`p-value`, .direction = "down") %>% 
  mutate(`p-value` = ifelse(Site == "Effect size", `p-value`, ""),
         CI.Low = ifelse(Site == "Effect size", te.random.ci_lower, CI.Low),
         CI.High = ifelse(Site == "Effect size", te.random.ci_upper, CI.High),
         `Hazard Ratio ` = paste(`Hazard Ratio`, '(', CI.Low, ',', CI.High, ')'))


## define forest plot shapes
# shape #16 is normal circle; #18 is diamond
shapes <- rep(16, times = nrow(length_stay_first_results))
shapes[length(shapes)] <- 18
sizes <- rep(3.25, times = nrow(length_stay_first_results))
sizes[length(sizes)] <- 5
scale <- scale_plot(length_stay_first_results, site_exclude = c("ICSM"))

# note: `Hazard Ratio ` with confidence intervals has an extra space and used by `right_side_data` param
p1 <- forester(
  left_side_data = length_stay_first_results %>% 
    select(Site) %>% 
    rename("Healthcare System" = "Site"),
  estimate = length_stay_first_results$`Hazard Ratio`,
  ci_low = length_stay_first_results$CI.Low,
  ci_high = length_stay_first_results$CI.High,
  right_side_data = length_stay_first_results[, c("Hazard Ratio ", "p-value")],
  display = TRUE,
  nudge_x = .5,
  font_family = "arial",
  arrows = TRUE,
  arrow_labels = c("Decreased Risk", "Increased Risk"),
  null_line_at = 1,
  xlim = c(scale$min_est, scale$max_est),
  #xbreaks = c(scale$min_est, 1, scale$max_est),
  point_sizes = sizes,
  point_shapes = shapes,
  file_path = here::here(paste0("figures/forestplot_adult_cns_discharge.png")))

p1
```

## CNS - Risk of Mortality

```{r fig.width=8, message=FALSE, warning=FALSE}
te.random = meta_results_adults_cns_df_forest %>% 
  filter(analysis == "deceased_reg_elix") %>% 
  distinct(TE.random) %>% 
  as.numeric() %>% 
  round(.,2)

te.random.ci_lower = meta_results_adults_cns_df_forest %>% 
  filter(analysis == "deceased_reg_elix") %>% 
  distinct(lower.random) %>% 
  as.numeric() 

te.random.ci_upper = meta_results_adults_cns_df_forest %>% 
  filter(analysis == "deceased_reg_elix") %>% 
  distinct(upper.random) %>% 
  as.numeric() 

deceased_results <- meta_results_adults_cns_df_forest %>%
  slice(grep(paste("deceased_reg_elix", collapse = "|"), analysis)) %>% 
  #add_row(.before = 1) %>%
  add_row(.after = nrow(.)) %>% 
  mutate(Site = ifelse(is.na(Site), "Effect size", Site),
         `Hazard Ratio` = ifelse(is.na(`Hazard Ratio`), round(te.random, 2), `Hazard Ratio`),
         `p-value` = ifelse(`p-value` < 0.001, "< .001*", round(`p-value`, 3))) %>% 
  fill(`p-value`, .direction = "down") %>% 
  mutate(`p-value` = ifelse(Site == "Effect size", `p-value`, ""),
         CI.Low = ifelse(Site == "Effect size", te.random.ci_lower, CI.Low),
         CI.High = ifelse(Site == "Effect size", te.random.ci_upper, CI.High),
         `Hazard Ratio ` = paste(`Hazard Ratio`, '(', CI.Low, ',', CI.High, ')'))


## define forest plot shapes
# shape #16 is normal circle; #18 is diamond
shapes <- rep(16, times = nrow(deceased_results))
shapes[length(shapes)] <- 18
sizes <- rep(3.25, times = nrow(deceased_results))
sizes[length(sizes)] <- 5
scale <- scale_plot(deceased_results, site_exclude = c("ICSM"))

# note: `Hazard Ratio ` with confidence intervals has an extra space and used by `right_side_data` param
p2 <- forester(
  left_side_data = deceased_results %>% 
    select(Site) %>% 
    rename("Healthcare System" = "Site"),
  estimate = deceased_results$`Hazard Ratio`,
  ci_low = deceased_results$CI.Low,
  ci_high = deceased_results$CI.High,
  right_side_data = deceased_results[, c("Hazard Ratio ", "p-value")],
  display = TRUE,
  nudge_x = .5,
  font_family = "arial",
  arrows = TRUE,
  arrow_labels = c("Decreased Risk", "Increased Risk"),
  null_line_at = 1,
  xlim = c(scale$min_est, scale$max_est),
  #xbreaks = c(scale$min_est, 1, scale$max_est),
  point_sizes = sizes,
  point_shapes = shapes,
  file_path = here::here(paste0("figures/forestplot_adult_cns_mortality.png")))

p2
```

## PNS - Risk of Discharge

```{r fig.width=8, message=FALSE, warning=FALSE}
te.random = meta_results_adults_pns_df_forest %>% 
  filter(analysis == "time_first_discharge_reg_elix") %>% 
  distinct(TE.random) %>% 
  as.numeric() %>% 
  round(.,2)

te.random.ci_lower = meta_results_adults_pns_df_forest %>% 
  filter(analysis == "time_first_discharge_reg_elix") %>% 
  distinct(lower.random) %>% 
  as.numeric() 

te.random.ci_upper = meta_results_adults_pns_df_forest %>% 
  filter(analysis == "time_first_discharge_reg_elix") %>% 
  distinct(upper.random) %>% 
  as.numeric() 

length_stay_first_results <- meta_results_adults_pns_df_forest %>%
  slice(grep(paste("time_first_discharge_reg_elix", collapse = "|"), analysis)) %>% 
  #add_row(.before = 1) %>%
  add_row(.after = nrow(.)) %>% 
  mutate(Site = ifelse(is.na(Site), "Effect size", Site),
         `Hazard Ratio` = ifelse(is.na(`Hazard Ratio`), round(te.random, 2), `Hazard Ratio`),
         `p-value` = ifelse(`p-value` < 0.001, "< .001*", round(`p-value`, 3))) %>% 
  fill(`p-value`, .direction = "down") %>% 
  mutate(`p-value` = ifelse(Site == "Effect size", `p-value`, ""),
         CI.Low = ifelse(Site == "Effect size", te.random.ci_lower, CI.Low),
         CI.High = ifelse(Site == "Effect size", te.random.ci_upper, CI.High),
         `Hazard Ratio ` = paste(`Hazard Ratio`, '(', CI.Low, ',', CI.High, ')'))


## define forest plot shapes
# shape #16 is normal circle; #18 is diamond
shapes <- rep(16, times = nrow(length_stay_first_results))
shapes[length(shapes)] <- 18
sizes <- rep(3.25, times = nrow(length_stay_first_results))
sizes[length(sizes)] <- 5
scale <- scale_plot(length_stay_first_results, site_exclude = c("ICSM"))

# note: `Hazard Ratio ` with confidence intervals has an extra space and used by `right_side_data` param
p3 <- forester(
  left_side_data = length_stay_first_results %>% 
    select(Site) %>% 
    rename("Healthcare System" = "Site"),
  estimate = length_stay_first_results$`Hazard Ratio`,
  ci_low = length_stay_first_results$CI.Low,
  ci_high = length_stay_first_results$CI.High,
  right_side_data = length_stay_first_results[, c("Hazard Ratio ", "p-value")],
  display = TRUE,
  nudge_x = .5,
  font_family = "arial",
  arrows = TRUE,
  arrow_labels = c("Decreased Risk", "Increased Risk"),
  null_line_at = 1,
  xlim = c(scale$min_est, scale$max_est),
  #xbreaks = c(scale$min_est, 1, scale$max_est),
  point_sizes = sizes,
  point_shapes = shapes,
  file_path = here::here(paste0("figures/forestplot_adult_pns_discharge.png")))

p3
```

## PNS - Risk of Mortality

```{r fig.width=8, message=FALSE, warning=FALSE}
te.random = meta_results_adults_pns_df_forest %>% 
  filter(analysis == "deceased_reg_elix") %>% 
  distinct(TE.random) %>% 
  as.numeric() %>% 
  round(.,2)

te.random.ci_lower = meta_results_adults_pns_df_forest %>% 
  filter(analysis == "deceased_reg_elix") %>% 
  distinct(lower.random) %>% 
  as.numeric() 

te.random.ci_upper = meta_results_adults_pns_df_forest %>% 
  filter(analysis == "deceased_reg_elix") %>% 
  distinct(upper.random) %>% 
  as.numeric() 

deceased_results <- meta_results_adults_pns_df_forest %>%
  slice(grep(paste("deceased_reg_elix", collapse = "|"), analysis)) %>% 
  #add_row(.before = 1) %>%
  add_row(.after = nrow(.)) %>% 
  mutate(Site = ifelse(is.na(Site), "Effect size", Site),
         `Hazard Ratio` = ifelse(is.na(`Hazard Ratio`), round(te.random, 2), `Hazard Ratio`),
         `p-value` = ifelse(`p-value` < 0.001, "< .001*", round(`p-value`, 3))) %>% 
  fill(`p-value`, .direction = "down") %>% 
  mutate(`p-value` = ifelse(Site == "Effect size", `p-value`, ""),
         CI.Low = ifelse(Site == "Effect size", te.random.ci_lower, CI.Low),
         CI.High = ifelse(Site == "Effect size", te.random.ci_upper, CI.High),
         `Hazard Ratio ` = paste(`Hazard Ratio`, '(', CI.Low, ',', CI.High, ')'))


## define forest plot shapes
# shape #16 is normal circle; #18 is diamond
shapes <- rep(16, times = nrow(deceased_results))
shapes[length(shapes)] <- 18
sizes <- rep(3.25, times = nrow(deceased_results))
sizes[length(sizes)] <- 5
scale <- scale_plot(deceased_results, site_exclude = c("ICSM", "HPG23"))

# note: `Hazard Ratio ` with confidence intervals has an extra space and used by `right_side_data` param
p4 <- forester(
  left_side_data = deceased_results %>% 
    select(Site) %>% 
    rename("Healthcare System" = "Site"),
  estimate = deceased_results$`Hazard Ratio`,
  ci_low = deceased_results$CI.Low,
  ci_high = deceased_results$CI.High,
  right_side_data = deceased_results[, c("Hazard Ratio ", "p-value")],
  display = TRUE,
  nudge_x = .5,
  font_family = "arial",
  arrows = TRUE,
  arrow_labels = c("Decreased Risk", "Increased Risk"),
  null_line_at = 1,
  xlim = c(scale$min_est, scale$max_est),
  #xbreaks = c(scale$min_est, 1, scale$max_est),
  point_sizes = sizes,
  point_shapes = shapes,
  file_path = here::here(paste0("figures/forestplot_adult_pns_mortality.png")))

p4
```