---
title: "kmdata - intro"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{kmdata - intro}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
```{r, include=FALSE}
library('knitr')
opts_chunk$set(
  cache = FALSE, fig.align = 'center', dev = 'png',
  fig.width = 9, fig.height = 7, echo = TRUE, message = FALSE
)
```
## Data sets available
```{r}
library('kmdata')
data(package = 'kmdata')
```
This will show a list of the data sets available. Data objects can be
referenced by name, e.g., `ATTENTION_2A`, or with backticks for names
containing special characters, e.g., `` `TRIBE(2)_2A` ``.
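As a minimal sketch of why the backticks are needed, the chunk below stores two toy data frames in a scratch environment under the names mentioned above. The environment `env` and the contents of the data frames are invented for illustration and are not the package data.

```{r}
## toy objects standing in for package data sets; contents are invented
env <- new.env()
assign('ATTENTION_2A', data.frame(time = 1:3, event = c(1, 0, 1)), envir = env)
assign('TRIBE(2)_2A', data.frame(time = 4:5, event = c(0, 1)), envir = env)

## syntactic names can be typed as-is; other names need backticks
with(env, nrow(ATTENTION_2A))
with(env, nrow(`TRIBE(2)_2A`))

## get() takes the name as a string, which helps when looping over names
sapply(ls(env), function(nm) nrow(get(nm, envir = env)))
```

The same `get()` pattern should work for package data sets when their names are held in a character vector.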
The name consists of the study short-name and figure identifier:
```{r}
data <- ls('package:kmdata', pattern = '^[A-Z]')
cbind(
  name = data,
  study = gsub('_.*', '', data),
  figure = gsub('.*_', '', data)
)[1:5, ]
```
For example, study "ATTENTION" has two figures: "2A" and "2B." Study "ACTSCC"
has only one figure: "2A." If data sets are sourced from multi-panel figures,
the names will look similar to those for study "ATTENTION," with figure IDs
"2A" and "2B."
All data sets are listed in `kmdata_key` along with some useful metadata for
each including the journal and publication identifiers, outcomes and study
arms, quality of the re-capitulated data, and other information.
## Working with the data
Each data set uses the same format for consistency:
```{r, echo=FALSE}
knitr::kable(
  data.frame(
    time = 'time-to-event (in units)', event = 'event indicator (0/1)',
    arm = 'treatment arm identifier (e.g., arm-1 vs arm-2)'
  )
)
```
The time unit, event type, and treatment arms can be found in the help page
for each data set, e.g., `?ACT1_2A`. Additionally, the data objects contain
metadata stored as attributes:
```{r}
head(ACT1_2A)
attr(ACT1_2A, 'event')
attributes(ACT1_2A)[-(1:3)]
```
Data may be examined and plotted using the built-in functions `summary` and
`kmplot`.
```{r, fig.show='hold', message = TRUE}
summary(ACT1_2A)
kmplot(ACT1_2A)
```
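Because every data set shares the `time`, `event`, and `arm` columns, standard survival tools should apply without reshaping. The sketch below assumes the `survival` package is installed and uses a toy data frame in the same layout; the values are invented and do not come from any study in the package.

```{r}
library('survival')

## toy data in the same three-column layout; values are invented
toy <- data.frame(
  time  = c(2, 4, 6, 8, 3, 5, 7, 9),
  event = c(1, 1, 0, 1, 1, 0, 1, 1),
  arm   = rep(c('arm-1', 'arm-2'), each = 4)
)

## Kaplan-Meier estimate for each arm
fit <- survfit(Surv(time, event) ~ arm, data = toy)
fit

## log-rank test comparing the two arms
survdiff(Surv(time, event) ~ arm, data = toy)
```

Replacing `toy` with a package data set such as `ACT1_2A` should work the same way, provided the column names match the format described above.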
## Selecting data
The `kmdata` package contains a function, `select_kmdata`, to easily search
and filter data sets which share common features. Any of the columns in
`kmdata_key` may be used to filter.
For example, to list the lung cancer data sets that report overall survival
(OS) in months, include fewer than 500 patients, and report a hazard ratio of
at least 1.2 for treatment compared to a reference arm, we can use the
following:
```{r}
select_kmdata(
  Cancer %in% 'Lung' &
    Outcome %in% 'OS' &
    Units %in% 'months' &
    ReportedSampleSize < 500 &
    HazardRatio >= 1.2,
  return = 'name'
)
```
By default, `select_kmdata` returns only the names of the matching data sets
(i.e., `select_kmdata(..., return = 'name')`), but it can also return the
matching rows of `kmdata_key` (`return = 'key'`) or the matching data sets as
a list (`return = 'data'`).
```{r, fig.height=11, fig.width=12, results='hide'}
## matching rows of kmdata_key
key <- select_kmdata(
  Cancer %in% 'Lung' &
    Outcome %in% 'OS' &
    Units %in% 'months' &
    ReportedSampleSize < 500 &
    HazardRatio >= 1.2,
  return = 'key'
)

## matching data sets as a list
dat <- select_kmdata(
  Cancer %in% 'Lung' &
    Outcome %in% 'OS' &
    Units %in% 'months' &
    ReportedSampleSize < 500 &
    HazardRatio >= 1.2,
  return = 'data'
)

## one panel per data set
par(mfrow = n2mfrow(length(dat)))
for (dd in dat)
  kmplot(dd)
```
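The filter given to `select_kmdata` is an unquoted expression over the columns of `kmdata_key`, much like the condition passed to `base::subset`. The sketch below shows one way such a filter can be evaluated; it is not the package's implementation. The helper `select_toy`, the table `toy_key`, its `name` column, and all of its rows are hypothetical.

```{r}
## toy stand-in for kmdata_key; rows and the `name` column are invented
toy_key <- data.frame(
  name = c('STUDYA_1A', 'STUDYB_2A', 'STUDYC_3B'),
  Cancer = c('Lung', 'Lung', 'Breast'),
  Outcome = c('OS', 'PFS', 'OS'),
  ReportedSampleSize = c(320, 450, 800),
  stringsAsFactors = FALSE
)

## hypothetical helper: evaluate an unquoted filter using the key's columns
select_toy <- function(expr, key = toy_key) {
  keep <- eval(substitute(expr), key, parent.frame())
  ## treat missing values in the filter as non-matches
  key$name[keep & !is.na(keep)]
}

select_toy(Cancer %in% 'Lung' & ReportedSampleSize < 500)
select_toy(Cancer %in% 'Lung' & Outcome %in% 'OS')
```

Using `%in%` rather than `==` in the filter, as the examples in this section do, avoids `NA` results when a column has missing entries.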
## Data quality
Each figure and data set has a quality score which represents how well the
re-capitulated data agree with the original publication. Scores range from
0% (worst) to 100% (best) and are an aggregation of four metrics: hazard ratio,
total events, median time-to-event, and number at-risk.

Each metric is scored from 0 (worst) to 3 (best); the maximum score per figure
varies with the metrics reported in the original publication. For example,
if only one metric was reported, the maximum score is 3 (i.e., 3/3).

A score of 3 points is given per metric per figure if the re-capitulated
metric differs from the published value by no more than 5%; 2 points are given
if the difference is 5-10%, 1 point for 10-20%, and 0 points for more than
20%.

| % difference from publication | Quality points per metric |
|-------------------------------:|---------------------------:|
| 0-5 | 3 |
| 5-10 | 2 |
| 10-20 | 1 |
| > 20 | 0 |
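
The scoring rule in the table can be sketched in a few lines of R. The helper `quality_points` is hypothetical and not part of `kmdata`, and the percent differences are invented. Two assumptions are made here: differences that fall exactly on a boundary (5, 10, or 20) receive the higher number of points, and the overall score is the total points divided by the maximum available for the metrics reported.

```{r}
## hypothetical helper, not part of kmdata: map % difference to 0-3 points
quality_points <- function(pct_diff) {
  ## intervals closed on the right: [0,5], (5,10], (10,20], (20,Inf)
  bin <- cut(abs(pct_diff), breaks = c(0, 5, 10, 20, Inf),
             labels = FALSE, include.lowest = TRUE)
  4L - bin
}

## invented % differences; NA means the metric was not reported
pct_diff <- c(hazard_ratio = 3.1, total_events = 7.5,
              median_time = NA, at_risk = 12)
pts <- quality_points(pct_diff)
pts

## assumed aggregation: total points over the maximum for reported metrics
score <- 100 * sum(pts, na.rm = TRUE) / (3 * sum(!is.na(pts)))
round(score, 1)
```

With three reported metrics the maximum is 9 points, so the unreported metric does not lower the score.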
---
## References
The publications and figures available in this package are listed below by
first author.
<details>
<summary>Click to expand</summary>
```{r, echo=FALSE}
## read the citation spreadsheet shipped with the package
cit <- system.file('docs', 'Citations_final.xlsx', package = 'kmdata')
cit <- as.data.frame(readxl::read_excel(cit, skip = 1L))

## split each reference string into its components
cit <- within(cit, {
  Title    <- gsub('^.*?\\.\\s+|\\.\\s+[A-Za-z ]+\\d{4};.*$', '', Reference)
  Author   <- gsub('^([^.]+\\.)|.', '\\1', Reference)
  PubData  <- gsub('([A-Za-z ]+\\s+\\d{4};.*)\\.$|.', '\\1', Reference)
  Journal  <- gsub('(.*?)\\d{4};|.', '\\1', PubData)
  Year     <- gsub('(\\d{4});|.', '\\1', PubData)
  Location <- gsub('^.*?\\d{4};\\s*', '', PubData)
})[, c('PMID', 'Author', 'Journal', 'Year', 'Title', 'Location')]

## sort by first author and print as a table
cit <- cit[order(cit$Author), ]
rownames(cit) <- NULL
knitr::kable(cit, format = 'markdown', caption = 'List of publications.')
```
</details>