Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #2480 derive pair of variables #2503

Merged
Show file tree
Hide file tree
Changes from 83 commits
Commits
Show all changes
84 commits
Select commit Hold shift + click to select a range
97b58c7
initial draft
StefanThoma Sep 3, 2024
b79a558
now using exprs
StefanThoma Sep 5, 2024
5d2b1bc
added unit tests
StefanThoma Sep 5, 2024
95c1cc2
added manual and namespace
StefanThoma Sep 5, 2024
f1f1984
lint and style
StefanThoma Sep 5, 2024
dd38c99
udpate manual
StefanThoma Sep 5, 2024
8b58650
styler
StefanThoma Sep 5, 2024
7c8d4fb
update manual
StefanThoma Sep 5, 2024
b2a1a26
fix select
StefanThoma Sep 5, 2024
269d3b2
fix example
StefanThoma Sep 6, 2024
0433537
update news.md
StefanThoma Sep 6, 2024
ba088b1
fix VSTEST
StefanThoma Sep 6, 2024
ad514a0
fix example
StefanThoma Sep 6, 2024
7d4a3e3
fix documentation
StefanThoma Sep 6, 2024
74f1e81
update filter
StefanThoma Sep 6, 2024
234199e
update manual
StefanThoma Sep 6, 2024
ce4b739
added keyword and family
StefanThoma Sep 6, 2024
3d4bdb9
update ad_advs.R template
StefanThoma Sep 10, 2024
379c831
updated functionality
StefanThoma Sep 12, 2024
881f6f1
show recursive function
StefanThoma Sep 13, 2024
c19e4ee
fix bug and improve documentation
StefanThoma Sep 20, 2024
265ecea
updated tests
StefanThoma Sep 20, 2024
b5041be
add warning for forgotten by_vars
StefanThoma Sep 20, 2024
dcfe16a
style & lint
StefanThoma Sep 20, 2024
693e1a3
update manual
StefanThoma Sep 20, 2024
c186776
added helper documentation
StefanThoma Sep 20, 2024
b461453
Merge branch 'main' into 2480-feature-request-function-for-creating-v…
StefanThoma Sep 20, 2024
0f1fef6
update
StefanThoma Sep 20, 2024
d1db0a1
half way there :)
StefanThoma Sep 23, 2024
1dedbb7
clean up ad_adpc
StefanThoma Sep 23, 2024
3f6233d
UPDATE VIGNETTES
StefanThoma Sep 23, 2024
4267dfc
fix error
StefanThoma Sep 23, 2024
fa21e81
styler
StefanThoma Sep 23, 2024
0dbbe67
Update vignettes/adsl.Rmd
StefanThoma Sep 24, 2024
4fc3180
Update vignettes/bds_exposure.Rmd
StefanThoma Sep 24, 2024
d09a9c7
Update R/derive_vars_cat.R
StefanThoma Sep 24, 2024
daa6ead
Update R/derive_vars_cat.R
StefanThoma Sep 24, 2024
c2c2207
Update NEWS.md
StefanThoma Sep 24, 2024
acbf169
Update R/derive_vars_cat.R
StefanThoma Sep 24, 2024
33ffc12
Update inst/templates/ad_advs.R
StefanThoma Sep 24, 2024
cc210f1
update
StefanThoma Sep 24, 2024
bb74943
Update vignettes/bds_finding.Rmd
StefanThoma Sep 24, 2024
f4ec115
update style of tibbles
StefanThoma Sep 24, 2024
8e727d2
update error message for definition
StefanThoma Sep 24, 2024
a3d5751
Simplify assertions.
StefanThoma Sep 24, 2024
7a912c9
updated example
StefanThoma Sep 24, 2024
69d7ea4
add alternative way
StefanThoma Sep 24, 2024
2533193
improve wording
StefanThoma Sep 24, 2024
98043bf
Update R/derive_vars_cat.R
StefanThoma Sep 24, 2024
ede7f99
Update NEWS.md
StefanThoma Sep 24, 2024
16b2c38
Update vignettes/generic.Rmd
StefanThoma Sep 24, 2024
126a17d
update man
StefanThoma Sep 24, 2024
930f7f5
remove reliance on assertthat
StefanThoma Sep 24, 2024
9b537db
styler
StefanThoma Sep 24, 2024
e6213a7
Update vignettes/bds_finding.Rmd
StefanThoma Sep 25, 2024
a6011db
Update vignettes/bds_exposure.Rmd
StefanThoma Sep 25, 2024
030e478
Update vignettes/adsl.Rmd
StefanThoma Sep 25, 2024
bef70a4
Update R/derive_vars_cat.R
StefanThoma Sep 25, 2024
2f06d60
Update R/derive_vars_cat.R
StefanThoma Sep 25, 2024
e933605
Update R/derive_vars_cat.R
StefanThoma Sep 25, 2024
8b0775b
update format
StefanThoma Sep 25, 2024
381ec72
update format
StefanThoma Sep 25, 2024
dfc5289
format
StefanThoma Sep 25, 2024
089e90a
Update R/derive_vars_cat.R
StefanThoma Sep 25, 2024
eac433f
style & spelling
StefanThoma Sep 25, 2024
e98c228
update tests to tibbles
StefanThoma Sep 25, 2024
ab4390b
fix template
StefanThoma Sep 25, 2024
7c089eb
switch function
StefanThoma Sep 25, 2024
023a535
fix alignment
StefanThoma Sep 25, 2024
7e27b3a
tryout
StefanThoma Sep 25, 2024
867b66f
add ::: to internal function
StefanThoma Sep 25, 2024
8660179
Update R/derive_vars_cat.R
StefanThoma Sep 26, 2024
8caf227
Update R/derive_vars_cat.R
StefanThoma Sep 26, 2024
e7882f0
Update R/derive_vars_cat.R
StefanThoma Sep 26, 2024
96bdd8d
aligned again
StefanThoma Sep 26, 2024
15c1007
update manual
StefanThoma Sep 26, 2024
5a6c6ec
Update R/derive_vars_cat.R
StefanThoma Sep 30, 2024
0b803ee
Update R/derive_vars_cat.R
StefanThoma Sep 30, 2024
b3a925a
Update R/derive_vars_cat.R
StefanThoma Sep 30, 2024
05f973e
Update R/derive_vars_cat.R
StefanThoma Sep 30, 2024
09097ac
added test for error
StefanThoma Sep 30, 2024
e846b67
lint
StefanThoma Sep 30, 2024
5ab8252
update to cli_warn
StefanThoma Sep 30, 2024
e0a7995
fix style
StefanThoma Sep 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@ export(derive_var_trtdurd)
export(derive_var_trtemfl)
export(derive_vars_aage)
export(derive_vars_atc)
export(derive_vars_cat)
export(derive_vars_computed)
export(derive_vars_crit_flag)
export(derive_vars_dt)
Expand Down
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@

## New Features

- New function `derive_vars_cat()` for deriving pairs of variables or more, e.g.
`AVALCATx` & `AVALCAxN`. (#2480)
- New function `derive_vars_crit_flag()` for deriving criterion flag variables
(`CRITy`, `CRITyFL`, `CRITyFLN`). (#2468)

Expand Down
257 changes: 257 additions & 0 deletions R/derive_vars_cat.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,257 @@
#' Derive Categorization Variables Like `AVALCATy` and `AVALCAyN`
#' @param dataset
#' `r roxygen_param_dataset(expected_vars = c("by_vars", "definition"))`
#' @param definition List of expressions created by `exprs()`.
#' Must be in rectangular format and specified using the same syntax as when creating
#' a `tibble` using the `tribble()` function.
#' The `definition` object will be converted to a `tibble` using `tribble()` inside this function.
#'
#' Must contain:
#' - the column `condition` which will be converted to a logical expression and
#' will be used on the `dataset` input.
#' - at least one additional column with the new column name and
#' the category value(s) used by the logical expression.
#' - the column specified in `by_vars` (if `by_vars` is specified)
#'
#' e.g. if `by_vars` is not specified:
#'
#' ```{r}
#' #| eval: false
#' exprs(~condition, ~AVALCAT1, ~AVALCA1N,
#' AVAL >= 140, ">=140 cm", 1,
#' AVAL < 140, "<140 cm", 2)
#' ```
#'
#' e.g. if `by_vars` is specified as `exprs(VSTEST)`:
#'
#' ```{r}
#' #| eval: false
#' exprs(~VSTEST, ~condition, ~AVALCAT1, ~AVALCA1N,
#' "Height", AVAL >= 140, ">=140 cm", 1,
#' "Height", AVAL < 140, "<140 cm", 2)
#' ```
#'
#' @param by_vars list of expressions with one element. `NULL` by default.
StefanThoma marked this conversation as resolved.
Show resolved Hide resolved
#' Allows for specifying by groups, e.g. `exprs(PARAMCD)`.
#' Variable must be present in both `dataset` and `definition`.
#' The conditions in `definition` are applied only to those records that match `by_vars`.
#' The categorization variables are set to `NA` for records
#' not matching any of the by groups in `definition`.
#'
#'
#' @details
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR - but just thinking should we alert the user that something is amiss with their logic if they don't abide by our warning? Not sure if we can detect this first?? @bundfussr

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bms63 , do you mean issuing a warning if there are records which doesn't match any of the by groups?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the overlap discussion that is in the documentation - like something should be finer in logic??

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this would be helpful. There are use cases where it makes sense to use overlapping categories, e.g.,

~condition,    ~ANRIND,
AVAL > ANRHI,  "high",
AVAL <= ANRHI, "normal",
TRUE,          "unknown"

Issuing a warning in this case would force the users to suppress the warning or replace TRUE with is.na(AVAL) | is.na(ANRHI).

#' If conditions are overlapping, the row order of `definitions` must be carefully considered.
#' The **first** match will determine the category.
#' i.e. if
#'
#' `AVAL = 155`
#'
#' and the `definition` is:
#'
#' ```{r}
#' #| eval: false
#' definition <- exprs(
#' ~VSTEST, ~condition, ~AVALCAT1, ~AVALCA1N,
#' "Height", AVAL > 170, ">170 cm", 1,
#' "Height", AVAL <= 170, "<=170 cm", 2,
#' "Height", AVAL <= 160, "<=160 cm", 3
#' )
#' ```
#' then `AVALCAT1` will be `"<=170 cm"`, as this is the first match for `AVAL`.
#' If you specify:
#'
#' ```{r}
#' #| eval: false
#' definition <- exprs(
#' ~VSTEST, ~condition, ~AVALCAT1, ~AVALCA1N,
#' "Height", AVAL <= 160, "<=160 cm", 3,
#' "Height", AVAL <= 170, "<=170 cm", 2,
#' "Height", AVAL > 170, ">170 cm", 1
#' )
#' ```
#'
#' Then `AVAL <= 160` will lead to `AVALCAT1 == "<=160 cm"`,
#' `AVAL` in-between `160` and `170` will lead to `AVALCAT1 == "<=170 cm"`,
#' and `AVAL <= 170` will lead to `AVALCAT1 == ">170 cm"`.
#'
#' However, we suggest to be more explicit when defining the `condition`, to avoid overlap.
#' In this case, the middle condition should be:
#' `AVAL <= 170 & AVAL > 160`
#'
#' @return The input dataset with the new variables defined in `definition` added
#' @family der_gen
bundfussr marked this conversation as resolved.
Show resolved Hide resolved
#' @keywords der_gen
bundfussr marked this conversation as resolved.
Show resolved Hide resolved
#' @export
#'
#' @examples
#' library(dplyr)
#' library(tibble)
#'
StefanThoma marked this conversation as resolved.
Show resolved Hide resolved
#' advs <- tibble::tribble(
#' ~USUBJID, ~VSTEST, ~AVAL,
#' "01-701-1015", "Height", 147.32,
#' "01-701-1015", "Weight", 53.98,
#' "01-701-1023", "Height", 162.56,
#' "01-701-1023", "Weight", NA,
#' "01-701-1028", "Height", NA,
#' "01-701-1028", "Weight", NA,
#' "01-701-1033", "Height", 175.26,
#' "01-701-1033", "Weight", 88.45
#' )
#'
#' definition <- exprs(
#' ~condition, ~AVALCAT1, ~AVALCA1N, ~NEWCOL,
#' VSTEST == "Height" & AVAL > 160, ">160 cm", 1, "extra1",
#' VSTEST == "Height" & AVAL <= 160, "<=160 cm", 2, "extra2"
#' )
bms63 marked this conversation as resolved.
Show resolved Hide resolved
#' derive_vars_cat(
#' dataset = advs,
#' definition = definition
#' )
#'
#' # Using by_vars:
#' definition2 <- exprs(
#' ~VSTEST, ~condition, ~AVALCAT1, ~AVALCA1N,
#' "Height", AVAL > 160, ">160 cm", 1,
#' "Height", AVAL <= 160, "<=160 cm", 2,
#' "Weight", AVAL > 70, ">70 kg", 1,
#' "Weight", AVAL <= 70, "<=70 kg", 2
#' )
#'
#' derive_vars_cat(
#' dataset = advs,
#' definition = definition2,
#' by_vars = exprs(VSTEST)
#' )
#'
#' # With three conditions:
#' definition3 <- exprs(
#' ~VSTEST, ~condition, ~AVALCAT1, ~AVALCA1N,
#' "Height", AVAL > 170, ">170 cm", 1,
#' "Height", AVAL <= 170 & AVAL > 160, "<=170 cm", 2,
#' "Height", AVAL <= 160, "<=160 cm", 3
#' )
#'
#' derive_vars_cat(
#' dataset = advs,
#' definition = definition3,
#' by_vars = exprs(VSTEST)
#' )
#'
#' # Let's derive both the MCRITyML and the MCRITyMN variables
#' adlb <- tibble::tribble(
#' ~USUBJID, ~PARAM, ~AVAL, ~AVALU, ~ANRHI,
#' "01-701-1015", "ALT", 150, "U/L", 40,
#' "01-701-1023", "ALT", 70, "U/L", 40,
#' "01-701-1036", "ALT", 130, "U/L", 40,
#' "01-701-1048", "ALT", 30, "U/L", 40,
#' "01-701-1015", "AST", 50, "U/L", 35
#' )
#'
#' definition_mcrit <- exprs(
#' ~PARAM, ~condition, ~MCRIT1ML, ~MCRIT1MN,
#' "ALT", AVAL <= ANRHI, "<=ANRHI", 1,
#' "ALT", ANRHI < AVAL & AVAL <= 3 * ANRHI, ">1-3*ANRHI", 2,
#' "ALT", 3 * ANRHI < AVAL, ">3*ANRHI", 3
#' )
#'
#' adlb %>%
#' derive_vars_cat(
#' definition = definition_mcrit,
#' by_vars = exprs(PARAM)
#' )
derive_vars_cat <- function(dataset,
definition,
by_vars = NULL) {
assert_expr_list(definition)
assert_vars(by_vars, optional = TRUE)
if (length(by_vars) > 1) {
cli_abort("{.arg by_vars} must contain just one variable, e.g. {.code exprs(PARAMCD)}")
}

assert_data_frame(dataset,
required_vars = c(
admiraldev::extract_vars(definition) %>% unique(),
by_vars
)
)

# transform definition to tibble
names(definition) <- NULL
definition <- tryCatch(
{
tibble::tribble(!!!definition)
},
error = function(e) {
# Catch the error and append your own message
cli_abort(
c(
paste(
"Failed to convert {.arg definition} to {.cls tibble}.",
"{.arg definition} should be specified similarly to how you would",
"specify a {.cls tibble} using the {.fun tibble::tribble} function so it",
"can be converted to {.cls tibble} using {.fun tibble::tribble}."
),
e$message
)
)
}
)
assert_data_frame(definition, required_vars = c(exprs(condition), by_vars))
if (!is.null(by_vars)) {
# add condition
definition <- definition %>%
mutate(
condition = extend_condition(as.character(condition),
as.character(by_vars),
is = !!sym(as.character(by_vars))
) %>%
parse_exprs()
) %>%
select(-by_vars[[1]])
}

# extract new variable names and conditions
new_col_names <- names(definition)[!names(definition) == "condition"]
condition <- definition[["condition"]]

# warn if new variables already exist
if (any(new_col_names %in% names(dataset))) {
cli_warn(paste("Column(s) in {.arg definition} already exist in {.arg dataset}.",
"Did you forget to specify {.arg by_vars},",
"or are you rerunning your code?"
))
}

# (re)apply the function for each new variable name and iteratively derive the categories
new_dataset <- reduce(new_col_names, function(.data, col_name) {
# extract conditions
values <- definition[[col_name]]

.data %>%
mutate(!!sym(col_name) := eval(rlang::call2(
"case_when",
!!!map2(condition, values, ~ expr(!!.x ~ !!.y))
)))
}, .init = dataset)

return(new_dataset)
}

#' Extend a condition string by adding a new condition based on a variable and its value
#'
#' This internal helper function extends a condition string by appending a new condition
#' that checks if a variable equals a specific value.
#'
#' @param cond A character string representing an existing condition.
#' @param var A character string representing the name of the variable to check.
#' @param is A character string representing the value the variable should be equal to.
#'
#' @return A character string representing the extended condition.
#' @examples
#' # Extend an existing condition to include a check for 'AGE == "30"'
#' admiral:::extend_condition("SEX == 'M'", "AGE", "30")
#' @keywords internal
extend_condition <- function(cond, var, is) {
paste(cond, " & ", var, " == '", is, "'", sep = "")
}
93 changes: 35 additions & 58 deletions inst/templates/ad_adeg.R
Original file line number Diff line number Diff line change
Expand Up @@ -31,67 +31,45 @@ eg <- convert_blanks_to_na(eg)

# Assign PARAMCD, PARAM, and PARAMN
param_lookup <- tibble::tribble(
~EGTESTCD, ~PARAMCD, ~PARAM, ~PARAMN,
"ECGINT", "EGINTP", "ECG Interpretation", 1,
"HR", "HR", "Heart Rate (beats/min)", 2,
"RR", "RR", "RR Duration (msec)", 3,
"RRR", "RRR", "RR Duration Rederived (msec)", 4,
"QT", "QT", "QT Duration (msec)", 10,
"QTCBR", "QTCBR", "QTcB - Bazett's Correction Formula Rederived (msec)", 11,
"QTCFR", "QTCFR", "QTcF - Fridericia's Correction Formula Rederived (msec)", 12,
"QTLCR", "QTLCR", "QTlc - Sagie's Correction Formula Rederived (msec)", 13,
~EGTESTCD, ~PARAMCD, ~PARAM, ~PARAMN,
"ECGINT", "EGINTP", "ECG Interpretation", 1,
"HR", "HR", "Heart Rate (beats/min)", 2,
"RR", "RR", "RR Duration (msec)", 3,
"RRR", "RRR", "RR Duration Rederived (msec)", 4,
"QT", "QT", "QT Duration (msec)", 10,
"QTCBR", "QTCBR", "QTcB - Bazett's Correction Formula Rederived (msec)", 11,
"QTCFR", "QTCFR", "QTcF - Fridericia's Correction Formula Rederived (msec)", 12,
"QTLCR", "QTLCR", "QTlc - Sagie's Correction Formula Rederived (msec)", 13,
)

range_lookup <- tibble::tribble(
~PARAMCD, ~ANRLO, ~ANRHI,
"EGINTP", NA, NA,
"HR", 40, 100,
"RR", 600, 1500,
"QT", 350, 450,
"RRR", 600, 1500,
"QTCBR", 350, 450,
"QTCFR", 350, 450,
"QTLCR", 350, 450,
"EGINTP", NA, NA,
"HR", 40, 100,
"RR", 600, 1500,
"QT", 350, 450,
"RRR", 600, 1500,
"QTCBR", 350, 450,
"QTCFR", 350, 450,
"QTLCR", 350, 450
)

# ASSIGN AVALCAT1
avalcat_lookup <- tibble::tribble(
~AVALCA1N, ~AVALCAT1,
1, "<= 450 msec",
2, ">450<=480 msec",
3, ">480<=500 msec",
4, ">500 msec"
# Assign AVALCAx
avalcax_lookup <- exprs(
~condition, ~AVALCAT1, ~AVALCA1N,
startsWith(PARAMCD, "QT") & AVAL <= 450, "<= 450 msec", 1,
startsWith(PARAMCD, "QT") & AVAL > 450 & AVAL <= 480, ">450<=480 msec", 2,
startsWith(PARAMCD, "QT") & AVAL > 480 & AVAL <= 500, ">480<=500 msec", 3,
startsWith(PARAMCD, "QT") & AVAL > 500, ">500 msec", 4
)

# ASSIGN CHGCAT1
chgcat_lookup <- tibble::tribble(
~CHGCAT1N, ~CHGCAT1,
1, "<= 30 msec",
2, ">30<=60 msec",
3, ">60 msec"
# Assign CHGCAx
chgcax_lookup <- exprs(
~condition, ~CHGCAT1, ~CHGCAT1N,
startsWith(PARAMCD, "QT") & CHG <= 30, "<= 30 msec", 1,
startsWith(PARAMCD, "QT") & CHG > 30 & CHG <= 60, ">30<=60 msec", 2,
startsWith(PARAMCD, "QT") & CHG > 60, ">60 msec", 3
)

# Here are some examples of how you can create your own functions that
# operates on vectors, which can be used in `mutate()`. Info then used for
# lookup table
format_avalca1n <- function(paramcd, aval) {
case_when(
str_detect(paramcd, "QT") & aval <= 450 ~ 1,
str_detect(paramcd, "QT") & aval > 450 & aval <= 480 ~ 2,
str_detect(paramcd, "QT") & aval > 480 & aval <= 500 ~ 3,
str_detect(paramcd, "QT") & aval > 500 ~ 4
)
}

format_chgcat1n <- function(paramcd, chg) {
case_when(
str_detect(paramcd, "QT") & chg <= 30 ~ 1,
str_detect(paramcd, "QT") & chg > 30 & chg <= 60 ~ 2,
str_detect(paramcd, "QT") & chg > 60 ~ 3
)
}


# Derivations ----

# Get list of ADSL vars required for derivations
Expand Down Expand Up @@ -316,14 +294,13 @@ adeg <- adeg %>%
check_type = "error"
) %>%
# Derive AVALCA1N and AVALCAT1
mutate(AVALCA1N = format_avalca1n(param = PARAMCD, aval = AVAL)) %>%
derive_vars_merged(
dataset_add = avalcat_lookup,
by_vars = exprs(AVALCA1N)
derive_vars_cat(
definition = avalcax_lookup
) %>%
# Derive CHGCAT1N and CHGCAT1
mutate(CHGCAT1N = format_chgcat1n(param = PARAMCD, chg = CHG)) %>%
derive_vars_merged(dataset_add = chgcat_lookup, by_vars = exprs(CHGCAT1N)) %>%
derive_vars_cat(
definition = chgcax_lookup
) %>%
# Derive PARAM and PARAMN
derive_vars_merged(
dataset_add = select(param_lookup, -EGTESTCD),
Expand Down
Loading
Loading