Merge branch 'release/0.2.0'
pbchase committed Feb 16, 2022
2 parents 1829901 + 307dfe7 commit 9c643e3
Showing 23 changed files with 752 additions and 40 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/run-tests.yaml
@@ -0,0 +1,27 @@
name: Run tests

on:
  push:
    branches: [ develop ]
  pull_request:
    branches: [ develop ]

jobs:
  test:
    runs-on: ubuntu-latest

    container:
      image: ghcr.io/ctsit/rstudio-ci:4.1.0
      credentials:
        username: ${{ github.repository_owner }}
        password: ${{ secrets.CR_PAT }}

    env:
      CI: "TRUE"

    steps:
      - uses: actions/checkout@v2

      - name: Check
        run: devtools::test(stop_on_failure = TRUE)
        shell: Rscript {0}
4 changes: 3 additions & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: redcapcustodian
Type: Package
Title: System data cleaning for REDCap
Version: 0.1.0
Version: 0.2.0
Authors@R: c(
person("Philip", "Chase", email = "[email protected]", role = c("aut", "cre")),
person("Laurence", "James-Woodley", email = "[email protected]", role = "aut"),
@@ -23,11 +23,13 @@ Imports:
glue,
lubridate,
magrittr,
mRpostman,
purrr,
rjson,
rlang,
rstudioapi,
sendmailR,
stringr,
sqldf,
tibble,
tidyr,
20 changes: 17 additions & 3 deletions Dockerfile
@@ -6,12 +6,26 @@ RUN apt update -y && apt install -y libmariadb-dev libmariadbclient-dev

# install necessary libraries
RUN R -e "install.packages(c('sendmailR', 'dotenv', 'RCurl', 'checkmate', 'janitor', 'sqldf', 'DBI', 'RMariaDB', 'digest','rjson'))"

ADD . /home/rocker/redcapcustodian
RUN R -e "install.packages(c('REDCapR'))"

# build and install this package
ADD . /home/rocker/redcapcustodian
RUN R CMD build redcapcustodian
RUN R CMD INSTALL redcapcustodian_*.tar.gz
RUN rm -rf redcapcustodian

# Add non-package things
ADD . /home/rocker
RUN rm -rf .Rbuildignore
RUN rm -rf NAMESPACE
RUN rm -rf R
RUN rm -rf .dockerignore
RUN rm -rf DESCRIPTION
RUN rm -rf hosts
RUN rm -rf host_template
RUN rm -rf make_host.sh
RUN rm -rf .Rhistory
RUN rm -rf Dockerfile

# Note where we are, what is there, and what's in the package dir
CMD pwd && ls -AlhF ./ && echo /home/rocker/redcapcustodian && ls -AlhF ./redcapcustodian
CMD pwd && ls -AlhF ./
5 changes: 5 additions & 0 deletions NAMESPACE
@@ -2,11 +2,16 @@

export(connect_to_redcap_db)
export(create_test_table)
export(create_test_tables)
export(get_bad_emails_from_listserv_digest)
export(get_current_time)
export(get_institutional_person_data)
export(get_redcap_db_connection)
export(get_redcap_email_revisions)
export(get_redcap_emails)
export(get_script_name)
export(get_script_run_time)
export(get_test_table_names)
export(set_script_name)
export(set_script_run_time)
export(write_error_log_entry)
16 changes: 16 additions & 0 deletions NEWS.md
@@ -2,6 +2,22 @@
All notable changes to the redcapcustodian package and its contained scripts will be documented in this file.
This project adheres to [Semantic Versioning](http://semver.org/).

## [0.2.0] - 2022-02-16
### Added
- Add get_redcap_email_revisions (Michael Bentz)
- Add automated tests (Michael Bentz)
- Add create_test_tables (ChemiKyle)
- Add test tables (Kyle Chesney)
- Add get_bad_emails_from_listserv_digest (Philip Chase)
- Add get_institutional_person_data (Philip Chase)
- Add get_redcap_emails (Philip Chase)
- Add create_test_table (Philip Chase)
- Add site concept and docs (Philip Chase)
- Add add_get_redcap_db_connection (Philip Chase)
- Store rc_conn in env (Philip Chase)
- Add add_connect_to_redcap_db (Philip Chase)
- Add basic logging (Michael Bentz)


## [0.1.0] - 2021-06-22
### Summary
48 changes: 48 additions & 0 deletions R/devtools.R
@@ -35,3 +35,51 @@ create_test_table <- function(conn, table_name, data_file = NA_character_, empty
)
}
}


#' Provides a list of table names which have schema and data files as part of the package
#'
#' @return A list of table names which have schema and data files as part of the package
#' @export
#'
#' @examples
#' get_test_table_names()
get_test_table_names <- function() {
  table_names <- c(
    "redcap_projects",
    "redcap_user_information"
  )
  return(table_names)
}


#' A wrapper around \code{\link{create_test_table}} to create all tables, or a specified subset of them
#'
#' @param conn A DBI Connection object
#' @param table_names A character vector of the names of the tables to
#' create. If nothing is provided, the result of
#' \code{\link{get_test_table_names}} will be used to create all test tables
#'
#' @return NA
#' @export
#'
#' @examples
#' \dontrun{
#' conn <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
#' create_test_tables(conn) # create all test tables
#'
#' }
create_test_tables <- function(conn, table_names = c()) {
  if (length(table_names) == 0) {
    table_names <- get_test_table_names()
  }

  purrr::pmap(
    .l = list(
      "conn" = c(conn),
      "table_name" = c(table_names)
    ),
    .f = create_test_table
  )

}
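
The `purrr::pmap` call above pairs the single connection with every table name by recycling the length-one argument. A minimal base R sketch of that pattern follows. `fake_create_test_table` is a hypothetical stand-in for `create_test_table`, and a plain string is assumed in place of a real DBI connection.

```r
# Hypothetical stand-in for create_test_table(); it only reports what it
# would do instead of touching a database.
fake_create_test_table <- function(conn, table_name) {
  paste("create", table_name, "via", conn)
}

# Assumption: a plain string stands in for the DBI connection object.
conn <- "fake_conn"
table_names <- c("redcap_projects", "redcap_user_information")

# Map() recycles the length-1 connection list across both table names,
# much as pmap() does with list(conn = ..., table_name = ...).
results <- unname(unlist(Map(
  fake_create_test_table,
  conn = list(conn),
  table_name = table_names
)))

print(results)
```

Each table name produces one call, so adding a name to `get_test_table_names()` is enough to have it created by default.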
34 changes: 34 additions & 0 deletions R/institutional.R
@@ -0,0 +1,34 @@
#' A template function for fetching authoritative email address data
#' and other institutional data
#'
#' @param user_ids an optional vector of REDCap user IDs to be used in a query
#' against the institutional data
#'
#' @return A Dataframe
#' \itemize{
#' \item user_id - a column of redcap user_ids / institutional IDs
#' \item email - a column with the authoritative email address for user_id
#' \item ... - Additional columns are allowed in the return data frame
#' }
#' @export
#'
#' @examples
#' redcap_users <- c("jane_doe", "john_q_public")
#' get_institutional_person_data(user_ids = redcap_users)
get_institutional_person_data <- function(user_ids = c(NA_character_)) {
  email_data <- dplyr::tribble(
    ~user_id, ~email,
    "inappropriate_user_id", "[email protected]",
    "jane_doe", "[email protected]",
    "john_q_public", "[email protected]"
  )

  if (length(user_ids) == 1 && is.na(user_ids)) {
    result <- email_data
  } else {
    result <- email_data %>%
      dplyr::filter(.data$user_id %in% user_ids)
  }

  return(result)
}
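
Because the function above is a template, a site would presumably swap the hard-coded rows for its own lookup while keeping the same contract: no argument returns everyone, a vector of IDs filters the result. A base R sketch of that contract follows. The name `lookup_person_data` and the `example.org` addresses are made up for illustration.

```r
# Hypothetical site-specific replacement; the directory contents are invented.
lookup_person_data <- function(user_ids = NA_character_) {
  directory <- data.frame(
    user_id = c("jane_doe", "john_q_public"),
    email = c("jane_doe@example.org", "john_q_public@example.org"),
    stringsAsFactors = FALSE
  )

  # A single NA means "no filter requested": return every row.
  if (length(user_ids) == 1 && is.na(user_ids)) {
    return(directory)
  }

  # Otherwise keep only the requested IDs; unknown IDs are silently dropped.
  directory[directory$user_id %in% user_ids, , drop = FALSE]
}

all_people <- lookup_person_data()
only_jane <- lookup_person_data(user_ids = c("jane_doe", "unknown_user"))
print(only_jane)
```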
75 changes: 75 additions & 0 deletions R/listserv.R
@@ -0,0 +1,75 @@

#' Enumerate bad email addresses described in LISTSERV email digests
#'
#' Connect to an IMAP mailbox, identify LISTSERV digest emails sent
#' after `messages_since_date`, and extract bounced email addresses
#' from those digest messages.
#'
#' @param url The IMAP URL of the host that houses the mailbox
#' @param username The username of the IMAP mailbox
#' @param password The password of the IMAP mailbox
#' @param messages_since_date The sent date of the oldest message that should be inspected
#'
#' @return A dataframe of bad email addresses
#' \itemize{
#' \item email - a column of bad email addresses
#' }
#' @export
#' @importFrom magrittr "%>%"
#' @importFrom rlang .data
#'
#' @examples
#' \dontrun{
#' get_bad_emails_from_listserv_digest(
#' username = "jdoe",
#' password = "jane_does_password",
#' url ="imaps://outlook.office365.com",
#' messages_since_date = as.Date("2022-01-01", format = "%Y-%m-%d")
#' )
#' }

get_bad_emails_from_listserv_digest <- function(username,
                                                password,
                                                url = "imaps://outlook.office365.com",
                                                messages_since_date) {
  utils::globalVariables(c("."))

  imap_con <- mRpostman::configure_imap(
    url = url,
    username = username,
    password = password
  )

  imap_con$select_folder("INBOX")
  error_emails <- imap_con$search_string(expr = "Daily error monitoring report", where = "SUBJECT")
  messages_since_date <- imap_con$search_since(date_char = format(messages_since_date, format = "%d-%b-%Y"))
  digest_emails <- dplyr::intersect(error_emails, messages_since_date)

  # digest_emails can hold several message ids, so reduce the NA check to a
  # single TRUE/FALSE before using it as an if() condition
  if (!all(is.na(digest_emails))) {
    bounced_email_addresses <- digest_emails %>%
      imap_con$fetch_body() %>%
      # key on Err First Last Address row
      stringr::str_extract_all("\\d{1} \\d{2}/\\d{2} \\d{2}/\\d{2}.*") %>%
      # flatten nested list of email -> address rows
      unlist() %>%
      # extract email address portion
      sub(".*\\s(.*@.*).*", "\\1", .) %>%
      # remove html encoded < and > characters
      sub("&lt;", "", .) %>%
      sub("&gt;", "", .) %>%
      # remove literal < and > characters
      sub("<", "", .) %>%
      sub(">", "", .) %>%
      # remove html newline
      sub("<br>", "", .) %>%
      unique()
  } else {
    bounced_email_addresses <- NA_character_
  }

  bounce_data <- dplyr::tibble(bounced_email_addresses) %>%
    dplyr::mutate(email = tolower(.data$bounced_email_addresses)) %>%
    dplyr::select(.data$email)

  return(bounce_data)
}
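
The extraction steps in the function above can be tried without an IMAP server. The base R sketch below runs the same two regular expressions over a made-up digest body. The row layout, the `example.org` addresses, and the single `gsub` that strips both encoded and literal angle brackets are assumptions for illustration, not the package's code.

```r
# Made-up digest body; the row layout (Err, First, Last, Address) is assumed.
digest_body <- paste(
  "Err First Last Address",
  "  1 02/14 02/14 <JDoe@example.org>",
  "  2 02/10 02/14 &lt;john_q_public@example.org&gt;",
  "  1 02/14 02/14 <JDoe@example.org>",
  "End of report",
  sep = "\n"
)

# Work line by line so each error row is matched on its own.
digest_lines <- strsplit(digest_body, "\n", fixed = TRUE)[[1]]

# Keep only rows that look like "count MM/DD MM/DD ...".
error_rows <- grep(
  "\\d{1} \\d{2}/\\d{2} \\d{2}/\\d{2}",
  digest_lines,
  value = TRUE,
  perl = TRUE
)

# Take the last whitespace-delimited token that contains an "@".
raw_addresses <- sub(".*\\s(.*@.*).*", "\\1", error_rows, perl = TRUE)

# Strip HTML-encoded and literal angle brackets in one pass (simplification).
clean_addresses <- gsub("&lt;|&gt;|<|>", "", raw_addresses)

# Lower-case and de-duplicate, as the function does before returning.
bad_emails <- unique(tolower(clean_addresses))
print(bad_emails)
```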
54 changes: 52 additions & 2 deletions R/redcap.R
@@ -68,8 +68,8 @@ get_redcap_db_connection <- function() {
#'
#' @return a dataframe with these columns:
#' \itemize{
#' \item ui_id - row_id of table
#' \item username - REDCap username"
#' \item ui_id - ui_id for the associated user in REDCap's redcap_user_information table
#' \item username - REDCap username
#' \item email_field_name - the name of the column containing the email address
#' \item email - the email address in email_field_name
#' }
@@ -94,3 +94,53 @@ get_redcap_emails <- function(conn) {

return(redcap_emails)
}

#' Get redcap user email revisions
#'
#' @param bad_redcap_user_emails bad redcap user email data
#' @param person institutional person data keyed on user_id
#'
#' @return a dataframe with these columns:
#' \itemize{
#' \item ui_id - ui_id for the associated user in REDCap's redcap_user_information table
#' \item username - REDCap username
#' \item email_field_name - the name of the column containing the email address
#' \item corrected_email - the corrected email address in email_field_name
#' }
#'
#' @export
#' @importFrom rlang .data
#' @importFrom magrittr "%>%"
#' @examples
#' \dontrun{
#' conn <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
#' bad_emails <- get_bad_redcap_user_emails()
#' persons <- get_institutional_person_data()
#' email_revisions <- get_redcap_email_revisions(bad_emails, persons)
#' }
get_redcap_email_revisions <- function(bad_redcap_user_emails, person) {
  person_data_for_redcap_users_with_bad_emails <- person %>%
    dplyr::select(.data$user_id, .data$email) %>%
    dplyr::filter(.data$user_id %in% bad_redcap_user_emails$username)

  redcap_email_revisions <- bad_redcap_user_emails %>%
    dplyr::inner_join(
      person_data_for_redcap_users_with_bad_emails,
      by = c("username" = "user_id"),
      suffix = c(".bad", ".replacement")
    ) %>%
    dplyr::filter(.data$email.bad != .data$email.replacement) %>%
    dplyr::filter(!is.na(.data$email.replacement)) %>%
    dplyr::filter(.data$email.replacement != "") %>%
    dplyr::mutate(corrected_email = .data$email.replacement) %>%
    dplyr::group_by(.data$ui_id, .data$email_field_name) %>%
    # columnar equivalent of coalesce for each row
    # ensures retention of corrected_email where marked for deletion
    # https://stackoverflow.com/a/60645992/7418735
    dplyr::summarise_all(~ na.omit(.)[1]) %>%
    dplyr::ungroup() %>%
    dplyr::select(
      .data$ui_id,
      .data$username,
      .data$email_field_name,
      .data$corrected_email
    )

  return(redcap_email_revisions)
}
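
The join-and-filter core of the function above can be illustrated with base R's `merge`. This is a sketch on invented data, not the package's implementation: it skips the `group_by`/`summarise_all` coalescing step, and all user names and `example.org` addresses are made up.

```r
# Invented input: three bad addresses, one of which has no institutional match.
bad_redcap_user_emails <- data.frame(
  ui_id = c(1L, 2L, 3L),
  username = c("jane_doe", "john_q_public", "no_such_user"),
  email_field_name = c("user_email", "user_email", "user_email2"),
  email = c("jane_old@example.org", "john_q_public@example.org", "gone@example.org"),
  stringsAsFactors = FALSE
)

# Invented institutional data keyed on user_id.
person <- data.frame(
  user_id = c("jane_doe", "john_q_public"),
  email = c("jane_doe@example.org", "john_q_public@example.org"),
  stringsAsFactors = FALSE
)

# Inner join on username = user_id; the clashing email columns get suffixes.
joined <- merge(
  bad_redcap_user_emails, person,
  by.x = "username", by.y = "user_id",
  suffixes = c(".bad", ".replacement")
)

# Keep rows where a non-empty replacement differs from the bad address.
keep <- !is.na(joined$email.replacement) &
  joined$email.replacement != "" &
  joined$email.bad != joined$email.replacement

revisions <- joined[keep, , drop = FALSE]
revisions$corrected_email <- revisions$email.replacement
revisions <- revisions[, c("ui_id", "username", "email_field_name", "corrected_email")]
print(revisions)
```

Only `jane_doe` survives: `john_q_public` already has the authoritative address, and `no_such_user` has no institutional record to join against.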
18 changes: 6 additions & 12 deletions README.md
@@ -31,33 +31,27 @@ To build an updated `redcapcustodian` image for an _example_ host, create a host
./make_host.sh example
```

Build `redcapcustodian` and the `example` host image:
Build `redcapcustodian` and the `rcc.site` image:

```bash
./build.sh example
./build.sh
```

This will build two images: `redcapcustodian` and `rcc_example`. The latter is built on top of the former.
This will build two images: `redcapcustodian` and `rcc.site`. The latter is built on top of the former.

To see the working directory and the contents of the `redcapcustodian` directory run
To see the working directory contents run

```bash
docker run --rm rcc_example
docker run --rm rcc.site
```

To run the shared `hello.R` report within the shared container, run

```bash
# run the script inside the container
docker run --env-file .env --rm redcapcustodian Rscript redcapcustodian/report/hello.R
docker run --env-file .env --rm redcapcustodian Rscript report/hello.R
```

To run the localized `hello-local.R` report from the _example_ host within the container, run

```bash
# run the script inside the container
docker run --env-file .env --rm rcc_example Rscript redcapcustodian/report/hello-local.R
```

## Writing your own redcapcustodian Rscripts

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.1.0
0.2.0