Commit 39ffe96

Merge pull request #118 from RobLBaker/master
add function document_missing_values
2 parents 268c7aa + 69bc74f commit 39ffe96

29 files changed: 725 additions and 20 deletions

DESCRIPTION

Lines changed: 2 additions & 2 deletions
@@ -49,7 +49,6 @@ Imports:
     stringr,
     base,
     readr,
-    lifecycle,
     huxtable,
     crayon,
     data.table,
@@ -61,7 +60,8 @@ Imports:
     sp,
     withr,
     cli,
-    purrr
+    purrr,
+    lifecycle
 RoxygenNote: 7.3.1
 Suggests:
     knitr,

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ export(convert_datetime_format)
 export(convert_long_to_utm)
 export(convert_utm_to_ll)
 export(create_datastore_script)
+export(document_missing_values)
 export(fix_utc_offset)
 export(fuzz_location)
 export(generate_ll_from_utm)

NEWS.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 # QCkit v0.1.8 (not yet released)
 
+2024-07-16
+* Added experimental function `document_missing_values()`, which searches a file for multiple missing value codes, replaces them all with NA, and generates a new column containing the original missing value codes so that they can be properly documented in EML. This is a workaround for the fact that there is currently no good way to document multiple missing value codes in a single column via EMLassemblyline. This function is still under development; expect substantial changes and improvements, up to and including removal of the function entirely.
+
 2024-07-09
 * Added function `get_user_email()`, which accesses NPS Active Directory via a PowerShell function to return the user's email address. Probably won't work for non-NPS users and probably won't work for non-Windows users.
 * Updated REST API from legacy v6 to current v7.
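
The behavior described in the 2024-07-16 entry can be illustrated on a single in-memory column. This is a minimal base R sketch, not the package code itself: the data frame, the column name `depth`, and the two codes are invented for illustration, and only the `custom_<column>_MissingValues` naming pattern and the `"not_missing"` label are taken from the function in this commit.

```r
# Toy column with two explicit missing-value codes and one true NA.
# The data and the codes are assumptions made up for this example.
df <- data.frame(depth = c("1.2", "missing", "3.4", "no data", NA),
                 stringsAsFactors = FALSE)
codes <- c("missing", "no data", NA)

# %in% treats NA as matching NA, so true NAs are flagged as well.
is_missing <- df$depth %in% codes

# Companion column: keep the original code where the value was missing,
# otherwise record "not_missing".
df$custom_depth_MissingValues <- ifelse(is_missing, df$depth, "not_missing")

# Original column: collapse every missing-value code to NA.
df$depth[is_missing] <- NA

print(df)
```

The companion column keeps the reason each value was absent, while the original column ends up with a single missing value representation.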

R/replace_blanks.R

Lines changed: 83 additions & 0 deletions
@@ -93,3 +93,86 @@ replace_blanks <- function(directory = here::here(), missing_val_code = NA) {
   }
   return(invisible())
 }
+
+
+#' Handles multiple missing values
+#'
+#' @description
+#' `r lifecycle::badge("experimental")`
+#' `r lifecycle::badge("questioning")`
+#' Given a file name (.csv only) and path, the function will search the
+#' columns for any that contain multiple user-specified missing value codes.
+#' For any column with multiple missing value codes, all the missing values
+#' (including blanks) will be replaced with NA. A new column will be generated
+#' and populated with the given missing value code from the original column.
+#' Values that were not missing will be populated with "not_missing". The
+#' newly generated column of categorical variables can be used to describe
+#' the various/multiple reasons why data are absent in the original column.
+#'
+#' The function will then write the new dataframe to a file, overwriting the
+#' original file. If it is important to keep a copy of the original file, make
+#' a copy prior to running the function.
+#'
+#' WARNING: this function will replace any blank cells in your data with NA!
+#'
+#' @details Blank cells will be treated as NA.
+#'
+#' @param file_name String. The name of the file to inspect.
+#' @param directory String. Location of file to read/write. Defaults to the current working directory.
+#' @param colname `r lifecycle::badge("experimental")` String. The columns to inspect. CURRENTLY ONLY WORKS WHEN LEFT AT THE DEFAULT, NA.
+#' @param missing_val_codes List. A list of strings containing the missing value code or codes to search for.
+#' @param replace_value String. The value (singular) to replace multiple missing values with. Defaults to NA.
+#'
+#' @return Writes the modified dataframe to file. Returns invisibly.
+#' @export
+#'
+#' @examples
+#' \dontrun{
+#' document_missing_values(file_name = "mydata.csv",
+#'                         directory = here::here(),
+#'                         colname = NA, #do not change during function development
+#'                         missing_val_codes = c("missing", "blank", "no data"),
+#'                         replace_value = NA)
+#' }
+document_missing_values <- function(file_name,
+                                    directory = here::here(),
+                                    colname = NA,
+                                    missing_val_codes = NA,
+                                    replace_value = NA) {
+
+  #read in a dataframe:
+  df <- readr::read_csv(file.path(directory, file_name),
+                        show_col_types = FALSE)
+  #generate list of missing values (NA is always included)
+  missing_val_codes <- append(missing_val_codes, NA)
+  missing_val_codes <- unique(missing_val_codes)
+
+  data_names <- colnames(df)
+
+  if (is.na(colname)) {
+    y <- ncol(df)
+    for (i in seq_len(y)) {
+      #if there are missing value codes other than NA in a column:
+      if (sum(df[[data_names[i]]] %in% missing_val_codes) >
+          sum(is.na(df[[data_names[i]]]))) {
+        #generate new column of data:
+        df$x <- with(df,
+                     ifelse(df[[data_names[i]]] %in% missing_val_codes,
+                            df[[data_names[i]]], "not_missing"))
+        #replace old missing values with replacement value
+        df[[data_names[i]]] <- ifelse(df[[data_names[i]]] %in%
+                                        missing_val_codes,
+                                      replace_value, df[[data_names[i]]])
+        #rename new column:
+        names(df)[names(df) == "x"] <- paste0("custom_",
+                                              data_names[i],
+                                              "_MissingValues")
+      }
+    }
+  }
+  #write the file back out:
+  readr::write_csv(df, file.path(directory, file_name))
+
+  return(invisible())
+
+}
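
Because the function overwrites the input file, the documentation above recommends making a copy first. One possible way to do that is sketched below. The temporary directory, the file name, the `original_` prefix for the backup, and the sample data are all hypothetical choices for this example; the call to `document_missing_values()` runs only if an installed copy of QCkit exports it.

```r
# Hypothetical file name and location, for illustration only.
data_dir <- tempdir()
csv_name <- "mydata.csv"
csv_path <- file.path(data_dir, csv_name)
write.csv(data.frame(depth = c("1.2", "missing", "no data")),
          csv_path, row.names = FALSE)

# Keep an untouched copy, because the function overwrites the file in place.
backup_path <- file.path(data_dir, paste0("original_", csv_name))
copied <- file.copy(csv_path, backup_path, overwrite = TRUE)

# Call the function only when the backup succeeded and an installed QCkit
# actually exports it (the function is experimental and may be removed).
if (copied &&
    requireNamespace("QCkit", quietly = TRUE) &&
    "document_missing_values" %in% getNamespaceExports("QCkit")) {
  QCkit::document_missing_values(file_name = csv_name,
                                 directory = data_dir,
                                 missing_val_codes = c("missing", "no data"))
}
```

Checking the result of `file.copy()` before running the function avoids overwriting the only copy of the data if the backup could not be written.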

docs/index.html

Lines changed: 1 addition & 1 deletion

docs/news/index.html

Lines changed: 1 addition & 0 deletions

docs/pkgdown.yml

Lines changed: 1 addition & 1 deletion
@@ -5,5 +5,5 @@ articles:
   DRR_Purpose_and_Scope: DRR_Purpose_and_Scope.html
   Starting-a-DRR: Starting-a-DRR.html
   Using-the-DRR-Template: Using-the-DRR-Template.html
-last_built: 2024-07-09T14:49Z
+last_built: 2024-07-16T15:01Z

docs/reference/document_missing_values.html

Lines changed: 174 additions & 0 deletions
