Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Responding to rOpenSci reviewer's comments #184

Open
1 of 7 tasks
RayStick opened this issue Feb 3, 2025 · 2 comments
Open
1 of 7 tasks

Responding to rOpenSci reviewer's comments #184

RayStick opened this issue Feb 3, 2025 · 2 comments
Assignees

Comments

@RayStick
Copy link
Member

RayStick commented Feb 3, 2025

In this issue I will group together comments from my 2 reviewers: ropensci/software-review#674 (comment) in an attempt to tackle the changes in a few PRs that relate to similar files.

Changes to tackle:

After I finish the changes:

  • Update the news file
  • Add "rev" tag for the reviewer's contribution (@Lextuga007 @ymansiaux ) - check guidance on that, is there also an editor contribution tag for @maelle?
  • Summarise changes I have made back to reviewers (in case of further discussion)
  • Make a new release

Reviewer 1: ropensci/software-review#674 (comment)

Function and argument naming

[1] To align with the naming of the other exported functions, map_compare() and map_convert(), I would suggest renaming metadata_map() to map_metadata(). While I understand you may have concerns about potential confusion with the package name, I believe this change would harmonize the naming scheme, enhance clarity, and facilitate autocomplete for users. resolved on review thread: ropensci/software-review#674 (comment)

[2] For the map_convert() function, I recommend renaming the argument output_csv to csv_to_convert to make its purpose clearer. Additionally, introducing a new argument, csv_to_convert_dir, could address an issue I’ve identified in the provided function example (see the Examples section below). If you implement this change, the output_dir argument could represent the directory where the generated CSV file is stored (defaulting to csv_to_convert_dir if not specified).

[3] Input checking

I suggest adding explicit input validation to your exported functions. This would enhance the user experience by providing clear and informative error messages or warnings.

  • For map_compare(), consider verifying that session_dir exists and that the CSV files specified in the user input are present.
  • For map_convert(), you might want to check that both output_csv and output_dir exist.
  • For metadata_map() and the other functions, validating file existence, expected dimensions, and column names (if any) could further improve robustness.

[4] README

The demo example in the README feels somewhat extensive and might benefit from being simplified to improve accessibility. Currently, interacting with up to 20 variables requires substantial effort, which could deter users from running the full demo.

For sections involving custom inputs (metadata input, domain list input, and custom lookup), I recommend moving these to the vignette and referencing it in the README. For example:

"If you wish to use custom metadata, domain lists, or lookup tables, please refer to vignette XXX, section YYY."

Additionally, examples in the README should be copy-pasteable and work seamlessly without requiring additional setup. For instance, the following code snippet requires downloading a file.

new_csv_file <- "path/your_new_csv.csv"
demo_domains_file <- system.file("inputs/domain_list_demo.csv", package = "mapmetadata")

metadata_map(csv_file = new_csv_file, domain_file = demo_domains_file)

This is why I would present only the demo example in the README.

Providing examples for custom domain list inputs and custom lookup tables would also be beneficial. For these examples, linking to demo files in inst/inputs could clarify the expected file formats. I guess an explanation about the location of the files used by default and the way to access them programmatically with system.file() would be useful to your users (If this function is probably familiar to R packages developpers, it might not be straightforward for package “users” only).

[5] Documentation

For non-exported functions, you might want to prevent .Rd files from being generated in the man/ folder. Adding the #' @noRd tag to these functions' documentation can achieve this. For more details, see [the rOpenSci dev guide](https://devguide.ropensci.org/pkg_building.html#roxygen-2-use).

Editor says:

As an alternative to @nord you might enjoy https://github.com/moodymudskipper/devtag @RayStick (not in the dev guide yet), it generates Rd pages for internal functions, that developers can read with ?fun, but Rbuildignores them so the users don't see them.

[6] Testing

In the tests/testthat.R file, it seems unnecessary to load the {mockery} and {withr} packages explicitly, as you are already using explicit namespacing.

A note regarding {mockery}: its README states that the package is superseded and recommends using testthat::local_mocked_bindings() instead. More details can be found [here](https://github.com/r-lib/mockery).

[7] Examples

Some examples are enclosed in dontrun tags, which may pose an issue if you plan to submit the package to CRAN.

  • For map_compare(), consider using a temporary directory to avoid creating files in the user’s workspace or package’s installation directory:
# Locate file paths for the example files in the package
demo_session_dir <- system.file("outputs", package = "mapmetadata")
demo_session1_base <- "360_NationalCommunityChildHealthDatabase(NCCHD)_CHILD_2024-12-19-14-11-55"
demo_session2_base <- "360_NationalCommunityChildHealthDatabase(NCCHD)_CHILD_2024-12-19-14-17-45"
demo_metadata_file <- system.file("inputs", "360_NationalCommunityChildHealthDatabase(NCCHD)_Structural_Metadata.csv", package = "mapmetadata")
demo_domain_file <- system.file("inputs", "domain_list_demo.csv", package = "mapmetadata")
output_dir <- tempdir()
message("Output directory: ", output_dir)
 
# Run the function - requires user interaction
map_compare(
 session_dir = demo_session_dir,
session1_base = demo_session1_base,
session2_base = demo_session2_base,
metadata_file = demo_metadata_file,
domain_file = demo_domain_file,
output_dir = output_dir
)

Returning the path of the output file and including a browseURL() call to open the file in the example could enhance usability. This would allow us to delete the temp directory once the generated file has been opened.

  • The example for map_convert() writes to the package’s installation directory, which is problematic. Modifying this to use temporary directories would improve clarity and compliance. If you implement my earlier suggestion to add a csv_to_convert_dir argument, this example could be significantly simplified. Moreover, I think a message telling the user the path of the output could be nice, like you did with map_compare(). I think this could be nice that this function returns the path of the output file so we could open it in the example with a browseURL() call.

  • For metadata_map(), since it requires user interaction, you should enclose it in an if(interactive()) block:

#' @examples
# Demo run requires no function inputs but requires user interaction
if(interactive()) {
 	metadata_map()
 }

[8] file path construction

In your file-handling functions, you are constructing paths using paste() or paste0(). To ensure platform independence, I recommend replacing these with file.path(). For example, in map_compare():

You could replace

csv_1a <- read.csv(paste0(session_dir, "/", "MAPPING_LOG_", session1_base, ".csv"))
with
csv_1a <- read.csv(file.path(session_dir, paste0("MAPPING_LOG_", session1_base, ".csv")))

This ensures compatibility across operating systems.

[9] Feedback based on devchecks()

  • Portable file names: Removing parentheses from file names should resolve this issue.
  • Package size: Large files in inst/outputs, such as BAR_360_NationalCommunityChildHealthDatabase.html, could be removed if not essential to the package.
  • Future file timestamps: Likely specific to my computer and unrelated to your code.
  • Global variable bindings: Create an R/globals.R file and use utils::globalVariables(). For guidance, see [this forum post](https://forum.posit.co/t/how-to-solve-no-visible-binding-for-global-variable-note/28887).
  • Rd line widths: Some lines exceed 100 characters. Adjusting your code style should address this:
demo_metadata_file <- system.file(
"inputs",
"360_NationalCommunityChildHealthDatabase(NCCHD)_Structural_Metadata.csv",
package = "mapmetadata"
)

Reviewer 2: ropensci/software-review#674 (comment)

[1] ROpenSci packaging guidelines

Console messages asks to avoid the use of cat and print as these are not easy to suppress.

Avoiding verbosity or suppressing messages is hard within this package as the interactions from the user require the messages.

Editor says:

Regarding verbosity, we've published a blog post a while ago (with the goal of updating the dev guide later), that might be relevant: https://ropensci.org/blog/2024/02/06/verbosity-control-packages/

[2] Accessibility#185

Thank you for the alt text on images!

To support further accessibility I recommend being more descriptive in the text where a link is highlighted in text like for the word here particularly as this is something that helps with accessibility.

Also appears in the vignette which shows an image but only says download here:.

It took me a while to navigate the website as I wasn't familiar with it so I wonder if the image would benefit from a direct hyperlink to the same page?

And the non-descriptive here in see here for outputs generated from the demo run.

[3] Technical language#185

It might be useful to expand on the point about changing the save location by adjusting the output_dir argument because argument is a technical term that might not be familiar to researchers - perhaps using an example to expand on this would help?

[4] Expectations of functionality

Only the Return key works to continue where the instruction says to finish here, or press any other key to continue with mapping variables. Other keys don't appear to "finish" and also, to be very clear, need to be executed in the Console.

The demo requires initials to be entered but it isn't explained why it is required. It gets recorded in the logs but this might not be apparent to an initial user of the package.

Demo mode is said to process only the first 20 variables from the selected table and it's not clear what the 20 variables refer to particularly when directed to selecting data elements. Are data elements the same as variables in this example? I got a bit confused with this so couldn't complete the instructions.

The parameters used in the metadata_map() function example don't appear to work. The code is metadata_map(csv_file = new_csv_file, domain_file = demo_domains_file) but there isn't a parameter called csv_file - should this be metadata_file? I first of all tried to use the 14_Mortality (Death registration)_Metadata.csv data from the website but got the error because the use of structural metadata is only referred to in the making of metadata. I found this link via the arguments help for metadata_file which is excellent documentation, however, will be missed I think so would be really great to be clear in the README file as well. An enhancement may be to have a message coded if a file is missing structural in the file name that the error is because it's the wrong file?

get("look_up") should perhaps come earlier in the vignette because it's not possible to use this function when in the process of selecting variables from using metadata_map() that appears just before.

For the prompt Enter one or more numbers separated by spaces and then ENTER, or 0 to cancel I tried just using the Enter/Return key to finish which gave an output. Pressing Enter/Return key though isn't a listed option and I just guessed it would work so possibly better to be explicit.

I recommend linking to the function reference for map_compare() in tutorial.

I recommend being explicit in the argument help for metadata_file as ?metadata won't return any files if the package hasn't been loaded: ?mapmetadata::metadata. Although this is functionality in RStudio it could be missed by someone new to R and RStudio and their experience of not finding help might be incorrectly associated with the package.

The reference to data/metadata.rda might not make sense as packages are often just used through the library() so perhaps better to use refer to the code mapmetadata::metadata?

For Using a custom domain list input where domain file inputs are detailed as automatically appending so do not include these in your domain list is this because they get replicated? It would be helpful to be explicit that this wouldn't be likely to break code but just duplicate.

For Using a custom lookup table input (advanced) I couldn't work out the corresponding example csv that the instruction ensure that all domain codes in the lookup file are also included in your *domain file* without opening all the csvs and looking for the column name. It might be good to expand on this with an example.

Function documentation sometimes includes references to other functions in the package, for example valid_comparison() refers to it being used in the the map_compare.R function. Cross links are mentioned in the ROpenSci package guidelines and if you use RStudio I recommend using my colleague, Matt Dray's, snorkel package to add these in as you can highlight the text and use the RStudio add in.

[5] Spelling#185

Minor spelling have been submitted for correction on a PR but where functions have spelling mistakes:

concensus is a function name and will require updates to corresponding files: concensus_on_mismatch(), test-concensus_on_mismatch.R.

It is also in the message from the map_compare() function and is in the documentation for map_compare().

Object in function metadata_map() (which won't have any effect to the user) mistmatch.

[6] Package build

Notes on no visible binding for global variable can be managed using the utils package and a specific R file (globals.R). More details on more sophisticated treatment can be found in the R packages book. I tend to get these issues particularly when referring to SQL columns in R code.

portable file names... NOTE is appearing because the file names are over 100 characters long. These files are referred to in the map_convert() and map_compare() documentation example so it might be ok to rename this longer name by the parts for generalising? For example:

#' @examples
#' # Locate file path and file name for the example files in the package
#' demo_output_dir <- system.file("outputs", package = "mapmetadata")
#' demo_output_csv <- "MAPPING_360_example_datetime.csv"
#'
#' # Run the function
#' map_convert(output_csv = demo_output_csv, output_dir = demo_output_dir)

[7] Formatting consistency

I noticed that there are line breaks put into function metadata_map.R so almost all the text is within an 80 character limit, but this isn't the case for other scripts. It's quite nice to have a line break set because it makes it easier to read across the screen and as it's been used in one script I wondered if it could be applied to all scripts? I use lintr to highlight the lines that are over 80 characters.

@RayStick RayStick self-assigned this Feb 3, 2025
@RayStick RayStick added this to the rOpenSci Review Process milestone Feb 3, 2025
@maelle
Copy link
Collaborator

maelle commented Feb 3, 2025

is there also an editor contribution tag for @maelle

No, thank you for asking. 🙏 In the dev guide we state "Please do not list editors as contributors. Your participation in and contribution to rOpenSci is thanks enough!" https://devguide.ropensci.org/pkg_building.html#authorship (I'm actually realizing you listed me as contributor in some other metadata!)

@RayStick
Copy link
Member Author

RayStick commented Feb 3, 2025

No, thank you for asking. 🙏 In the dev guide we state "Please do not list editors as contributors. Your participation in and contribution to rOpenSci is thanks enough!" https://devguide.ropensci.org/pkg_building.html#authorship (I'm actually realizing you listed me as contributor in some other metadata!)

Ah, missed that guide note! Well I am happy to keep you as a contributor (if you are okay with that) as I think you gave ideas - and contributions in this repo - above and beyond your editor role, but correct me if I am wrong. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants