Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assess_downloads_1yr() error #283

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

parmsam-pfizer
Copy link
Collaborator

No description provided.

@parmsam-pfizer parmsam-pfizer changed the title Fix downloads error Fix assess_downloads_1yr() error Mar 22, 2023
@codecov-commenter
Copy link

codecov-commenter commented Mar 22, 2023

Codecov Report

Merging #283 (f36df15) into master (4bc6f95) will increase coverage by 0.18%.
The diff coverage is 100.00%.

❗ Current head f36df15 differs from pull request most recent head c3fea3b. Consider uploading reports for the commit c3fea3b to get more accurate results

@@            Coverage Diff             @@
##           master     #283      +/-   ##
==========================================
+ Coverage   59.23%   59.41%   +0.18%     
==========================================
  Files          64       64              
  Lines         991      993       +2     
==========================================
+ Hits          587      590       +3     
+ Misses        404      403       -1     
Impacted Files Coverage Δ
R/pkg_ref_cache_downloads.R 100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@parmsam-pfizer parmsam-pfizer marked this pull request as draft March 22, 2023 04:00
Copy link
Collaborator

@emilliman5 emilliman5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a warning as well. to let users know there was a problem and that we automatically set the look back

@parmsam-pfizer
Copy link
Collaborator Author

Good idea. I'll add that. However, I'm having problems changing the lookback argument. Doesn't look like that is working.

@parmsam-pfizer parmsam-pfizer marked this pull request as ready for review March 22, 2023 15:59
@parmsam-pfizer
Copy link
Collaborator Author

Based on our discussion, I've updated pkg_ref_cache.downloads() to grab the earliest downloads record date which seems to be "2012-10-01". I wasn't able to find an easy way to grab the package CRAN release date, so used this as an alternative. The assess_downloads_1yr.pkg_ref() subsets the pkg_ref downloads dataframe up to the past year then returns the downloads sum.

Copy link
Collaborator

@emilliman5 emilliman5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a filter if the "archive_release_dates" field is populated/available. we do not need to display/store download information for dates before a package's first release. This slot is only populated for pkg_cran_remote class pkg_refs.

@parmsam-pfizer
Copy link
Collaborator Author

parmsam-pfizer commented Mar 30, 2023

Sorry for the delay. Just added default and pkg_cran_remote classes to pkg_ref_cache_downloads. Is this what you were thinking? @emilliman5

@emilliman5
Copy link
Collaborator

Almost there. I think pkg_ref_cache.downloads.default should return NA. For pkg_ref_cache.downloads.pkg_cran_remote I would add some logic to that sets a default from date in case there is no date in archive_release_dates (e.g. if there is/was a parsing error).

and can we implement a downloads ref cache for bioconductor and github???? (can be a later PR)

@parmsam-pfizer
Copy link
Collaborator Author

parmsam-pfizer commented Apr 14, 2023

Yeah, I'll create an issue to implement a downloads ref cache for bioconductor and github. I've tried setting
pkg_ref_cache.downloads.default to return NA, but the challenge is that is leads to a NA return unless the source is explicitly declared as pkg_cran_remote. This leads to my test for pkg_ref_cache_downloads failing. Is that the intention with the default class?

Example after updating the code and installing package:

pr1 <- pkg_ref(c("dplyr")) %>% pkg_assess() %>% pkg_score()
print(pr1$downloads)
pr2 <- pkg_ref(c("riskmetric"), source = "pkg_cran_remote") %>% pkg_assess() %>% pkg_score() 
print(pr2$downloads)

Code snippet I've updated based on your suggestions is below:

#' @importFrom cranlogs cran_downloads
#' @keywords internal
pkg_ref_cache.downloads.default <- function(x, ...) {
  return(NA)
}

#' @importFrom cranlogs cran_downloads
#' @keywords internal
pkg_ref_cache.downloads.pkg_cran_remote <- function(x, name, ...) {
  from_date <- tryCatch({
    x$archive_release_dates[1, 'date'][[1]]
    },
    error = function(e){
      # default to downloads from_date will be Oct2012
      # Oct2012 is when RStudio CRAN mirror started publishing logs
    as.Date("2012-10-01", format = "%Y-%m-%d")
    }
  )

  cran_downloads(x$name, from = from_date, to = Sys.Date())
}

@emilliman5
Copy link
Collaborator

I see what you mean.

Maybe there should be no default method dispatch??? My thinking is, e.g., that for a pkg_ref_source we should not even be trying to populate pkg_ref_cache.downloads because the pkg_ref is not from CRAN. So maybe it is better to have no pkg_ref_cache.downloads.default and instead just the cran_remote method. and then we add a github and bioc_remote at a later date? I will have to check what happens when you run an assessment if there is no method to cache the information though.

My goal is to be intentional/explicit about what info we cache based on the ref source. Maybe we need a similar system like for metrics where we specify the reason for the NA????

I like the error handling. I need to think more on the method dispatching

@parmsam-pfizer
Copy link
Collaborator Author

I'll create a new PR with a temp fix based on @elimillera's suggestion. That will at least get the metric working again. Keeping this PR active for us to discuss further, since we're still evaluating the general approach for the metric.

@emilliman5
Copy link
Collaborator

I will resolve these conflicts and merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants