Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub workflow with sanitizers #6746

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

GitHub workflow with sanitizers #6746

wants to merge 4 commits into from

Conversation

aitap
Copy link
Contributor

@aitap aitap commented Jan 21, 2025

There are multiple ways to write the workflow in order to detect the issues like #6733.

One is to be quite close to the reference R-hub workflow description, which is in the HEAD of the current pull request. Here's how it fails on a recent master.

The other is to write a more custom workflow, still reusing the container maintained by Gábor Csárdi and the infrastructure providing binary packages linked with libc++ for the clang+sanitizers check. It's in the parent of the current pull request or the GHA-sanitizers branch. Here's what the failure looks like.

(The supported option is to use rhub::rhub_setup().)

Comments from people experienced with GHA are very welcome; this is the third time I'm doing any CI and the first time I'm writing a non-throwaway GitHub workflow.

@aitap aitap requested a review from MichaelChirico as a code owner January 21, 2025 11:40
Copy link

codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.62%. Comparing base (fd2a915) to head (0c32a51).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6746   +/-   ##
=======================================
  Coverage   98.62%   98.62%           
=======================================
  Files          79       79           
  Lines       14642    14642           
=======================================
  Hits        14440    14440           
  Misses        202      202           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MichaelChirico
Copy link
Member

MichaelChirico commented Jan 21, 2025

Did you have a look at the clang-ubsan .devcontainer we have? I have used that in the past. I am not sure about performance in GHA.

https://github.com/Rdatatable/data.table/blob/7445df7473869a73103717a3f34d4e233d1c012b/.devcontainer/r-devel-clang-ubsan/Dockerfile

I'm not sure why the most recent run was cancelled, I'll close & re-open to re-trigger:

https://github.com/Rdatatable/data.table/actions/runs/12886173136

At a higher level, if performance is bad (say >=20% slower than any current PR-level action), I would recommend only running such an action on master.

To date, master-only checks have been done in GLCI -- @ben-schwen / @jangorecki do you know if we've ever had a UBSan (or MSan etc) action set up on GitLab?

@aitap
Copy link
Contributor Author

aitap commented Jan 22, 2025

Thank you for pointing me towards the devcontainer! Unlike rocker/r-devel-ubsan, rocker/r-devel-san-clang does have AddressSanitizer enabled, and we can try to additionally enable LeakSanitizer like here.

I don't think it we'll get it significantly faster than 4 minutes (or anywhere close to 40 seconds), especially if optional dependencies are introduced, so let's leave it on master.

aitap added 2 commits January 28, 2025 17:38
Also:
- Do not check pull requests
- Try to cache installed dependencies
@aitap
Copy link
Contributor Author

aitap commented Jan 28, 2025

Using rocker/r-devel-ubsan-clang with GitHub Actions takes a bit more effort.

Installing the dependencies from source (including those from inst/tests/other.Rraw) takes more than 20 minutes. The installed packages can be cached, saving time for future re-runs, but the cache is immutable, leaving us the choice between keying the cache by R-devel version only (making the cached packages gradually more out of date until a new R version is released) and keying the cache by versions of all involved packages (expiring the cache as a whole the moment a single package needs to be updated).

I suppose we could try to abuse the cache system by making an individual cache key for every needed package (plus its version plus the version of R-devel), but it's definitely not intended to be used this way.

Is GitLab CI better for tasks like these, with slowly changing 100-megabyte caches?

@MichaelChirico
Copy link
Member

MichaelChirico commented Jan 28, 2025

GLCI is probably the right way to go (cc @jangorecki & @ben-schwen).

Just want to note that because of our minimal approach to dependencies, the cache approach should be OK in general -- I guess the most important package to check with the sanitizer will be bit64, which is relatively stable / won't change noticeably more often than R (I am the maintainer & certainly don't plan a lot of activity 😉). The other packages are probably not important for testing with the sanitizer per se, so if they are slightly behind CRAN it's unlikely to cause issues.

@ben-schwen
Copy link
Member

ben-schwen commented Jan 29, 2025

In GLCI we also cache the pkgs and install them but it seems that in this GHA we also install all dependencies for these pkgs?

Installing package(s) 'BiocVersion', 'bit64', 'bit', 'R.utils', 'xts', 'zoo',
  'yaml', 'knitr', 'markdown', 'ggplot2', 'hexbin', 'plyr', 'dplyr', 'caret',
  'gdata', 'nlme', 'sf', 'nanotime'
also installing the dependencies ‘listenv’, ‘parallelly’, ‘future’, ‘globals’, ‘digest’, ‘shape’, ‘future.apply’, ‘numDeriv’, ‘progressr’, ‘SQUAREM’, ‘diagram’, ‘lava’, ‘colorspace’, ‘tzdb’, ‘cpp11’, ‘prodlim’, ‘timechange’, ‘stringi’, ‘farver’, ‘labeling’, ‘munsell’, ‘RColorBrewer’, ‘viridisLite’, ‘fansi’, ‘pkgconfig’, ‘utf8’, ‘proxy’, ‘iterators’, ‘data.table’, ‘clock’, ‘gower’, ‘hardhat’, ‘ipred’, ‘lubridate’, ‘purrr’, ‘tidyr’, ‘timeDate’, ‘stringr’, ‘wk’, ‘R.oo’, ‘R.methodsS3’, ‘evaluate’, ‘highr’, ‘xfun’, ‘commonmark’, ‘cli’, ‘glue’, ‘gtable’, ‘isoband’, ‘lifecycle’, ‘rlang’, ‘scales’, ‘tibble’, ‘vctrs’, ‘withr’, ‘Rcpp’, ‘generics’, ‘magrittr’, ‘pillar’, ‘R6’, ‘tidyselect’, ‘e1071’, ‘foreach’, ‘ModelMetrics’, ‘pROC’, ‘recipes’, ‘reshape2’, ‘gtools’, ‘classInt’, ‘DBI’, ‘s2’, ‘units’, ‘RcppCCTZ’, ‘RcppDate’

In comparison in GLCI we only install dependencies from DESCRIPTION and

‘R.oo’, ‘R.methodsS3’, ‘lattice’, ‘evaluate’, ‘highr’, ‘xfun’, ‘commonmark’

@aitap
Copy link
Contributor Author

aitap commented Jan 29, 2025

These are the strong dependencies of the packages from inst/tests/other.Rraw:

parse('inst/tests/other.Rraw')[[1]][[3]] |> eval() |>
 tools::package_dependencies(which = 'strong', recursive=TRUE) |>
 unlist() |> unique()
  [1] "cli"          "glue"         "grDevices"    "grid"         "gtable"
  [6] "isoband"      "lifecycle"    "MASS"         "mgcv"         "rlang"
 [11] "scales"       "stats"        "tibble"       "vctrs"        "withr"
 [16] "utils"        "methods"      "nlme"         "graphics"     "Matrix"
 [21] "splines"      "farver"       "labeling"     "munsell"      "R6"
 [26] "RColorBrewer" "viridisLite"  "fansi"        "magrittr"     "pillar"
 [31] "pkgconfig"    "colorspace"   "lattice"      "utf8"         "Rcpp"
 [36] "generics"     "tidyselect"   "ggplot2"      "e1071"        "foreach"
 [41] "ModelMetrics" "plyr"         "pROC"         "recipes"      "reshape2"
 [46] "stats4"       "class"        "proxy"        "codetools"    "iterators"
 [51] "data.table"   "dplyr"        "clock"        "gower"        "hardhat"
 [56] "ipred"        "lubridate"    "purrr"        "tidyr"        "timeDate"
 [61] "stringr"      "tzdb"         "cpp11"        "rpart"        "survival"
 [66] "nnet"         "prodlim"      "timechange"   "stringi"      "diagram"
 [71] "KernSmooth"   "lava"         "tools"        "shape"        "future.apply"
 [76] "numDeriv"     "progressr"    "SQUAREM"      "future"       "globals"
 [81] "parallel"     "digest"       "listenv"      "parallelly"   "zoo"
 [86] "gtools"       "bit"          "evaluate"     "highr"        "xfun"
 [91] "yaml"         "classInt"     "DBI"          "s2"           "units"
 [96] "wk"           "bit64"        "RcppCCTZ"     "RcppDate"     "R.oo"
[101] "R.methodsS3"

(Including data.table itself, which is interesting.)

BiocVersion is caused by me being lazy and relying on BiocManager::install to only install/update the packages where the latest version is not already installed.

Not installing the packages from other.Rraw will indeed save a lot of time.

@MichaelChirico
Copy link
Member

oh, I definitely would not concern myself with getting other.Rraw working in the sanitizer GHA :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants