Issues when running on UNIX HPC environment #615

Open
stephenashton-dhsc opened this issue May 4, 2022 · 4 comments

stephenashton-dhsc commented May 4, 2022

Describe the bug

I can run a future containing a custom method on a Windows laptop within RStudio without issue, but when I launch it on an HPC node, it fails.

Reproduce example

methods::setGeneric(
  "my_custom_method",
  function(x) {
    standardGeneric("my_custom_method")
  }
)

methods::setMethod(
  "my_custom_method",
  methods::signature(x = "numeric"),
  function(x) {
    y <- x^2
    return(y)
  }
)

library(future)
future::plan("future::multicore")
value(future({my_custom_method(1)}))

Return:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘my_custom_method’ for signature ‘"numeric"’

Expected behaviour

[1] 1

Session information

Please share your session information after the error has occurred so that we can also see which packages and versions are involved:

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/packages/R/4.1.1/lib64/R/lib/libRblas.so
LAPACK: /usr/local/packages/R/4.1.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.25.0

loaded via a namespace (and not attached):
[1] compiler_4.1.1    parallelly_1.31.1 tools_4.1.1       parallel_4.1.1   
[5] listenv_0.8.0     codetools_0.2-18  digest_0.6.29     globals_0.14.0   

> future::futureSessionInfo()
*** Package versions
future 1.25.0, parallelly 1.31.1, parallel 4.1.1, globals 0.14.0, listenv 0.8.0

*** Allocations
availableCores():
        system cgroups.cpuset          nproc          Slurm 
            96             96             96             96 
availableWorkers():
$Slurm
 [1] "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027"
 [6] "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027"
 …
[91] "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027" "hpccol027"

$system
 [1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
 [6] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
 …
[91] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"


*** Settings
- future.plan=<not set>
- future.fork.multithreading.enable=<not set>
- future.globals.maxSize=<not set>
- future.globals.onReference=<not set>
- future.resolve.recursive=<not set>
- future.rng.onMisuse=<not set>
- future.wait.timeout=<not set>
- future.wait.interval=<not set>
- future.wait.alpha=<not set>
- future.startup.script=<not set>

*** Backends
Number of workers: 96
List of future strategies:
1. multicore:
   - args: function (..., workers = availableCores(constraints = "multicore"), envir = parent.frame())
   - tweaked: FALSE
   - call: future::plan("future::multicore")

*** Basic tests
   worker   pid     r sysname                     release
1       1 87160 4.1.1   Linux 3.10.0-1160.31.1.el7.x86_64
2       2 87165 4.1.1   Linux 3.10.0-1160.31.1.el7.x86_64
3       3 87171 4.1.1   Linux 3.10.0-1160.31.1.el7.x86_64
…
96     96 87639 4.1.1   Linux 3.10.0-1160.31.1.el7.x86_64
                               version                       nodename machine
1  #1 SMP Thu Jun 10 13:32:12 UTC 2021 hpccol027.smed.unix.MYDOMAIN  x86_64
2  #1 SMP Thu Jun 10 13:32:12 UTC 2021 hpccol027.smed.unix.MYDOMAIN  x86_64
3  #1 SMP Thu Jun 10 13:32:12 UTC 2021 hpccol027.smed.unix.MYDOMAIN  x86_64
…
96 #1 SMP Thu Jun 10 13:32:12 UTC 2021 hpccol027.smed.unix.MYDOMAIN  x86_64
     login                      user            effective_user
1  unknown stephen.ashton@MYDOMAIN stephen.ashton@MYDOMAIN
2  unknown stephen.ashton@MYDOMAIN stephen.ashton@MYDOMAIN
3  unknown stephen.ashton@MYDOMAIN stephen.ashton@MYDOMAIN
…
96 unknown stephen.ashton@MYDOMAIN stephen.ashton@MYDOMAIN
Number of unique PIDs: 96 (as expected)

@stephenashton-dhsc (Author)

Please note that MYDOMAIN is a placeholder masking the domain that contains my user profile and the server location.

stephenashton-dhsc commented May 4, 2022

Having done some further testing, I believe this is related to the future::multicore strategy.

It also fails when using future::sequential, but succeeds when using future::multisession.

I suspect this means that the error is actually present in future::sequential, and that my session on the HPC is defaulting to this rather than using the future::multicore strategy as intended (although future::supportsMulticore() returns TRUE on the HPC environment).
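
The comparison across strategies described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it reuses the generic and method from the reproducible example, `try_strategy` is a hypothetical helper name that is not part of future, and the worker count of 2 is an arbitrary choice to avoid starting one worker per core. Which strategies fail depends on the installed future version, so no output is shown.

```r
library(future)
library(methods)

setGeneric("my_custom_method", function(x) {
  standardGeneric("my_custom_method")
})

setMethod("my_custom_method", signature(x = "numeric"), function(x) {
  x^2
})

## Hypothetical helper: evaluate the future under the given strategy and
## return either the formatted value or the error message as a string.
try_strategy <- function(strategy) {
  old_plan <- plan(strategy)            # plan() returns the previous plan
  on.exit(plan(old_plan), add = TRUE)   # restore it when done
  tryCatch(
    format(value(future({ my_custom_method(2) }))),
    error = function(e) paste("error:", conditionMessage(e))
  )
}

## Assumption: two workers are enough for this check.
strategies <- list(
  sequential   = sequential,
  multicore    = tweak(multicore, workers = 2L),
  multisession = tweak(multisession, workers = 2L)
)

results <- vapply(strategies, try_strategy, character(1))
print(results)
```

Each entry of `results` is named after its strategy, so a failing backend shows up as an `error: ...` string next to its name.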

@stephenashton-dhsc (Author)

I can also confirm that this behaviour is present on my Windows laptop: the code fails when using future::sequential, but succeeds when using future::multisession. It also fails when using future::multicore, but I assume this is because it defaults to future::sequential where future::supportsMulticore() is FALSE.

@HenrikBengtsson (Collaborator)

Hi, I can reproduce this. I was surprised that it worked for multisession but not for sequential and multicore; normally it's the other way around. However, it turns out that this is most likely related to the recent #608 bug. The workaround is the same: set the (hidden) option future.globals.keepWhere to TRUE, as in:

library(future)
library(methods)
options(future.globals.keepWhere = TRUE)

setGeneric("my_custom_method", function(x) {
    standardGeneric("my_custom_method")
})

setMethod("my_custom_method", methods::signature(x = "numeric"), function(x) {
  x^2
})

plan(sequential)                   ## works with future.globals.keepWhere = TRUE
# plan(multicore, workers = 2L)    ## works with future.globals.keepWhere = TRUE
# plan(multisession, workers = 2L)

f <- future({ my_custom_method(2) })
v <- value(f)
print(v)
stopifnot(v == my_custom_method(2))
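
If the option should only be active around a specific piece of code rather than for the whole session, base R's `options()` returns the previous values, which makes it possible to restore them afterwards. A minimal sketch, assuming nothing beyond base R: `with_keep_where` is a hypothetical helper name, not something provided by future, and this thread does not say whether later releases still honour the option.

```r
## Hypothetical helper: evaluate `expr` with future.globals.keepWhere = TRUE,
## then restore whatever value the option had before.
with_keep_where <- function(expr) {
  old <- options(future.globals.keepWhere = TRUE)
  on.exit(options(old), add = TRUE)
  force(expr)  # `expr` is a promise, so it is evaluated here, after the option is set
}

## The option is TRUE while `expr` is evaluated ...
inside <- with_keep_where(getOption("future.globals.keepWhere"))
## ... and back to its previous value afterwards.
after <- getOption("future.globals.keepWhere")

## With the generic and method above defined, usage might look like:
## v <- with_keep_where(value(future({ my_custom_method(2) })))
```

The future has to be created inside the helper call, as in the commented usage line, so that the option is in effect at creation time.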

HenrikBengtsson added this to the Next release milestone May 6, 2022
HenrikBengtsson added a commit that referenced this issue May 8, 2022
…d to be found in sequential and multicore futures since future 1.22.0 [#615]