segfault #63

Closed
rimorob opened this issue Sep 29, 2020 · 3 comments

Comments

@rimorob

rimorob commented Sep 29, 2020

I'm getting a segfault when running doFuture; the problem seems to be in the future.batchtools backend. Here's the stack trace:
*** caught segfault ***
address 0x7ffe38150ff8, cause 'memory not mapped'

Traceback:
1: dir_map(old, identity, all, recurse, type, fail)
2: dir_ls(old, type = "directory", recurse = TRUE, all = TRUE)
3: dir_delete(old[dirs])
4: fs::file_delete(x[fs::file_exists(x)])
5: file_remove(file)
6: (function (object, file, compress = "gzip") {
    file_remove(file)
    saveRDS(object, file = file, version = 2L, compress = compress)
    waitForFile(file, 300)
    invisible(TRUE)
})(object = dots[[1L]][[23L]], file = dots[[2L]][[23L]], compress = dots[[3L]][[1L]])
7: mapply(FUN = f, ..., SIMPLIFY = FALSE)
8: Map(writeRDS, object = export, file = fn, compress = reg$compress)
9: batchExport(export = future$globals, reg = reg)
10: run.BatchtoolsFuture(future)
11: run(future)
12: batchtools_by_template(expr, envir = envir, substitute = FALSE, globals = globals, label = label, template = template, type = "slurm", resources = resources, workers = workers, registry = registry, ...)
13: makeFuture(...)
14: .makeFuture(expr, substitute = FALSE, envir = envir, globals = globals, packages = packages, seed = seed, lazy = lazy, ...)
15: future(expr, substitute = FALSE, envir = envir, globals = globals_ii, packages = packages_ii, seed = seed, stdout = stdout, conditions = conditions, label = labels[ii])
16: e$fun(obj, substitute(ex), parent.frame(), e$data)
17: foreach(si = 1:self$nBoot, .options.future = list(scheduling = 5, future.delete = TRUE), .packages = c("tidyverse", "glinternet", "R6"), .export = c("self", "as.glm.glinternet.cv")) %dopar% {
    set.seed(si)
    if (self$debug) {
        sink("log.txt", append = TRUE)
        print(paste("job", si, "at time", date()))
        sink()
    }
    sIdx = sample(x = 1:nrow(X), size = round(nrow(X) * trainFraction), replace = FALSE)
    maxVars = min(floor(trainFraction * nrow(X)) - 1, ncol(X))
    pbVec = aggWeights(self$featureWeights)
    if (trainFraction < 1) {
        varIdx = sample(x = 1:ncol(X), size = maxVars, prob = pbVec)
        trainX = as.matrix(X[sIdx, varIdx])
        testX = as.matrix(X[-sIdx, varIdx])
        trainY = Y[sIdx, , drop = F]
        testY = Y[-sIdx, , drop = F]
        if (self$randomize) {
            trainY = trainY %>% sample
        }
        if (self$debug) {
            sink("log.txt", append = TRUE)
            print("pre-glinternet")
            sink()
        }
    } else {
        trainX = X[, self$bootColNames[[si]]]
        trainY = Y
    }
    nLevels = numLevels(trainX)
    status = tryCatch({
        cvModel = glinternet.cv(trainX, trainY, numLevels = nLevels, nFolds = self$nFolds, family = self$family, numCores = 1)
    }, error = function(e) {
        return(-1)
    }, {
    })
    if (class(status) != "glinternet.cv") {
        print("cv model crashed; retrying once")
        cvModel = glinternet.cv(trainX, trainY, numLevels = nLevels, nFolds = self$nFolds, family = self$family, numCores = 1)
    }
    if (self$debug) {
        sink("log.txt", append = TRUE)
        print("post-cast")
        sink()
    }
    if (trainFraction < 1) {
        print("fitting a glm")
        glmModel = as.glm(cvModel, testX, testY, rebuildInteractions = FALSE, simplify = FALSE, k = log(nrow(trainX)))
        print("predicting out of sample")
        predOos = predict(cvModel, X = as.matrix(testX), lambdaType = "lambdaHat1Std", type = "response")
        print("calculating r2")
        mRsq = do.call("glm", list(testY ~ predOos, data = data.frame(testY, predOos), family = self$family))
        rsq = nagelkerke(mRsq)$Pseudo.R.squared.for.model.vs.null["Cox and Snell (ML)", ]
        weight = 1/(1 - rsq)
    } else {
        predOos = NA
        weight = NA
        glmModel = as.glm(cvModel, trainX, trainY, rebuildInteractions = FALSE, simplify = FALSE, k = log(nrow(trainX)))
    }
    print("here")
    fname = tempfile(pattern = "file", tmpdir = "/fsx/home/bhayete/Projects/EMP-prioritization/tmp", fileext = paste(".", si, ".RData", sep = ""))
    save(list = ls(), file = fname)
    print("there")
    print(class(glmModel))
    print("---")
    featureWeights = private$calcFeatureWeights(glmModel, simplify = FALSE)
    rm(fname)
    return(list(mGli = cvModel, mGlm = glmModel, weight = weight, bootColNames = colnames(trainX), featureWeights = featureWeights))
}
18: private$bootstrapGlinternetIteration(trainFraction = trainFraction)

^^^ this last one is my own function containing the foreach loop.

Other noteworthy things: the machine has 128GB RAM, lightly used except by me, and the remote workers also have a lot of available memory. In any case, the segfault seems to happen locally, during a file write or removal operation. I can provide other information if it would be helpful. For instance, the final run of doFuture left a number of files on disk in the .future directory, and they amount to 19GB. I'm not sure how it gets to that size, since the data frame I'm passing around is about 1000x1000.

@rimorob
Author

rimorob commented Sep 29, 2020

Re space usage: got to the bottom of that - it's the export folder of 0.5GB repeated 40 times. I think I've mentioned in another thread that the large size of the exports - which are shared among the repeats - makes for very high disk usage. But that's not what's crashing the program. It seems to be trying to delete a file, possibly one that doesn't exist. Might be a simple bug to fix?
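
For anyone checking the same thing: 40 copies of a roughly 0.5GB export folder come to about 20GB, which is in line with the 19GB observed. A minimal base-R sketch for seeing which subfolders account for the space is below. The helper name `dir_sizes` and the demo folder layout are made up for illustration; in practice you would point it at your own `.future` directory, whose exact layout is not shown in this thread.

```r
# Sum file sizes (in bytes) under each immediate subdirectory of `root`.
# `dir_sizes` is a hypothetical helper, not part of future or batchtools.
dir_sizes <- function(root) {
  subdirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  sizes <- vapply(subdirs, function(d) {
    files <- list.files(d, recursive = TRUE, full.names = TRUE, all.files = TRUE)
    sum(file.size(files), na.rm = TRUE)
  }, numeric(1))
  names(sizes) <- basename(subdirs)
  sort(sizes, decreasing = TRUE)  # largest folder first
}

# Demo on a throwaway directory with an invented layout.
root <- tempfile("future-demo-")
dir.create(file.path(root, "job_a", "exports"), recursive = TRUE)
dir.create(file.path(root, "job_b", "exports"), recursive = TRUE)
writeBin(raw(2048), file.path(root, "job_a", "exports", "X.rds"))
writeBin(raw(1024), file.path(root, "job_b", "exports", "X.rds"))

sizes <- dir_sizes(root)
print(sizes)
```

If every job folder reports about the same size, that points to the exports being copied once per future rather than shared.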

@HenrikBengtsson
Collaborator

Whatever it is, I'm 99.9999% certain it's nothing in the future framework per se. Segfaults indicate bugs in native code; the future framework is all R code.
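
One way to narrow this down is to run the same call pattern as frame 4 of the traceback, `fs::file_delete(x[fs::file_exists(x)])`, on its own, without future or batchtools in the picture. The sketch below does that on a throwaway file; the path is a hypothetical stand-in, since the real export path is not given in this thread, and it falls back to base R if the fs package is not installed. To be a meaningful check it would need to be run against the same filesystem where the crash happened.

```r
# Hypothetical stand-in for one of the exported files.
x <- tempfile(fileext = ".rds")
saveRDS(list(a = 1), file = x)

if (requireNamespace("fs", quietly = TRUE)) {
  # Same call pattern as frame 4 of the traceback, outside of future/batchtools.
  fs::file_delete(x[fs::file_exists(x)])
} else {
  # Fallback so the sketch still runs without fs; this does not exercise fs's native code.
  unlink(x)
}

removed <- !file.exists(x)
print(removed)
```

If this alone crashes the R session on the affected machine, the problem sits below the future layer.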

@HenrikBengtsson
Collaborator

Somewhat related: I've now documented option future.delete, cf. commit d3d4f7e

Also, I see that you've moved this to mllg/batchtools#266, so closing here.
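
For debugging, it can help to keep the registry folders around so the exported files can be inspected after a run. A minimal sketch, assuming `future.delete` is read as a regular R option via `options()` (check the future.batchtools documentation for the exact semantics and default):

```r
# Assumption: future.batchtools consults getOption("future.delete");
# FALSE is meant to keep the batchtools registry folder after results are collected.
old <- options(future.delete = FALSE)

current <- getOption("future.delete")
print(current)

# ... run the futures here, then inspect the .future directory ...

# Restore the previous setting when done.
options(old)
```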
