segfault #63
Re space usage: I got to the bottom of that. It's the 0.5 GB export folder repeated 40 times. I think I've mentioned in another thread that the large size of the exports, which are shared among the repeats, makes for very high disk usage. But that's not what's crashing the program: it seems to be trying to delete a file, possibly one that doesn't exist. Might be a simple bug to fix?
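If that hypothesis is right, one user-side workaround is a removal helper that tolerates already-missing paths. This is only a sketch of the idea, not the package's actual code; safe_file_remove is a hypothetical name, and it swaps fs's delete for base R's unlink(), which silently skips paths that no longer exist:

# Hypothetical defensive variant of the file_remove() step that shows up
# in the traceback below. unlink() ignores missing paths instead of
# walking the tree through fs's native dir_map().
safe_file_remove <- function(paths) {
  paths <- paths[file.exists(paths)]  # drop entries already deleted elsewhere
  if (length(paths) > 0L) {
    unlink(paths, recursive = TRUE, force = TRUE)
  }
  invisible(TRUE)
}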
Whatever it is, I'm 99.9999% certain it's nothing in the future framework per se. Segfaults indicate bugs in native code; the future framework is all R code.
Somewhat related: I've now documented that option. Also, I see that you've moved this to mllg/batchtools#266, so closing here.
I'm getting a segfault when running doFuture; the problem seems to be in the future.batchtools backend. Here's the stack trace:
*** caught segfault ***
address 0x7ffe38150ff8, cause 'memory not mapped'
Traceback:
1: dir_map(old, identity, all, recurse, type, fail)
2: dir_ls(old, type = "directory", recurse = TRUE, all = TRUE)
3: dir_delete(old[dirs])
4: fs::file_delete(x[fs::file_exists(x)])
5: file_remove(file)
6: (function (object, file, compress = "gzip") { file_remove(file); saveRDS(object, file = file, version = 2L, compress = compress); waitForFile(file, 300); invisible(TRUE) })(object = dots[[1L]][[23L]], file = dots[[2L]][[23L]], compress = dots[[3L]][[1L]])
7: mapply(FUN = f, ..., SIMPLIFY = FALSE)
8: Map(writeRDS, object = export, file = fn, compress = reg$compress)
9: batchExport(export = future$globals, reg = reg)
10: run.BatchtoolsFuture(future)
11: run(future)
12: batchtools_by_template(expr, envir = envir, substitute = FALSE, globals = globals, label = label, template = template, type = "slurm", resources = resources, workers = workers, registry = registry, ...)
13: makeFuture(...)
14: .makeFuture(expr, substitute = FALSE, envir = envir, globals = globals, packages = packages, seed = seed, lazy = lazy, ...)
15: future(expr, substitute = FALSE, envir = envir, globals = globals_ii, packages = packages_ii, seed = seed, stdout = stdout, conditions = conditions, label = labels[ii])
16: e$fun(obj, substitute(ex), parent.frame(), e$data)
17: foreach(si = 1:self$nBoot, .options.future = list(scheduling = 5, future.delete = TRUE),
        .packages = c("tidyverse", "glinternet", "R6"),
        .export = c("self", "as.glm.glinternet.cv")) %dopar% {
      set.seed(si)
      if (self$debug) {
        sink("log.txt", append = TRUE)
        print(paste("job", si, "at time", date()))
        sink()
      }
      sIdx = sample(x = 1:nrow(X), size = round(nrow(X) * trainFraction), replace = FALSE)
      maxVars = min(floor(trainFraction * nrow(X)) - 1, ncol(X))
      pbVec = aggWeights(self$featureWeights)
      if (trainFraction < 1) {
        varIdx = sample(x = 1:ncol(X), size = maxVars, prob = pbVec)
        trainX = as.matrix(X[sIdx, varIdx])
        testX = as.matrix(X[-sIdx, varIdx])
        trainY = Y[sIdx, , drop = F]
        testY = Y[-sIdx, , drop = F]
        if (self$randomize) {
          trainY = trainY %>% sample
        }
        if (self$debug) {
          sink("log.txt", append = TRUE)
          print("pre-glinternet")
          sink()
        }
      } else {
        trainX = X[, self$bootColNames[[si]]]
        trainY = Y
      }
      nLevels = numLevels(trainX)
      status = tryCatch({
        cvModel = glinternet.cv(trainX, trainY, numLevels = nLevels, nFolds = self$nFolds,
                                family = self$family, numCores = 1)
      }, error = function(e) { return(-1) }, { })
      if (class(status) != "glinternet.cv") {
        print("cv model crashed; retrying once")
        cvModel = glinternet.cv(trainX, trainY, numLevels = nLevels, nFolds = self$nFolds,
                                family = self$family, numCores = 1)
      }
      if (self$debug) {
        sink("log.txt", append = TRUE)
        print("post-cast")
        sink()
      }
      if (trainFraction < 1) {
        print("fitting a glm")
        glmModel = as.glm(cvModel, testX, testY, rebuildInteractions = FALSE,
                          simplify = FALSE, k = log(nrow(trainX)))
        print("predicting out of sample")
        predOos = predict(cvModel, X = as.matrix(testX), lambdaType = "lambdaHat1Std", type = "response")
        print("calculating r2")
        mRsq = do.call("glm", list(testY ~ predOos, data = data.frame(testY, predOos), family = self$family))
        rsq = nagelkerke(mRsq)$Pseudo.R.squared.for.model.vs.null["Cox and Snell (ML)", ]
        weight = 1/(1 - rsq)
      } else {
        predOos = NA
        weight = NA
        glmModel = as.glm(cvModel, trainX, trainY, rebuildInteractions = FALSE,
                          simplify = FALSE, k = log(nrow(trainX)))
      }
      print("here")
      fname = tempfile(pattern = "file", tmpdir = "/fsx/home/bhayete/Projects/EMP-prioritization/tmp",
                       fileext = paste(".", si, ".RData", sep = ""))
      save(list = ls(), file = fname)
      print("there")
      print(class(glmModel))
      print("---")
      featureWeights = private$calcFeatureWeights(glmModel, simplify = FALSE)
      rm(fname)
      return(list(mGli = cvModel, mGlm = glmModel, weight = weight,
                  bootColNames = colnames(trainX), featureWeights = featureWeights))
    }
18: private$bootstrapGlinternetIteration(trainFraction = trainFraction)
^^^ this last one is my own function containing the foreach loop.
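Frames 1-4 put the crash inside fs's native dir_map() walker rather than in future or batchtools code. A quick way to test that in isolation is to run the same fs calls directly on the registry's export directory; this is only a sketch, and the path below is a placeholder for the real one:

library(fs)

# Placeholder path; point this at the actual .future registry directory.
old <- ".future/registry/exports"
if (dir_exists(old)) {
  # Mirrors traceback frame 2; if the bug is in fs itself, this listing
  # should segfault on the same tree without involving batchtools.
  d <- dir_ls(old, type = "directory", recurse = TRUE, all = TRUE)
  print(length(d))
}

If this minimal call crashes too, that would support reporting the problem against fs rather than against the future stack.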
Other noteworthy things: the machine has 128 GB of RAM and is lightly used except by me, and the remote workers also have plenty of available memory. In any case, the segfault seems to happen locally and in relation to a file storage or removal operation. I can provide other information if it would be helpful. For instance, the final run of doFuture left a number of files on disk in the .future directory, amounting to 19 GB. I'm not sure how it gets to that number, since the data frame I'm passing around is about 1000x1000.
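For what it's worth, a back-of-the-envelope check (a sketch using a stand-in frame, not the real data) shows the data frame itself cannot explain 19 GB: a 1000x1000 numeric frame serializes to only a few MB, so the bulk must come from the exported globals — note that the foreach call exports the whole self R6 object — written once per chunk:

# Serialize a stand-in 1000x1000 numeric frame and measure the file.
df <- as.data.frame(matrix(rnorm(1000 * 1000), nrow = 1000))
f <- tempfile(fileext = ".rds")
saveRDS(df, f, compress = "gzip")
file.size(f) / 1024^2  # on the order of single-digit MB

# By contrast, if the shared exports weigh ~0.5 GB (per the comment above)
# and each of ~40 repeats gets its own copy, that alone is ~20 GB on disk.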