Eqw on SGE cluster while R code finishes without error #71

Open
Luqing-Zhang opened this issue Mar 6, 2021 · 0 comments

@Luqing-Zhang

Hi,

I use future.batchtools a lot with our SGE cluster. Recently I started running into a problem: the future.batchtools chunk of code in R finishes without any error, but among the 1563 jobs submitted, a few (1 to 5, seemingly at random) end up in the Eqw state with an error like the one below (from qstat -j $jobname).

03/06/2021 17:58:23 [1506654697:17781]: error: can't open stdout output file "/pQTL/.future/20210306_161441-5LTXzg/future_lapply-72_293396299/logs/job99655decd726a355cf6b8d8746efadcf.log": No such file or directory
scheduling info: (Collecting of scheduler job information is turned off)
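
To check which part of the path in such a message is actually missing, a minimal Python sketch could look like the following. The function names `extract_stdout_path` and `missing_ancestor` are hypothetical helpers, not part of future.batchtools or SGE, and the regular expression only assumes the message wording quoted above.

```python
import re
from pathlib import Path

# Matches the SGE message quoted above; the exact wording is assumed
# from that single example and may differ between SGE versions.
_STDOUT_ERR = re.compile(r'can\'t open stdout output file "([^"]+)"')


def extract_stdout_path(qstat_text):
    """Return the log path named in the error, or None if the error is absent."""
    match = _STDOUT_ERR.search(qstat_text)
    return Path(match.group(1)) if match else None


def missing_ancestor(path):
    """Return the topmost directory above `path` that does not exist.

    Returns None when the immediate parent directory exists.
    """
    missing = None
    for parent in Path(path).parents:  # closest parent first
        if parent.exists():
            break
        missing = parent
    return missing
```

Feeding the output of qstat -j for an Eqw job through `extract_stdout_path` and then `missing_ancestor` would show whether only the logs folder, the whole job folder, or the whole session folder is gone at the time of the check.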

Inside the .future folder, there are about 200 job folders that future has not removed, and the specific folder named in the error message doesn't exist at all (I suppose future removed it automatically).
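
One way to tally the leftover folders and see how many have only empty logs is sketched below in Python. It assumes the layout visible in the error path, that is .future/<session>/<job folder>/logs/*.log; `summarize_future_dir` is a hypothetical helper name, and a different layout would need a different traversal.

```python
from pathlib import Path


def summarize_future_dir(root):
    """Return (leftover_job_dirs, job_dirs_with_only_empty_logs) under `root`.

    Assumes `root` is the .future folder, with one sub-folder per session
    and one sub-folder per job inside each session.
    """
    root = Path(root)
    leftover, empty_logs = [], []
    for session in sorted(p for p in root.iterdir() if p.is_dir()):
        for job in sorted(p for p in session.iterdir() if p.is_dir()):
            leftover.append(job)
            logs = list((job / "logs").glob("*.log"))
            # Flag the job only if it has log files and all are zero bytes.
            if logs and all(f.stat().st_size == 0 for f in logs):
                empty_logs.append(job)
    return leftover, empty_logs
```

Calling `summarize_future_dir(".future")` from the project directory returns the two lists, so their lengths can be compared with the number of submitted jobs.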

In my previous experience with future.batchtools, if a job finishes correctly, its folder inside .future is removed. If the job hits an error, the folder is kept and an error message appears in R.

Questions:

Given that the R code finishes without error, why are hundreds of job folders left inside the .future folder (roughly 200 out of 1563 jobs, each with an empty log file)? Why are they not removed even though the jobs finished correctly?

Why do a few jobs end up in Eqw on the SGE cluster with no corresponding folder? (The folder does exist when the SGE jobs are started; it seems future considers those jobs successfully finished and removes their folders.)

Do you think this is an issue with the future.batchtools package, or one I should take to our HPC infrastructure team? Thanks much!
