PBS: detection of qstat failures no longer working #6531
Comments
I've found I can trigger various PBS errors by setting the
PBS 2022.1.7:
PBS 18.2.6:
I think we should be safe to change the search string to "cannot connect to server".
oliver-sanders added a commit to oliver-sanders/cylc-flow that referenced this issue (Jan 22, 2025):
* Addresses cylc#6531 on the 7.8.x branch.
* Update the PBS job runner to reflect a change in error message.
* The new pattern will also match the old format.
#2691 added code to detect qstat failures by searching for "Connection refused" in stderr. However, this is not working on our new system, which results in jobs being incorrectly reported as failed when polled.
Information at the time indicated we could expect to see errors like this if qstat failed to contact the server:
However, we are now seeing errors like this from PBS 2022.1.7:
For the moment I think we would be safe to change the search string to "cannot connect" (or possibly "errno").
Longer term, we should consider other ways to make the polling more robust; see #3436.
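To illustrate the short-term fix, here is a rough sketch of how a substring check on stderr can stop a failed poll from being read as a failed job. It is not the actual cylc-flow code: `CANT_CONNECT_MARKER`, `PollResult` and `classify_poll` are hypothetical names, and treating "job ID appears in stdout" as "still running" is a simplifying assumption.

```python
from enum import Enum

# Substring proposed in this issue; the earlier search string was
# "Connection refused". (Hypothetical constant name.)
CANT_CONNECT_MARKER = "cannot connect"


class PollResult(Enum):
    RUNNING = "running"  # job still known to the batch system
    GONE = "gone"        # job not listed, so it may have failed or finished
    UNKNOWN = "unknown"  # the poll itself failed, so nothing can be concluded


def classify_poll(job_id, stdout, stderr, marker=CANT_CONNECT_MARKER):
    """Classify one poll attempt from the polling command's output.

    Hypothetical helper: if stderr contains the marker, the server could
    not be reached, so the job must not be reported as failed.
    """
    if marker in stderr:
        return PollResult.UNKNOWN
    # Simplifying assumption: a listed job ID means the job still exists.
    if job_id in stdout:
        return PollResult.RUNNING
    return PollResult.GONE
```

With a check like this, an error message that no longer contains the marker falls through to `GONE`, which is the misreporting described above. Passing `marker="Connection refused"` reproduces the earlier behaviour for comparison.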