[Q] Offline currently running jobs shows as "Finished" in the UI, am I doing it wrong? #7376
Comments
Hi @ankile, this does seem like a bug on our end where a live run is being marked as finished upon being synced. We fixed this issue a few years ago, so this looks like a regression. Here's the condition in our code which handles this; we'll investigate further internally and identify the root cause. I was able to reproduce this with the following script as well:
On a side note, W&B recommends syncing an offline run once it finishes (after
in your console when syncing an active run.
Thank you so much for your response and this info! Have you had a chance to look more into why this is happening and how to fix it @anmolmann? Alternatively, are there any escape hatches I could implement directly in the wandb code to fix it while we wait for a fix to be released?
Hi
I'm running model training on a compute cluster where the compute nodes do not have access to the internet. Therefore, while the jobs are running, I'm calling `wandb sync` at regular intervals from a node with internet access that shares the same file system, so that I can follow the training in the UI. When I do this, however, the running jobs are listed as "Finished" in the UI throughout the whole run, which makes downstream evaluation problematic, as it's set to run eval only on finished jobs.
My question is: is this the expected behavior, or am I doing something wrong?
Best, Lars
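For reference, the periodic sync described above could look roughly like the following. This is only a minimal sketch, assuming the offline runs live as `offline-run-*` directories under a shared `wandb/` folder; the function names, the interval, and the `max_rounds` parameter are hypothetical and chosen for illustration. It only reproduces the workflow and does not work around the "Finished" status.

```python
import subprocess
import time
from pathlib import Path


def find_offline_runs(wandb_dir):
    """Return offline run directories under wandb_dir, sorted by name."""
    # Assumption: offline runs are stored as directories named offline-run-*.
    return sorted(p for p in Path(wandb_dir).glob("offline-run-*") if p.is_dir())


def build_sync_command(run_dir):
    """Build the `wandb sync` invocation for a single run directory."""
    return ["wandb", "sync", str(run_dir)]


def sync_periodically(wandb_dir, interval_s=600, max_rounds=None):
    """Sync every offline run, sleep, and repeat (max_rounds=None loops forever)."""
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for run_dir in find_offline_runs(wandb_dir):
            # check=False so that one failed sync does not stop the loop.
            subprocess.run(build_sync_command(run_dir), check=False)
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            time.sleep(interval_s)


# Example (run on the node with internet access):
# sync_periodically("/shared/project/wandb", interval_s=600)
```

The path in the commented example is a placeholder; the loop would run on the node that has internet access and shares the file system with the compute nodes.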