muscle3 profile command fails because it cannot find an entry in the performance.sqlite database for an instance name #272

Open
YehorYudinIPP opened this issue Oct 6, 2023 · 3 comments

Comments


YehorYudinIPP commented Oct 6, 2023

Calling muscle3 profile -r performance.sqlite fails with KeyError: 'stop', where stop is the name of a workflow instance. The MUSCLE3 version is 0.7.0.

The full Python traceback is:

Traceback (most recent call last):
  File "/u/yyudin/conda-envs/python3114/bin/muscle3", line 8, in <module>
    sys.exit(muscle3())
             ^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/muscle3/muscle3.py", line 73, in profile
    plot_resources(Path(performance_file))
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/muscle3/profiling.py", line 52, in plot_resources
    stats = db.resource_stats()
            ^^^^^^^^^^^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/libmuscle/manager/profile_database.py", line 161, in resource_stats
    instances, run_times, comm_times, _ = self.instance_stats()
                                          ^^^^^^^^^^^^^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/libmuscle/manager/profile_database.py", line 131, in instance_stats
    total_times = [(stop_run[i] - start_run[i]) * 1e-9 for i in instances]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/yyudin/conda-envs/python3114/lib/python3.11/site-packages/libmuscle/manager/profile_database.py", line 131, in <listcomp>
    total_times = [(stop_run[i] - start_run[i]) * 1e-9 for i in instances]
                    ~~~~~~~~^^^
KeyError: 'stop'

Contributor

LourensVeen commented Oct 6, 2023

It looks like this instance started its run but didn't shut down cleanly. When that happens, the shutdown isn't recorded in the database, and so MUSCLE3 doesn't know how long it ran and cannot calculate which percentage of the core's time it used.

There should be a test and a nice error message here at least, so thanks for reporting this!

Could you check the log and see if it says anything about the stop instance crashing or shutting down because something else crashed?

Also, which version of MUSCLE3 are you using?
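
For illustration, here is a minimal sketch of the kind of check described above. It is not the actual MUSCLE3 code or fix. It assumes, based only on the traceback, that start and stop times are held in dictionaries keyed by instance name with values in nanoseconds; the function name total_run_times and the error text are hypothetical.

```python
from typing import Dict


def total_run_times(
        start_run: Dict[str, int], stop_run: Dict[str, int]
        ) -> Dict[str, float]:
    """Return the run time in seconds for each instance.

    Hypothetical helper: raises a descriptive RuntimeError instead of a
    bare KeyError when an instance has a start time but no stop time.
    """
    # Instances that started but never recorded a shutdown
    missing = sorted(name for name in start_run if name not in stop_run)
    if missing:
        raise RuntimeError(
                'No shutdown was recorded for instance(s): '
                + ', '.join(missing)
                + '. They probably crashed or were killed, so their run'
                ' time cannot be calculated.')

    # Timestamps are assumed to be in nanoseconds, as in the traceback
    return {
            name: (stop_run[name] - start_run[name]) * 1e-9
            for name in start_run}
```

With this check, a database in which the stop instance has no recorded shutdown would produce a message naming that instance rather than KeyError: 'stop'.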

Author

YehorYudinIPP commented Oct 6, 2023

Thanks Lourens! I've updated the comment: it's MUSCLE3 0.7.0.
Indeed, the workflow failed due to an error in the turbulence_sim component:

muscle_manager 2023-10-06 00:15:14,185 ERROR libmuscle.manager.instance_manager: Instance turbulence_sim quit with error 38

That component in turn failed because I exceeded the disk quota on my cluster, unfortunately:

forrtl: Disk quota exceeded
forrtl: severe (38): error during write, unit 42, file /cobra/u/yyudin/code/MFW/muscle3/workflow/run_fusion_gem_multiimpl_20231005_criteria/instances/turbulence_sim/workdir/p02.dat
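
As a rough sketch, checking the manager log for crashed instances can be automated. The pattern below is an assumption derived solely from the single log line quoted above; other MUSCLE3 versions may format these messages differently, and the function name failed_instances is hypothetical.

```python
import re
from typing import Dict, Iterable

# Assumed line format, taken from the one log line quoted in this thread:
# "... ERROR <logger name>: Instance <name> quit with error <code>"
_QUIT_RE = re.compile(
        r'ERROR\s+\S+:\s+Instance\s+(\S+)\s+quit with error\s+(\d+)')


def failed_instances(lines: Iterable[str]) -> Dict[str, int]:
    """Map instance name to exit code for instances that quit with an error."""
    failed = {}
    for line in lines:
        match = _QUIT_RE.search(line)
        if match:
            failed[match.group(1)] = int(match.group(2))
    return failed


log_line = (
        'muscle_manager 2023-10-06 00:15:14,185 ERROR '
        'libmuscle.manager.instance_manager: '
        'Instance turbulence_sim quit with error 38')
print(failed_instances([log_line]))
```

To scan a real log, pass an open file object as lines; the location of the manager log depends on how the run was set up.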

@LourensVeen
Contributor

Okay, yes, then the issue here is that there should be a better error message. I'll go fix that. Note that 0.7.1 is out with several fixes to the profiling system (including that muscle3 profile -t is now working), so you may want to upgrade 😄
