Parallelize tool source parsing to reduce startup wall time?#21633
guerler wants to merge 1 commit into galaxyproject:dev from
Conversation
Force-pushed d86a051 to f86388b
lib/galaxy/tools/__init__.py (Outdated)

        MutableMapping,
        Sequence,
    )
    from concurrent.futures import ThreadPoolExecutor
Highly doubtful that this will do anything for performance; this is CPU bound. And I would urge you not to switch to ProcessPoolExecutor. If you want to do anything here, I would pick one of the options in #21247 and/or do some profiling on main's toolbox as available through CVMFS.
It does not seem to be CPU bound on cold CVMFS. Profiling shows ~99% of time spent in blocking file reads during XML source parsing. Threading is used only to overlap IO waits in this phase: no ProcessPoolExecutor is involved, Tool object creation remains serial, and the executor is local and optional.
8,869 tools from cold cache:
| Mode | Workers | Wall clock | Speedup |
|---|---|---|---|
| Sequential | 1 | 30:54 | 1x |
| Parallel | 16 | 3:00 | 10x |
https://github.com/guerler/galaxy/blob/tool_profiler/scripts/tool_loading_profile.md
This is orthogonal to #21247 and aligned with it. Parallel source parsing reduces cold-start time now and would apply directly to a future toolbox microservice that preloads and serves tool sources independently, imho.
https://github.com/guerler/galaxy/blob/tool_profiler/scripts/tool_loading_profile.md looks comprehensive but does not reflect what I see, which is

    Galaxy app startup finished (1160363.393 ms)

or 19 minutes for a cold startup with 10 workers on an M2 Mac. What exactly did you time? Your markdown document says

    get_tool_source (I/O) | 1826.48s (98.6%)

but that's not really doing much of the work, and dumping with py-spy shows most activity in building pydantic models. Note also that:

> Test methodology clears system-wide CVMFS cache, which may not represent real cold start scenarios in shared environments
        desc: |
          If true, tool XML files will be parsed in parallel during Galaxy startup.
          This can reduce startup time for instances with many tools by parallelizing
          the XML reading and macro expansion phase. Set to false if you experience
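For illustration, enabling a boolean option like this in `galaxy.yml` would look as follows. The key name below is a hypothetical placeholder; the actual option name added by this PR is not visible in the excerpt above.

```yaml
# galaxy.yml sketch -- "parallel_tool_source_parsing" is a hypothetical
# placeholder key, not necessarily the name used by this PR.
galaxy:
  parallel_tool_source_parsing: true
```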
The bottleneck is the pydantic model construction; that's why job and workflow handlers boot as normal.
On cold CVMFS the bottleneck is not pydantic model construction: for the full tool set it accounts for ~0.2% of total load time. The 30+ minute startup is dominated by get_tool_source IO during XML source parsing.
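The claim that threads help here rests on blocking IO releasing the GIL, so waits can overlap. A toy timing sketch (not from the PR) illustrates this; `slow_read` simulates a cold-cache read with `time.sleep`, which releases the GIL the same way a real blocking file read does.

```python
# Toy demonstration: when per-file latency dominates, a thread pool
# overlaps the waits. `slow_read` is a stand-in for a blocking read
# over a slow filesystem, not Galaxy code.
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # seconds per simulated cold read


def slow_read(path):
    time.sleep(LATENCY)  # GIL is released during the sleep, as in real IO
    return path


def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


paths = [f"tool_{i}.xml" for i in range(16)]
sequential = timed(lambda: [slow_read(p) for p in paths])
with ThreadPoolExecutor(max_workers=16) as pool:
    parallel = timed(lambda: list(pool.map(slow_read, paths)))
# sequential is roughly 16 * LATENCY; parallel is roughly 1 * LATENCY
```

The same overlap buys nothing on a warm cache or fast local disk, where the per-read latency is negligible, which matches the PR description below.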
Force-pushed e3542de to d7ab6de
Force-pushed d7ab6de to 3c2844c
Explores a targeted mitigation for long startup times on cold, high-latency filesystems by parallelizing tool XML source parsing. This affects only the IO-bound XML parsing phase and provides no benefit on warm caches or fast local storage. The same pre-parsed tool source objects could later be reused by a longer-lived service that owns tool metadata outside of process startup.