
Parallelize tool source parsing to reduce startup wall time?#21633

Draft
guerler wants to merge 1 commit into galaxyproject:dev from guerler:parallel_tool_loading

Conversation

Contributor

@guerler guerler commented Jan 21, 2026

Explores a targeted mitigation for long startup times on cold, high-latency filesystems by parallelizing tool XML source parsing. This affects only the IO-bound XML parsing phase and provides no benefit on warm caches or fast local storage. The same pre-parsed tool source objects could later be reused by a longer-lived service that owns tool metadata outside of process startup.
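The concurrency pattern the PR describes can be sketched as follows. This is a minimal illustration, not the PR's code: `parse_tool_xml` and the on-disk layout are hypothetical stand-ins for Galaxy's `get_tool_source`; only the thread-pool-over-IO-bound-parsing shape mirrors the change.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from xml.etree import ElementTree


def parse_tool_xml(path):
    # On a cold, high-latency filesystem this blocking read dominates
    # wall time, so overlapping many reads with threads hides the IO wait.
    return ElementTree.parse(path).getroot()


def parse_all(paths, workers=16):
    # Threads rather than processes: the phase is IO bound, and the parsed
    # objects must stay in-process for the (still serial) Tool construction.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_tool_xml, paths))


def demo():
    # Build a few toy tool XML files and parse them in parallel.
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(4):
            p = Path(d) / f"tool{i}.xml"
            p.write_text(f'<tool id="t{i}" name="Tool {i}"/>')
            paths.append(p)
        return [root.get("id") for root in parse_all(paths)]


print(demo())  # -> ['t0', 't1', 't2', 't3']
```

Note that `pool.map` preserves input order, so downstream serial processing sees tools in the same order as a sequential loop would.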

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@guerler guerler added this to the 26.0 milestone Jan 21, 2026
@guerler guerler force-pushed the parallel_tool_loading branch from d86a051 to f86388b on January 21, 2026 10:31
MutableMapping,
Sequence,
)
from concurrent.futures import ThreadPoolExecutor
Member
Highly doubtful that this will do anything for performance; this is CPU bound. And I would urge you not to switch to ProcessPoolExecutor. If you want to do anything here, I would pick one of the options in #21247 and/or do some profiling on main's toolbox as available through CVMFS.

Contributor Author

It does not seem to be CPU bound on cold CVMFS. Profiling shows ~99 percent of time spent in blocking file reads during XML source parsing. Threading is used to overlap IO wait in this phase. No ProcessPoolExecutor is used, and Tool object creation remains serial. The executor is local and optional.

8,869 tools from cold cache:

| Mode       | Workers | Wall clock | Speedup |
|------------|---------|------------|---------|
| Sequential | 1       | 30:54      | 1x      |
| Parallel   | 16      | 3:00       | 10x     |

https://github.com/guerler/galaxy/blob/tool_profiler/scripts/tool_loading_profile.md

This is orthogonal to #21247 and aligned with it. Parallel source parsing reduces cold-start time now and directly applies to a future toolbox microservice that preloads and serves tool sources independently, imho.
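The near-linear speedup claimed for the IO-bound phase can be illustrated with a toy benchmark. This is a hedged sketch, not a measurement of Galaxy itself: `time.sleep` stands in for blocking reads on a cold, high-latency filesystem, and the 16-worker count mirrors the table above while the other numbers are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_read(_):
    # Simulate a blocking, high-latency filesystem read (e.g. cold CVMFS).
    time.sleep(0.05)


def run(workers, n=32):
    # Time n simulated reads with the given number of worker threads.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_read, range(n)))
    return time.perf_counter() - start


sequential = run(1)
threaded = run(16)
print(f"speedup: ~{sequential / threaded:.0f}x")
```

Because the workers spend essentially all their time waiting on IO, the GIL does not serialize them, which is why threads (rather than processes) suffice for this phase.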

Member

@mvdbeek mvdbeek Jan 26, 2026


https://github.com/guerler/galaxy/blob/tool_profiler/scripts/tool_loading_profile.md looks comprehensive but does not reflect what I see, which is

Galaxy app startup finished (1160363.393 ms)

or 19 minutes for a cold startup with 10 workers on an M2 Mac. What exactly did you time? Your markdown document says

get_tool_source (I/O) | 1826.48s (98.6%)

but... that's not really doing much of the work, and dumping with py-spy shows most activity in building pydantic models. Note also that:

Test methodology clears system-wide CVMFS cache, which may not represent real cold start scenarios in shared environments

desc: |
  If true, tool XML files will be parsed in parallel during Galaxy startup.
  This can reduce startup time for instances with many tools by parallelizing
  the XML reading and macro expansion phase. Set to false if you experience
Member

The bottleneck is the pydantic model construction; that's why job and workflow handlers boot as normal.

Contributor Author

On cold CVMFS the bottleneck is not pydantic model construction. For the full tool set it accounts for ~0.2 percent of total load time. The 30+ minute startup is dominated by get_tool_source IO during XML source parsing.

@guerler guerler force-pushed the parallel_tool_loading branch 4 times, most recently from e3542de to d7ab6de on January 22, 2026 14:30
@guerler guerler force-pushed the parallel_tool_loading branch from d7ab6de to 3c2844c on January 22, 2026 14:38
@guerler guerler modified the milestones: 26.0, 26.1 Jan 22, 2026