[26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements by ernestprovo23 · Pull Request #22282 · galaxyproject/galaxy

ernestprovo23 · 2026-03-28T02:00:10Z

Summary

Fix IndexError: list index out of range when __SPLIT_PAIRED_AND_UNPAIRED__ processes a list:paired_or_unpaired collection containing unpaired singleton elements.

Fixes #22204

Changes

In SplitPairedAndUnpairedTool.produce_outputs() (lib/galaxy/tools/__init__.py):

Replace history_content_type == "dataset" check with len(sub_collection.elements) == 1 to distinguish unpaired singletons from paired elements in list:paired_or_unpaired collections
Unwrap singleton sub-collections in _handle_unpaired since the original assert history_content_type == "dataset" also fails for wrapped singletons

Root Cause

In a list:paired_or_unpaired collection, every element is a DatasetCollection sub-collection — even unpaired singletons are wrapped in a 1-element sub-collection of type paired_or_unpaired. The history_content_type == "dataset" check was always False, routing all elements to _handle_paired, which crashes accessing elements[1] on 1-element sub-collections. Day-one bug since PR #19377.

Credit to @dannon for the thorough triage that identified the root cause and fix direction.

mvdbeek

Awesome, thanks so much @ernestprovo23, pushed a minor tweak and a test.

mvdbeek · 2026-03-30T10:21:30Z

Ah, that is fun, the tool test specifies

<collection type="list:paired_or_unpaired">
  <element name="el1">
    <collection type="paired">
      <element name="forward" value="simple_line.txt" />
      <element name="reverse" value="simple_line_alternative.txt" />
    </collection>
  </element>
  <element name="el2" value="simple_line.txt">
  </element>
  <element name="el3" value="simple_line_alternative.txt">
  </element>
</collection>

i.e unpaired elements appear without the extra wrapping.

@jmchilton what do you want to do here ? I was surprised about the extra unpaired layer, otoh it makes the structure a little more homogenous ... we should probably have just one way to do this ?

jmchilton · 2026-03-30T12:29:05Z

I read the workflow test case here and that made good sense to me - is the tool XML layer doing extra magic here I guess to produce that same structure or is the tool XML producing a list structure with varying degrees of the depth in the tree. The Github interface is falling over for me on that original PR - it was too big - I'll need to research this more but my gut says you're right this should consistent. I'd be more anxious about actual inconsistent lists in the database - if the XML tool parser is just doing some magic during creation of these lists - that seems harmless but arguably too much magic. I've got a busy morning but I will try to return to this unless you want to keep working on it.

mvdbeek · 2026-03-30T12:43:03Z

The test interactor does create a list:paired_or_unpaired exactly as specified, i.e. there's no intermediate unpaired DCE, which caused

ERROR    galaxy.tools:__init__.py:2535 Exception caught while attempting to execute tool with id '__SPLIT_PAIRED_AND_UNPAIRED__':
Traceback (most recent call last):
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/__init__.py", line 2509, in handle_single_execution
    rval = self._execute(
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/__init__.py", line 2647, in _execute
    return self.tool_action.execute(
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/actions/model_operations.py", line 121, in execute
    self._produce_outputs(
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/actions/model_operations.py", line 160, in _produce_outputs
    tool.produce_outputs(
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/__init__.py", line 4119, in produce_outputs
    if len(sub_collection.elements) == 1:
AttributeError: 'HistoryDatasetAssociation' object has no attribute 'elements'
WARNING  galaxy.tools.execute:execute.py:451 There was a failure executing a job for tool [__SPLIT_PAIRED_AND_UNPAIRED__] - Error executing tool with id '__SPLIT_PAIRED_AND_UNPAIRED__': 'HistoryDatasetAssociation' object has no attribute 'elements'
DEBUG    galaxy.tools.execute:execute.py:361 Created 1 job(s) for tool __SPLIT_PAIRED_AND_UNPAIRED__ request (44.946 ms)

in https://github.com/galaxyproject/galaxy/actions/runs/23736910319/job/69143831597

jmchilton · 2026-03-30T18:25:40Z

I would have assumed we were running these tool tests as part of the framework tests. That sucks - I'm sorry about the bug.

ernestprovo23 · 2026-03-30T23:12:15Z

Hi — the three CI failures look unrelated to this change:

Test (3.10) — 5 test_trs_import.py tests failing on WorkflowHub TRS imports (timeout/import issues)
Test (3.10, 1) — Selenium test_refresh_preserves_state assertion failure (UI flake)
Test (3.10, 3) — test_data_manager_workflow_bundle invocation timeout

None of these touch the SplitPairedAndUnpairedTool fix in lib/galaxy/tools/__init__.py. Happy to re-run CI if that would help.

mvdbeek · 2026-03-31T15:00:58Z

Per @jmchilton, the tool test should be invalid, we want the extra layer "unpaired" layer.

The list:paired_or_unpaired branch in produce_outputs used history_content_type == "dataset" to distinguish paired from unpaired elements. This check is always False because even singletons are wrapped in sub-collections, so all elements were routed to _handle_paired which crashes on elements[1] for 1-element collections. Replace the history_content_type check with element count inspection on the sub-collection. Also unwrap singletons in _handle_unpaired since the assert there fails for the same reason. Fixes galaxyproject#22204

Replace defensive getattr check with proper assertions: verify the inner element identifier matches SINGLETON_IDENTIFIER and that the unwrapped object is a dataset.

Test splitting a list:paired_or_unpaired collection with mixed paired and unpaired elements, verifying both output collections receive the correct elements.

In list:paired_or_unpaired collections, unpaired elements may be raw HDAs rather than wrapped in sub-collections. Use getattr to safely check for elements before dispatching.

For nested collection types like list:paired_or_unpaired and list:paired, verify in the manager that every element value is a DatasetCollection whose collection_type matches the expected sub-collection type. Otherwise the API silently accepted raw HDAs as children, producing collections that crashed downstream tools (e.g. SplitPairedAndUnpairedTool). The third tool test in split_paired_and_unpaired.xml relied on the unvalidated path: el2/el3 were raw datasets rather than paired_or_unpaired sub-collections wrapping a singleton "unpaired" element. Update the test to use the canonical wrapped form so it passes against the new validator.

Now that the dataset collection manager rejects list:paired_or_unpaired collections containing raw HDAs, every element_object reaching the dispatch loop is guaranteed to be a paired_or_unpaired sub-collection. Drop the defensive getattr fallback added in 07832df.

mvdbeek · 2026-04-07T15:37:38Z

@jmchilton i've added validation that rejects constructing these invalid collections, let me know if that makes sense

The third tool test in split_paired_and_unpaired.xml wrapped el1 as a plain `paired` sub-collection, producing a list:paired_or_unpaired whose children had mixed types. The new strict validator correctly rejected that as a structurally invalid input at test staging time. Update the test XML so every child uses the canonical paired_or_unpaired form, matching the workflow test in split_paired_and_unpaired.gxwf-tests.yml. With the canonical form, SplitPairedAndUnpairedTool._handle_paired may receive 2-element paired_or_unpaired sub-collections and previously forwarded the verbatim copy (with the wider collection_type) into its list:paired output, which would then fail the strict validator on output creation. Normalize the copy to collection_type="paired" before handing it to the output builder -- a 2-element paired_or_unpaired is structurally identical to a paired (same forward/reverse identifiers).

github-project-automation bot added this to Galaxy Dev - weeklies Mar 28, 2026

github-project-automation bot moved this to Needs Review in Galaxy Dev - weeklies Mar 28, 2026

github-actions bot added the area/tool-framework label Mar 28, 2026

github-actions bot added this to the 26.1 milestone Mar 28, 2026

mvdbeek approved these changes Mar 30, 2026

View reviewed changes

mvdbeek added the Backport stable Backport this to last stable released branch label Mar 30, 2026

ernestprovo23 and others added 6 commits April 7, 2026 17:34

Assert unpaired element identifier in SplitPairedAndUnpairedTool

02695a3

Replace defensive getattr check with proper assertions: verify the inner element identifier matches SINGLETON_IDENTIFIER and that the unwrapped object is a dataset.

Add workflow test for SplitPairedAndUnpairedTool

dfacf98

Test splitting a list:paired_or_unpaired collection with mixed paired and unpaired elements, verifying both output collections receive the correct elements.

Fix unwrapped unpaired elements in split tool

49853d8

In list:paired_or_unpaired collections, unpaired elements may be raw HDAs rather than wrapped in sub-collections. Use getattr to safely check for elements before dispatching.

mvdbeek force-pushed the fix/issue-22204-split-paired-unpaired branch from 07832df to ecd7c36 Compare April 7, 2026 15:35

mvdbeek changed the base branch from dev to release_26.0 April 7, 2026 15:36

mvdbeek changed the title ~~Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements~~ [26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements Apr 7, 2026

mvdbeek requested a review from jmchilton April 7, 2026 15:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements#22282

[26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements#22282
ernestprovo23 wants to merge 7 commits intogalaxyproject:release_26.0from
ernestprovo23:fix/issue-22204-split-paired-unpaired

ernestprovo23 commented Mar 28, 2026

Uh oh!

mvdbeek left a comment

Uh oh!

mvdbeek commented Mar 30, 2026 •

edited

Loading

Uh oh!

jmchilton commented Mar 30, 2026

Uh oh!

mvdbeek commented Mar 30, 2026

Uh oh!

jmchilton commented Mar 30, 2026

Uh oh!

ernestprovo23 commented Mar 30, 2026

Uh oh!

mvdbeek commented Mar 31, 2026

Uh oh!

mvdbeek commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ernestprovo23 commented Mar 28, 2026

Summary

Changes

Root Cause

Uh oh!

mvdbeek left a comment

Choose a reason for hiding this comment

Uh oh!

mvdbeek commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmchilton commented Mar 30, 2026

Uh oh!

mvdbeek commented Mar 30, 2026

Uh oh!

jmchilton commented Mar 30, 2026

Uh oh!

ernestprovo23 commented Mar 30, 2026

Uh oh!

mvdbeek commented Mar 31, 2026

Uh oh!

mvdbeek commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mvdbeek commented Mar 30, 2026 •

edited

Loading