Skip to content

[26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements#22282

Open
ernestprovo23 wants to merge 7 commits intogalaxyproject:release_26.0from
ernestprovo23:fix/issue-22204-split-paired-unpaired
Open

[26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements#22282
ernestprovo23 wants to merge 7 commits intogalaxyproject:release_26.0from
ernestprovo23:fix/issue-22204-split-paired-unpaired

Conversation

@ernestprovo23
Copy link
Copy Markdown

Summary

Fix IndexError: list index out of range when __SPLIT_PAIRED_AND_UNPAIRED__ processes a list:paired_or_unpaired collection containing unpaired singleton elements.

Fixes #22204

Changes

In SplitPairedAndUnpairedTool.produce_outputs() (lib/galaxy/tools/__init__.py):

  • Replace history_content_type == "dataset" check with len(sub_collection.elements) == 1 to distinguish unpaired singletons from paired elements in list:paired_or_unpaired collections
  • Unwrap singleton sub-collections in _handle_unpaired since the original assert history_content_type == "dataset" also fails for wrapped singletons

Root Cause

In a list:paired_or_unpaired collection, every element is a DatasetCollection sub-collection — even unpaired singletons are wrapped in a 1-element sub-collection of type paired_or_unpaired. The history_content_type == "dataset" check was always False, routing all elements to _handle_paired, which crashes accessing elements[1] on 1-element sub-collections. Day-one bug since PR #19377.

Credit to @dannon for the thorough triage that identified the root cause and fix direction.

Copy link
Copy Markdown
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks so much @ernestprovo23, pushed a minor tweak and a test.

@mvdbeek mvdbeek added the Backport stable Backport this to last stable released branch label Mar 30, 2026
@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Mar 30, 2026

Ah, that is fun, the tool test specifies

<collection type="list:paired_or_unpaired">
  <element name="el1">
    <collection type="paired">
      <element name="forward" value="simple_line.txt" />
      <element name="reverse" value="simple_line_alternative.txt" />
    </collection>
  </element>
  <element name="el2" value="simple_line.txt">
  </element>
  <element name="el3" value="simple_line_alternative.txt">
  </element>
</collection>

i.e unpaired elements appear without the extra wrapping.

@jmchilton what do you want to do here ? I was surprised about the extra unpaired layer, otoh it makes the structure a little more homogenous ... we should probably have just one way to do this ?

@jmchilton
Copy link
Copy Markdown
Member

I read the workflow test case here and that made good sense to me - is the tool XML layer doing extra magic here I guess to produce that same structure or is the tool XML producing a list structure with varying degrees of the depth in the tree. The Github interface is falling over for me on that original PR - it was too big - I'll need to research this more but my gut says you're right this should consistent. I'd be more anxious about actual inconsistent lists in the database - if the XML tool parser is just doing some magic during creation of these lists - that seems harmless but arguably too much magic. I've got a busy morning but I will try to return to this unless you want to keep working on it.

@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Mar 30, 2026

The test interactor does create a list:paired_or_unpaired exactly as specified, i.e. there's no intermediate unpaired DCE, which caused

ERROR    galaxy.tools:__init__.py:2535 Exception caught while attempting to execute tool with id '__SPLIT_PAIRED_AND_UNPAIRED__':
Traceback (most recent call last):
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/__init__.py", line 2509, in handle_single_execution
    rval = self._execute(
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/__init__.py", line 2647, in _execute
    return self.tool_action.execute(
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/actions/model_operations.py", line 121, in execute
    self._produce_outputs(
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/actions/model_operations.py", line 160, in _produce_outputs
    tool.produce_outputs(
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/tools/__init__.py", line 4119, in produce_outputs
    if len(sub_collection.elements) == 1:
AttributeError: 'HistoryDatasetAssociation' object has no attribute 'elements'
WARNING  galaxy.tools.execute:execute.py:451 There was a failure executing a job for tool [__SPLIT_PAIRED_AND_UNPAIRED__] - Error executing tool with id '__SPLIT_PAIRED_AND_UNPAIRED__': 'HistoryDatasetAssociation' object has no attribute 'elements'
DEBUG    galaxy.tools.execute:execute.py:361 Created 1 job(s) for tool __SPLIT_PAIRED_AND_UNPAIRED__ request (44.946 ms)

in https://github.com/galaxyproject/galaxy/actions/runs/23736910319/job/69143831597

@jmchilton
Copy link
Copy Markdown
Member

I would have assumed we were running these tool tests as part of the framework tests. That sucks - I'm sorry about the bug.

@ernestprovo23
Copy link
Copy Markdown
Author

Hi — the three CI failures look unrelated to this change:

  1. Test (3.10) — 5 test_trs_import.py tests failing on WorkflowHub TRS imports (timeout/import issues)
  2. Test (3.10, 1) — Selenium test_refresh_preserves_state assertion failure (UI flake)
  3. Test (3.10, 3)test_data_manager_workflow_bundle invocation timeout

None of these touch the SplitPairedAndUnpairedTool fix in lib/galaxy/tools/__init__.py. Happy to re-run CI if that would help.

@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Mar 31, 2026

Per @jmchilton, the tool test should be invalid, we want the extra layer "unpaired" layer.

ernestprovo23 and others added 6 commits April 7, 2026 17:34
The list:paired_or_unpaired branch in produce_outputs used
history_content_type == "dataset" to distinguish paired from unpaired
elements. This check is always False because even singletons are
wrapped in sub-collections, so all elements were routed to
_handle_paired which crashes on elements[1] for 1-element collections.

Replace the history_content_type check with element count inspection
on the sub-collection. Also unwrap singletons in _handle_unpaired
since the assert there fails for the same reason.

Fixes galaxyproject#22204
Replace defensive getattr check with proper assertions: verify
the inner element identifier matches SINGLETON_IDENTIFIER and that
the unwrapped object is a dataset.
Test splitting a list:paired_or_unpaired collection with mixed paired
and unpaired elements, verifying both output collections receive the
correct elements.
In list:paired_or_unpaired collections, unpaired elements may be raw
HDAs rather than wrapped in sub-collections. Use getattr to safely
check for elements before dispatching.
For nested collection types like list:paired_or_unpaired and list:paired,
verify in the manager that every element value is a DatasetCollection
whose collection_type matches the expected sub-collection type. Otherwise
the API silently accepted raw HDAs as children, producing collections
that crashed downstream tools (e.g. SplitPairedAndUnpairedTool).

The third tool test in split_paired_and_unpaired.xml relied on the
unvalidated path: el2/el3 were raw datasets rather than paired_or_unpaired
sub-collections wrapping a singleton "unpaired" element. Update the test
to use the canonical wrapped form so it passes against the new validator.
Now that the dataset collection manager rejects list:paired_or_unpaired
collections containing raw HDAs, every element_object reaching the
dispatch loop is guaranteed to be a paired_or_unpaired sub-collection.
Drop the defensive getattr fallback added in 07832df.
@mvdbeek mvdbeek force-pushed the fix/issue-22204-split-paired-unpaired branch from 07832df to ecd7c36 Compare April 7, 2026 15:35
@mvdbeek mvdbeek changed the base branch from dev to release_26.0 April 7, 2026 15:36
@mvdbeek mvdbeek changed the title Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements [26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements Apr 7, 2026
@mvdbeek mvdbeek requested a review from jmchilton April 7, 2026 15:37
@mvdbeek
Copy link
Copy Markdown
Member

mvdbeek commented Apr 7, 2026

@jmchilton i've added validation that rejects constructing these invalid collections, let me know if that makes sense

The third tool test in split_paired_and_unpaired.xml wrapped el1 as a
plain `paired` sub-collection, producing a list:paired_or_unpaired whose
children had mixed types. The new strict validator correctly rejected
that as a structurally invalid input at test staging time. Update the
test XML so every child uses the canonical paired_or_unpaired form,
matching the workflow test in split_paired_and_unpaired.gxwf-tests.yml.

With the canonical form, SplitPairedAndUnpairedTool._handle_paired may
receive 2-element paired_or_unpaired sub-collections and previously
forwarded the verbatim copy (with the wider collection_type) into its
list:paired output, which would then fail the strict validator on output
creation. Normalize the copy to collection_type="paired" before handing
it to the output builder -- a 2-element paired_or_unpaired is
structurally identical to a paired (same forward/reverse identifiers).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/tool-framework Backport stable Backport this to last stable released branch

Projects

Status: Needs Review

Development

Successfully merging this pull request may close these issues.

Exception caught while attempting to execute tool with id '__SPLIT_PAIRED_AND_UNPAIRED__':

3 participants