[26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements#22282
[26.0] Fix IndexError in SplitPairedAndUnpairedTool for unpaired elements#22282ernestprovo23 wants to merge 7 commits intogalaxyproject:release_26.0from
Conversation
mvdbeek
left a comment
There was a problem hiding this comment.
Awesome, thanks so much @ernestprovo23, pushed a minor tweak and a test.
|
Ah, that is fun, the tool test specifies <collection type="list:paired_or_unpaired">
<element name="el1">
<collection type="paired">
<element name="forward" value="simple_line.txt" />
<element name="reverse" value="simple_line_alternative.txt" />
</collection>
</element>
<element name="el2" value="simple_line.txt">
</element>
<element name="el3" value="simple_line_alternative.txt">
</element>
</collection>i.e unpaired elements appear without the extra wrapping. @jmchilton what do you want to do here ? I was surprised about the extra |
|
I read the workflow test case here and that made good sense to me - is the tool XML layer doing extra magic here I guess to produce that same structure or is the tool XML producing a list structure with varying degrees of the depth in the tree. The Github interface is falling over for me on that original PR - it was too big - I'll need to research this more but my gut says you're right this should consistent. I'd be more anxious about actual inconsistent lists in the database - if the XML tool parser is just doing some magic during creation of these lists - that seems harmless but arguably too much magic. I've got a busy morning but I will try to return to this unless you want to keep working on it. |
|
The test interactor does create a in https://github.com/galaxyproject/galaxy/actions/runs/23736910319/job/69143831597 |
|
I would have assumed we were running these tool tests as part of the framework tests. That sucks - I'm sorry about the bug. |
|
Hi — the three CI failures look unrelated to this change:
None of these touch the |
|
Per @jmchilton, the tool test should be invalid, we want the extra layer "unpaired" layer. |
The list:paired_or_unpaired branch in produce_outputs used history_content_type == "dataset" to distinguish paired from unpaired elements. This check is always False because even singletons are wrapped in sub-collections, so all elements were routed to _handle_paired which crashes on elements[1] for 1-element collections. Replace the history_content_type check with element count inspection on the sub-collection. Also unwrap singletons in _handle_unpaired since the assert there fails for the same reason. Fixes galaxyproject#22204
Replace defensive getattr check with proper assertions: verify the inner element identifier matches SINGLETON_IDENTIFIER and that the unwrapped object is a dataset.
Test splitting a list:paired_or_unpaired collection with mixed paired and unpaired elements, verifying both output collections receive the correct elements.
In list:paired_or_unpaired collections, unpaired elements may be raw HDAs rather than wrapped in sub-collections. Use getattr to safely check for elements before dispatching.
For nested collection types like list:paired_or_unpaired and list:paired, verify in the manager that every element value is a DatasetCollection whose collection_type matches the expected sub-collection type. Otherwise the API silently accepted raw HDAs as children, producing collections that crashed downstream tools (e.g. SplitPairedAndUnpairedTool). The third tool test in split_paired_and_unpaired.xml relied on the unvalidated path: el2/el3 were raw datasets rather than paired_or_unpaired sub-collections wrapping a singleton "unpaired" element. Update the test to use the canonical wrapped form so it passes against the new validator.
Now that the dataset collection manager rejects list:paired_or_unpaired collections containing raw HDAs, every element_object reaching the dispatch loop is guaranteed to be a paired_or_unpaired sub-collection. Drop the defensive getattr fallback added in 07832df.
07832df to
ecd7c36
Compare
|
@jmchilton i've added validation that rejects constructing these invalid collections, let me know if that makes sense |
The third tool test in split_paired_and_unpaired.xml wrapped el1 as a plain `paired` sub-collection, producing a list:paired_or_unpaired whose children had mixed types. The new strict validator correctly rejected that as a structurally invalid input at test staging time. Update the test XML so every child uses the canonical paired_or_unpaired form, matching the workflow test in split_paired_and_unpaired.gxwf-tests.yml. With the canonical form, SplitPairedAndUnpairedTool._handle_paired may receive 2-element paired_or_unpaired sub-collections and previously forwarded the verbatim copy (with the wider collection_type) into its list:paired output, which would then fail the strict validator on output creation. Normalize the copy to collection_type="paired" before handing it to the output builder -- a 2-element paired_or_unpaired is structurally identical to a paired (same forward/reverse identifiers).
Summary
Fix
IndexError: list index out of rangewhen__SPLIT_PAIRED_AND_UNPAIRED__processes alist:paired_or_unpairedcollection containing unpaired singleton elements.Fixes #22204
Changes
In
SplitPairedAndUnpairedTool.produce_outputs()(lib/galaxy/tools/__init__.py):history_content_type == "dataset"check withlen(sub_collection.elements) == 1to distinguish unpaired singletons from paired elements inlist:paired_or_unpairedcollections_handle_unpairedsince the originalassert history_content_type == "dataset"also fails for wrapped singletonsRoot Cause
In a
list:paired_or_unpairedcollection, every element is aDatasetCollectionsub-collection — even unpaired singletons are wrapped in a 1-element sub-collection of typepaired_or_unpaired. Thehistory_content_type == "dataset"check was always False, routing all elements to_handle_paired, which crashes accessingelements[1]on 1-element sub-collections. Day-one bug since PR #19377.Credit to @dannon for the thorough triage that identified the root cause and fix direction.