[26.0] Fix get_structure when it is passed a DCE with nested elements#22453
Open
mvdbeek wants to merge 2 commits intogalaxyproject:release_26.0from
Open
[26.0] Fix get_structure when it is passed a DCE with nested elements#22453mvdbeek wants to merge 2 commits intogalaxyproject:release_26.0from
get_structure when it is passed a DCE with nested elements#22453mvdbeek wants to merge 2 commits intogalaxyproject:release_26.0from
Conversation
… input Add test_can_map_over_dce_from_larger_list_paired that reproduces the KeyError when mapping a DCE (from a list:paired with 3+ pairs) over a tool with a simple data input. The parent list collection structure was incorrectly used instead of the child paired collection, causing an index mismatch. Also update test_can_map_over_dce_on_non_multiple_data_param to expect 2 jobs (one per dataset in the pair) instead of 1. Fixes GALAXY-MAIN-4KSCZZZ0014G1
b698478 to
ba8ab68
Compare
DatasetCollectionElement has two collection references:
- collection: the parent collection this element belongs to
- child_collection: the nested collection this element contains
get_structure() builds a Tree describing a collection's shape. It
needs two things that must agree: a collection_type_description and
a DatasetCollection to enumerate elements from.
matching.for_collections() derived the type description from the
child collection (via get_child_collection), but get_structure()
unconditionally used instance.collection — which for a DCE is the
*parent*. This mismatch went unnoticed when both happened to have
compatible element counts, but crashed when they differed.
Bug scenario: given a list:paired with 3 pairs, mapping a single DCE
(one pair with forward+reverse) over a simple tool input:
get_structure(dce, "paired" desc, leaf_subcollection_type=None)
The old code did dce.collection → parent list:paired DC (3 elements),
building Tree("paired", 3 children). But the expansion used
dce.child_collection → paired DC (2 elements), creating 2 jobs.
precreate_dataset_collection iterated the 3 structure children and
accessed completed_collection[2], raising KeyError since only indices
0 and 1 existed.
Fix: add an explicit ``collection`` parameter to get_structure() so
callers that already resolved the contained collection (via
get_child_collection) can pass it directly. matching.for_collections
now passes the resolved collection, keeping type description and
element source in sync.
Also add get_collection() helper (like get_child_collection in
matching.py but with isinstance checks) and use it in
walk_collections() so the walked collection matches the structure.
ba8ab68 to
27ca320
Compare
Member
Author
|
This seems to make sense to me now, @jmchilton could you take a look ? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
DatasetCollectionElement has two collection references:
get_structure() and walk_collections() unconditionally used .collection,
which for a DCE returns the parent (e.g. the full list:paired). Meanwhile
the job expansion correctly used child_collection (e.g. the paired
sub-collection). This mismatch caused a KeyError when the parent had more
elements than the child.
Bug scenario: given a list:paired with 3 pairs, mapping a single DCE
(one pair with forward+reverse) over a simple tool input:
get_structure(dce, "paired" desc, leaf_subcollection_type=None)
The old code did dce.collection → parent list:paired DC (3 elements),
building Tree("paired", 3 children). But the expansion used
dce.child_collection → paired DC (2 elements), creating 2 jobs.
precreate_dataset_collection iterated the 3 structure children and
accessed completed_collection[2], raising KeyError since only indices
0 and 1 existed.
Extract a _get_collection() helper that uses isinstance checks to return
child_collection for DCE and collection for HDCA/adapters, and use it in
both get_structure() and walk_collections().
Fixes #22450 and probably #22423 and #19554
Note how test_can_map_over_dce_on_non_multiple_data_param actually tested the broken behavior, this has been verifiably broken since #19900, though the broken behavior predates the PR. A pair mapped over a single input should create 2 jobs.
How to test the changes?
(Select all options that apply)
License