Fix/qualify columns and support unpivot #4867
Closed
Fix struct field lookup conversions for PIVOT/UNPIVOT aliases
Issue Description
The _convert_columns_to_dots function in the qualify module incorrectly converts column references to PIVOT/UNPIVOT aliases into Dot expressions. This causes incorrect query results when working with PIVOT/UNPIVOT operations.
Changes Made
This PR improves the handling of struct field lookups by preventing incorrect conversion of PIVOT/UNPIVOT aliases to Dot expressions:
Added early returns to skip columns that don't need conversion.
Added PIVOT/UNPIVOT alias detection, so that references to these aliases are left as columns.
Preserved the original conversion logic for genuine struct field lookups.
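The decision these changes describe can be sketched as a standalone predicate. This is a minimal illustration, not sqlglot's actual code: `ColumnRef`, `ScopeInfo`, and `should_convert_to_dot` are hypothetical stand-ins for sqlglot's column expression, scope, and resolver. The sketch also assumes that a qualifier naming neither a source nor a PIVOT/UNPIVOT alias is a struct field lookup.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a column such as `a`, `t.a`, or `s.x.y`."""
    parts: Tuple[str, ...]

    @property
    def qualifier(self) -> Optional[str]:
        # The first part is a table/alias qualifier only when there is more than one part.
        return self.parts[0] if len(self.parts) > 1 else None


@dataclass
class ScopeInfo:
    """Hypothetical stand-in for a query scope."""
    sources: set = field(default_factory=set)        # table names / aliases in FROM
    pivot_aliases: set = field(default_factory=set)  # aliases introduced by PIVOT/UNPIVOT


def should_convert_to_dot(column: ColumnRef, scope: ScopeInfo) -> bool:
    qualifier = column.qualifier
    # Early return: an unqualified column cannot be a struct field lookup.
    if qualifier is None:
        return False
    # Early return: the qualifier names a known source, so this is an ordinary column.
    if qualifier in scope.sources:
        return False
    # PIVOT/UNPIVOT alias detection: the qualifier refers to the pivot output, not a struct.
    if qualifier in scope.pivot_aliases:
        return False
    # Otherwise keep the original behavior and treat it as a struct field lookup.
    return True
```

For a query like `SELECT unpvt.amount FROM sales UNPIVOT (amount FOR quarter IN (q1, q2)) AS unpvt`, the sketch leaves `unpvt.amount` as a column instead of turning it into a Dot, while a reference such as `info.city` (where `info` is not a source) still takes the struct-field path.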
Testing
Added tests to verify that column references to PIVOT/UNPIVOT aliases are correctly handled and not converted to Dot expressions.
Add comprehensive tests for sqlglot qualify functionality with UNPIVOT operations
This commit adds unit tests that verify correct column qualification in various UNPIVOT scenarios. These tests ensure that the qualify transformation correctly qualifies column references across different query structures while preserving the semantics of the original SQL.
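A quick way to see why references like `unpvt.quarter` must stay plain columns is to recall what UNPIVOT produces. The sketch below models `sales UNPIVOT (amount FOR quarter IN (q1, q2)) AS unpvt` in plain Python. It assumes the common default of excluding NULL values, and `unpivot` and its parameters are hypothetical names, not sqlglot API. The `quarter` and `amount` columns come out as new top-level columns of the alias, not as fields of a struct.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def unpivot(
    rows: Sequence[Row],
    id_cols: Sequence[str],
    value_cols: Sequence[str],
    name_col: str,
    value_col: str,
) -> List[Row]:
    """Emit one output row per (input row, unpivoted column) pair."""
    out: List[Row] = []
    for row in rows:
        for col in value_cols:
            # Many dialects drop NULL values from UNPIVOT output by default.
            if row[col] is None:
                continue
            new_row = {c: row[c] for c in id_cols}  # pass-through columns
            new_row[name_col] = col                  # name of the source column
            new_row[value_col] = row[col]            # value of the source column
            out.append(new_row)
    return out


rows = [{"id": 1, "q1": 10, "q2": None}]
print(unpivot(rows, ["id"], ["q1", "q2"], "quarter", "amount"))
# [{'id': 1, 'quarter': 'q1', 'amount': 10}]
```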
Performance Considerations
This change is also intended to improve performance. On large, complex queries with many nested scopes, the impact is expected to be positive because it reduces unnecessary scope traversal.
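The performance argument rests on ordering: cheap membership checks run before any lookup that has to walk scopes. The sketch below counts resolver calls to make that visible. `columns_to_convert` and `counting_resolver` are hypothetical, and the resolver merely stands in for whatever scope traversal the real code performs.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Column = Tuple[Optional[str], str]  # (qualifier, name); qualifier is None if unqualified


def columns_to_convert(
    columns: Iterable[Column],
    sources: Set[str],
    pivot_aliases: Set[str],
    resolve_struct: Callable[[str], bool],
) -> List[Column]:
    """Return the columns that the (expensive) resolver classifies as struct lookups."""
    result: List[Column] = []
    for qualifier, name in columns:
        # Early exits: cheap set checks avoid calling the resolver at all.
        if qualifier is None or qualifier in sources or qualifier in pivot_aliases:
            continue
        if resolve_struct(qualifier):
            result.append((qualifier, name))
    return result


calls: List[str] = []


def counting_resolver(qualifier: str) -> bool:
    # Hypothetical stand-in for a lookup that walks the scope tree; records each call.
    calls.append(qualifier)
    return True


cols = [(None, "id"), ("sales", "q1"), ("unpvt", "amount"), ("info", "city")]
print(columns_to_convert(cols, {"sales"}, {"unpvt"}, counting_resolver))  # [('info', 'city')]
print(len(calls))  # 1
```

Only one of the four columns reaches the resolver; the other three are settled by set membership.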
Dependency for Lineage Module Enhancement
This fix is a prerequisite for adding full UNPIVOT support in the lineage.py module. The current incorrect conversion of PIVOT/UNPIVOT aliases prevents proper column lineage tracking when working with UNPIVOT operations. With this fix in place, we can proceed with implementing comprehensive UNPIVOT support in the lineage module, ensuring accurate column mapping and dependency tracking through these transformations.
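Once aliases are resolved correctly, column-level lineage through an UNPIVOT can be expressed as a simple mapping from output columns to input columns. The sketch below shows one possible convention, not the planned lineage.py implementation: `unpivot_lineage` is hypothetical, and treating the FOR (name) column as having no upstream data dependency is an assumption that a lineage tool could reasonably make differently.

```python
from typing import Dict, Sequence, Set


def unpivot_lineage(
    input_columns: Sequence[str],
    value_cols: Sequence[str],
    name_col: str,
    value_col: str,
) -> Dict[str, Set[str]]:
    """Map each UNPIVOT output column to the input columns its values come from."""
    unpivoted = set(value_cols)
    lineage: Dict[str, Set[str]] = {}
    # Columns not listed in IN (...) pass through unchanged.
    for col in input_columns:
        if col not in unpivoted:
            lineage[col] = {col}
    # The FOR column holds the names of the unpivoted columns as literals,
    # so under this convention it has no data dependency.
    lineage[name_col] = set()
    # The value column draws its values from every unpivoted column.
    lineage[value_col] = unpivoted
    return lineage


print(unpivot_lineage(["id", "q1", "q2"], ["q1", "q2"], "quarter", "amount"))
```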
Future Work
Following the acceptance of this PR, I plan to submit another PR that will provide integrated lineage support for both PIVOT and UNPIVOT operations. This upcoming enhancement will enable complete column lineage tracking through these complex transformations, allowing users to accurately trace data flow and dependencies in queries that involve pivoting and unpivoting operations.