fix(optimizer): Fix merge_subqueries.py::rename_inner_sources() #4266
Fixes #4245
Consider this minimal repro for the optimizer rules `[qualify, eliminate_subqueries, merge_subqueries]`:

After `eliminate_subqueries`, the derived tables have been lifted up as CTEs:

The `merge_ctes` subrule of `merge_subqueries` will attempt to merge each inner scope/CTE into the outer scope, which (ignoring the `OptimizerError`) would ideally generate the following:
To safely move parts from the inner scopes outwards, the helper function `_rename_inner_sources()` is used, which attempts to rename the inner sources in case of name collisions. The bug in this procedure is that the `taken` set is built only from the outer sources. For the example above, this is what happens in the function call that leads to the `OptimizerError`:

Notice how `itbl_2` was already an inner source but was incorrectly generated again to replace `ITBL`, as it wasn't a part of `taken`, leading to duplicate aliases.

This PR solves this by extending the `taken` set to be the union of the outer and inner selected sources.
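The collision described above can be reproduced with a small standalone sketch. It does not use the optimizer's real code: `find_free_name` and `rename_inner` are hypothetical stand-ins written for illustration, the `base_2, base_3, ...` suffix scheme is inferred from the `itbl_2` alias mentioned above, and the source names are lowercased for simplicity.

```python
from itertools import count


def find_free_name(taken, base):
    """Hypothetical helper: return base if unused, else base_2, base_3, ..."""
    if base not in taken:
        return base
    for i in count(2):
        candidate = f"{base}_{i}"
        if candidate not in taken:
            return candidate


def rename_inner(outer, inner, include_inner):
    """Rename inner sources that collide with outer ones.

    include_inner=False models the old behaviour (taken = outer sources only);
    include_inner=True models the fix (taken = outer | inner sources).
    """
    taken = set(outer)
    conflicts = taken & set(inner)
    if include_inner:
        taken |= set(inner)

    renamed = []
    for name in inner:
        if name in conflicts:
            new_name = find_free_name(taken, name)
            taken.add(new_name)
            renamed.append(new_name)
        else:
            renamed.append(name)
    return renamed


# Assumed setup: the outer scope selects from "itbl"; the inner scope
# selects from "itbl" and already has its own "itbl_2".
outer_sources = ["itbl"]
inner_sources = ["itbl", "itbl_2"]

buggy = rename_inner(outer_sources, inner_sources, include_inner=False)
fixed = rename_inner(outer_sources, inner_sources, include_inner=True)

print(buggy)  # the generated alias collides with the existing inner "itbl_2"
print(fixed)  # every inner alias is unique
```

With only the outer sources in `taken`, the generated name for `itbl` lands on the inner scope's existing `itbl_2`. Seeding `taken` with both sets makes the helper skip past it.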