Skip to content

fix(optimizer): RewriteBucketedSemiJoinToJoin may produce incorrect results for NOT IN with NULLs #27643

@kaikalur

Description

@kaikalur

Summary

RewriteBucketedSemiJoinToJoin rewrites SemiJoin to LEFT JOIN + Project for bucketed tables, but the rewrite loses three-valued NULL semantics, potentially producing incorrect results for NOT IN queries when the subquery contains NULLs.

Root Cause

The rewrite computes semiJoinOutput as:

IF(l.key IS NULL, NULL, NOT(r.key IS NULL))

The original HashSemiJoinOperator returns NULL when the probe key does not match any build value AND the build set contains NULLs (standard SQL IN semantics: 5 IN (NULL, 1, 2, 3) = UNKNOWN).

The LEFT JOIN rewrite produces FALSE instead — equi-join never matches NULL, so r.key is always null-filled on non-match, yielding NOT(NULL IS NULL) = NOT(TRUE) = FALSE.

Impact

For NOT IN queries where semiJoinOutput is consumed by a bare NOT(S) filter:

5 NOT IN (NULL, 1, 2, 3) Original SemiJoin Rewritten LeftJoin
semiJoinOutput NULL FALSE
NOT(semiJoinOutput) NULL (falsy, row excluded) TRUE (row incorrectly included)

The NOT EXISTS pattern is safe because it uses NOT(COALESCE(S, false)) which treats both NULL and FALSE the same way.

Relevant Files

  • RewriteBucketedSemiJoinToJoin.java — lines 174-180 (the rewrite expression)
  • HashSemiJoinOperator.java — lines 160-184 (correct NULL semantics)
  • TestLogicalPlanner.java — lines 744-752 (NOT IN plan pattern showing bare NOT S)

Suggested Fix

Add a guard to RewriteBucketedSemiJoinToJoin that checks the semiJoinOutput is not consumed by a bare NOT() filter. Only fire when the output is used directly as a boolean filter (positive IN) or wrapped in COALESCE (NOT EXISTS).

Alternatively, handle build-side NULLs correctly by adding a NOT NULL constraint check on the filtering source join key column.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions