Skip to content

Conversation

@mbasmanova
Copy link
Contributor

Summary:
When planning a join, the input is a join edge which specifies left-to-right (lr) and right-to-left (rl) fanouts.

Given join edge a -- b,

  • Left-to-right fanout says that for every row in 'a' there are lrFanout matching rows in 'b'.
  • Right-to-left fanout says that for every row in 'b' there are rlFanout matching rows in 'a'.

For simplicity, we'll use 'fanout' for left-to-right fanout and 'rlFanout' for right-to-left fanout.

The cardinality of an INNER join is fanout * |a| = rlFanout * |b|.

The cardinality of other types of joins can be derived as follows. In addition, for semi project joins, we can derive the selectivity of 'mark is true' filter referred to as mark.trueFraction.

The formulas below expect that join cardinality is computed as |a| * join.fanout. See Cost::resultCardinality() method.

  • Left

    • Each row from the left side (a) appears in the output at least once.
    • join.cardinality = |a| * std::max(1, fanout)
    • join.fanout = std::max(1, fanout)
  • LeftSemiProject

    • This join type is cardinality neutral. Each row from the left side (a) appears in the output exactly once. fanout >= 1 means that each row from the left has a match on the right, hence, ‘mark’ will be true for all rows.
    • join.cardinality = |a|
    • join.fanout = 1
    • mark.trueFraction = std::min(1, fanout)
  • LeftSemiFilter

    • Join fanout is the same as mark.trueFraction in LeftSemiProject.
    • join.cardinality = |a| * std::min(1, fanout)
    • join.fanout = std::min(1, fanout)
  • LeftAnti

    • fanout >= 1 means that each row from the left has a match on the right, hence the output of anti join is empty.
    • join.cardinality = |a| * std::max(0, 1 - fanout)
    • join.fanout = std::max(0, 1 - fanout)
  • Right

    • join.cardinality = |b| * std::max(1, rlFanout)
    • join.fanout = std::max(1, rlFanout) * |b| / |a|
  • RightSemiProject

    • join.cardinality = |b|
    • join.fanout = |b| / |a|
    • mark.trueFraction = std::min(1, rlFanout)
  • RightSemiFilter

    • join.cardinality = |b| * std::min(1, rlFanout)
    • join.fanout = std::min(1, rlFanout) * |b| / |a|
  • RightAnti

    • Not supported.
  • Full

    • The cardinality is at least max(|a|, |b|)
    • join.fanout = std::max(1, fanout, |b| / |a|)

When we flip join sides (Optimization::joinByHashRight), join input changes from ‘a’ to ‘b’ and cardinality formula becomes |b| * join.fanout. The cardinality of the flipped join must match the cardinality of the original join, hence, join.fanout needs to be adjusted. mark.trueFraction value should stay the same as well.

originalJoin.fanout * |a| = flippedJoin.fanout * |b|

  • Original Inner:

    • join.cardinality = |a| * fanout = |b| * rlFanout
    • join.fanout = fanout
  • Flipped Inner:

    • join.fanout = rlFanout
      --
  • Original Left:

    • join.cardinality = |a| * std::max(1, fanout)
    • join.fanout = std::max(1, fanout)
  • Flipped Right:

    • join.fanout = std::max(1, fanout) * |a| / |b|.
      --
  • Original LeftSemiProject:

    • join.fanout = 1
    • mark.trueFraction = std::min(1, fanout)
  • Flipped RightSemiProject:

    • join.fanout = |a| / |b|
      --
  • Original LeftSemiFilter:

    • join.cardinality = |a| * std::min(1, fanout)
    • join.fanout = std::min(1, fanout)
  • Flipped: RightSemiFilter

    • join.fanout = std::min(1, fanout) * |a| / |b|
      --
  • Original RightSemiProject

    • join.cardinality = |b|
    • join.fanout = |b| / |a|
    • mark.trueFraction = std::min(1, rlFanout)
  • Flipped LeftSemiProject

    • join.fanout = 1
      --
  • Original RightSemiFilter

    • join.cardinality = |b| * std::min(1, rlFanout)
    • join.fanout = std::min(1, rlFanout) * |b| / |a|
  • Flipped LeftSemiFilter

    • join.fanout = std::min(1, rlFanout)

Note:

  • std::max(1, rlFanout) * |b| / |a| != fanout

Example:

a has a foreign key to b:

  • |a| = 1000
  • |b| = 10
  • fanout = 1
  • rlFanout = 100
    --
  • std::max(1, rlFanout) * |b| / |a| = 1 * 10 / 1000 = 0.01
  • fanout = 1

Not the same.

Differential Revision: D87592401

@meta-codesync
Copy link

meta-codesync bot commented Nov 21, 2025

@mbasmanova has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87592401.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 21, 2025
@mbasmanova mbasmanova requested a review from oerling November 21, 2025 00:13
)

Summary:

After this change, the plan for TPC-H q16 is no longer optimal. It is RightSemiProject instead of Anti (or LeftSemiProject). Fixing this will require fixing calculation of join fanout and mark.trueFraction. For now, remove plan verification for q16.

Part of facebookincubator#635

Differential Revision: D87543673
Summary:
When planning a join, the input is a join edge which specifies left-to-right (lr) and right-to-left (rl) fanouts.

Given join edge a -- b, 

- Left-to-right fanout says that for every row in 'a' there are lrFanout matching rows in 'b'.
- Right-to-left fanout says that for every row in 'b' there are rlFanout matching rows in 'a'.

For simplicity, we'll use 'fanout' for left-to-right fanout and 'rlFanout' for right-to-left fanout.

The cardinality of an INNER join is fanout * |a| = rlFanout * |b|.

The cardinality of other types of joins can be derived as follows. In addition, for semi project joins, we can derive the selectivity of 'mark is true' filter referred to as mark.trueFraction.

The formulas below expect that join cardinality is computed as |a| * join.fanout. See Cost::resultCardinality() method. 

* Left
  * Each row from the left side (a) appears in the output at least once.
  * join.cardinality = |a| * std::max(1, fanout)
  * join.fanout = std::max(1, fanout)

* LeftSemiProject 
  * This join type is cardinality neutral. Each row from the left side (a) appears in the output exactly once. fanout >= 1 means that each row from the left has a match on the right, hence, ‘mark’ will be true for all rows.
  * join.cardinality = |a|
  * join.fanout = 1
  * mark.trueFraction = std::min(1, fanout)

* LeftSemiFilter 
  * Join fanout is the same as mark.trueFraction in LeftSemiProject.
  * join.cardinality = |a| * std::min(1, fanout)
  * join.fanout = std::min(1, fanout)

* LeftAnti
  * fanout >= 1 means that each row from the left has a match on the right, hence the output of anti join is empty.
  * join.cardinality = |a| * std::max(0, 1 - fanout)
  * join.fanout = std::max(0, 1 - fanout)

* Right
  *	join.cardinality = |b| * std::max(1, rlFanout)
  *	join.fanout = std::max(1, rlFanout) * |b| / |a|

* RightSemiProject
  * join.cardinality = |b|
  *	join.fanout = |b| / |a|
  *	mark.trueFraction = std::min(1, rlFanout)

* RightSemiFilter
  * join.cardinality = |b| * std::min(1, rlFanout)
  * join.fanout = std::min(1, rlFanout) * |b| / |a|

* RightAnti 
  *	Not supported.

* Full
	* The cardinality is at least max(|a|, |b|)
	* join.fanout = std::max(1, fanout, |b| / |a|)


When we flip join sides (Optimization::joinByHashRight), join input changes from ‘a’ to ‘b’ and cardinality formula becomes |b| * join.fanout. The cardinality of the flipped join must match the cardinality of the original join, hence, join.fanout needs to be adjusted. mark.trueFraction value should stay the same as well.

> originalJoin.fanout * |a| = flippedJoin.fanout * |b|

* Original Inner:
  * join.cardinality = |a| * fanout = |b| * rlFanout
  * join.fanout = fanout

* Flipped Inner:
  * join.fanout = rlFanout
--
* Original Left:
  * join.cardinality = |a| * std::max(1, fanout)
  * join.fanout = std::max(1, fanout)

* Flipped Right:
  * join.fanout = std::max(1, fanout) * |a| / |b|.
--
* Original LeftSemiProject:
  * join.fanout = 1
  * mark.trueFraction = std::min(1, fanout)

* Flipped RightSemiProject:
  * join.fanout = |a| / |b|
--
* Original LeftSemiFilter:
  * join.cardinality = |a| * std::min(1, fanout)
  * join.fanout = std::min(1, fanout)

* Flipped: RightSemiFilter
  * join.fanout = std::min(1, fanout) * |a| / |b|
--
* Original RightSemiProject
  * join.cardinality = |b|
  * join.fanout = |b| / |a|
  * mark.trueFraction = std::min(1, rlFanout)

* Flipped LeftSemiProject
  * join.fanout = 1
--
* Original RightSemiFilter
  * join.cardinality = |b| * std::min(1, rlFanout)
  * join.fanout = std::min(1, rlFanout) * |b| / |a|

* Flipped LeftSemiFilter
  * join.fanout = std::min(1, rlFanout) 


Note:

* std::max(1, rlFanout) * |b| / |a| != fanout

Example:

a has a foreign key to b:
*	|a| = 1000
*	|b| = 10
* fanout = 1
* rlFanout = 100
--
* std::max(1, rlFanout) * |b| / |a| = 1 * 10 / 1000 = 0.01
* fanout = 1

Not the same.

Differential Revision: D87592401
@meta-codesync
Copy link

meta-codesync bot commented Nov 21, 2025

This pull request has been merged in d8be619.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported Merged meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants