Skip to content

Conversation

@aditi-pandit
Copy link
Contributor

@aditi-pandit aditi-pandit commented Nov 25, 2024

Description

https://docs.google.com/document/d/1WQfNigR9bVrbM-PqY7F0mswcetN_tdNahzD9ENye-Q0/edit?tab=t.0

Motivation and Context

TPC-DS Q67

== RELEASE NOTES ==

General Changes
* Add Window filter pushdown in native engine for rank and dense_rank functions. Use session property `optimizer.optimize-top-n-rank` to enable the rewrite.

Copy link
Contributor

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just a couple questions

@JsonProperty("rankingType") RankingFunction rankingFunction,
@JsonProperty("rowNumberVariable") VariableReferenceExpression rowNumberVariable,
@JsonProperty("maxRowCountPerPartition") int maxRowCountPerPartition,
@JsonProperty("maxRowCountPerPartition") int maxRankPerPartition,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the json property be changed too or is that not possible without breaking compat?

// Write config file
String configProperties = format("discovery.uri=%s%n" +
"presto.version=testversion%n" +
"native-execution-enabled=true" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need a %n? I'm guessing that's to give it a newline but not sure

@steveburnett
Copy link
Contributor

Thank you for the release note! Some nits of phrasing and formatting suggestions.

== RELEASE NOTES ==

Prestissimo (Native Execution) Changes
* Improve Window filter pushdown in Prestissimo engine for ``rank`` and ``dense_rank`` functions. :pr:`24138`

@steveburnett
Copy link
Contributor

One more nit about the release note entry: we've automated the link to the PR so you don't have to add it manually anymore.

== RELEASE NOTES ==

Prestissimo (Native Execution) Changes
* Improve Window filter pushdown in Prestissimo engine for ``rank`` and ``dense_rank`` functions. 

@aditi-pandit aditi-pandit force-pushed the topn branch 2 times, most recently from 35f18b2 to 4b96592 Compare March 25, 2025 20:06
meta-codesync bot pushed a commit to facebookincubator/velox that referenced this pull request Oct 29, 2025
Summary:
Design doc : https://docs.google.com/document/d/1WQfNigR9bVrbM-PqY7F0mswcetN_tdNahzD9ENye-Q0/edit?usp=sharing

https://github.com/facebookincubator/velox/discussions/9404

e2e Presto PR (with changes in the Presto optimizer as well) prestodb/presto#24138

Latency for SF1K TPC-DS Q67 fell from 399s to 146s with this change.

Pull Request resolved: #14141

Reviewed By: pratikpugalia

Differential Revision: D85763891

Pulled By: kevinwilfong

fbshipit-source-id: 122bc6d60a4cfbf90c7921ebb1df28ef6edcc0a0
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Nov 8, 2025

Reviewer's Guide

This PR extends the existing TopNRowNumber optimization to support rank and dense_rank window functions when native execution is enabled. The window filter pushdown logic is generalized, a new RankingFunction enum is added to TopNRowNumberNode (Java and protocol layers), configuration properties and session flags for optimize-top-n-rank are introduced, and the Velox plan converter and serialization code are updated accordingly. New tests cover both limit- and filter-based pushdowns for rank and dense_rank.

Sequence diagram for TopNRank optimization in WindowFilterPushDown

sequenceDiagram
    participant "WindowFilterPushDown"
    participant "WindowNode"
    participant "TopNRowNumberNode"
    participant "Session"
    "WindowFilterPushDown"->>"WindowNode": Check if canOptimizeRankFunction
    "WindowFilterPushDown"->>"Session": Check isOptimizeTopNRank & isNativeExecutionEnabled
    alt If conditions met
        "WindowFilterPushDown"->>"TopNRowNumberNode": convertToTopNRank(windowNode, limit)
    end
Loading

Class diagram for updated TopNRowNumberNode with RankingFunction

classDiagram
    class TopNRowNumberNode {
        +PlanNode source
        +DataOrganizationSpecification specification
        +RankingFunction rankingFunction
        +VariableReferenceExpression rowNumberVariable
        +int maxRowCountPerPartition
        +boolean partial
        +Optional<VariableReferenceExpression> hashVariable
        +RankingFunction getRankingFunction()
    }
    class RankingFunction {
        <<enum>>
        ROW_NUMBER
        RANK
        DENSE_RANK
    }
    TopNRowNumberNode --> RankingFunction
Loading

Class diagram for FeaturesConfig and SystemSessionProperties changes

classDiagram
    class FeaturesConfig {
        +boolean optimizeTopNRowNumber
        +boolean optimizeTopNRank
        +boolean isOptimizeTopNRowNumber()
        +boolean isOptimizeTopNRank()
        +FeaturesConfig setOptimizeTopNRowNumber(boolean)
        +FeaturesConfig setOptimizeTopNRank(boolean)
    }
    class SystemSessionProperties {
        +static final String OPTIMIZE_TOP_N_ROW_NUMBER
        +static final String OPTIMIZE_TOP_N_RANK
        +static boolean isOptimizeTopNRowNumber(Session)
        +static boolean isOptimizeTopNRank(Session)
    }
    SystemSessionProperties ..> FeaturesConfig
Loading

Class diagram for protocol TopNRowNumberNode and RankingFunction

classDiagram
    class TopNRowNumberNode {
        +shared_ptr<PlanNode> source
        +DataOrganizationSpecification specification
        +RankingFunction rankingType
        +VariableReferenceExpression rowNumberVariable
        +int maxRowCountPerPartition
        +bool partial
    }
    class RankingFunction {
        <<enum>>
        ROW_NUMBER
        RANK
        DENSE_RANK
    }
    TopNRowNumberNode --> RankingFunction
Loading

File-Level Changes

Change Details Files
Generalize WindowFilterPushDown to handle rank and dense_rank
  • Added canOptimizeRankFunction check alongside row number
  • Created convertToTopNRank for constructing TopNRowNumberNode with appropriate RankingFunction
  • Updated visitLimit and visitFilter to branch on optimizeTopNRank and native execution
WindowFilterPushDown.java
Enhance TopNRowNumberNode to carry RankingFunction
  • Added RankingFunction enum (ROW_NUMBER, RANK, DENSE_RANK)
  • Injected rankingFunction field into constructors, JSON annotations and getters
  • Updated GraphvizPrinter, replaceChildren, assignStatsEquivalentPlanNode, canonical plan, and various optimizer rules to pass rankingFunction
TopNRowNumberNode.java
GraphvizPrinter.java
CanonicalPlanGenerator.java
multiple planner rule classes
Support RankingFunction in protocol and Velox conversion
  • Defined RankingFunction enum with JSON to_json/from_json handlers
  • Extended presto_protocol_core.cpp/h with enum serialization
  • Mapped rankingType to Velox RankFunction in PrestoToVeloxQueryPlan
presto_protocol_core.*
PrestoToVeloxQueryPlan.cpp
Introduce optimize-top-n-rank configuration
  • Added optimizeTopNRank flag in FeaturesConfig and SystemSessionProperties
  • Exposed OPTIMIZE_TOP_N_RANK session property and isOptimizeTopNRank helper
  • Updated TestFeaturesConfig mappings
FeaturesConfig.java
SystemSessionProperties.java
TestFeaturesConfig.java
Add tests for rank/dense_rank pushdowns
  • Refactored TestWindowFilterPushDown to parameterize limit and filter scenarios
  • Extended AbstractTestNativeWindowQueries and TestPrestoNativeWindowQueries to validate TopN for rank and dense_rank
  • Added session helpers optimizeTopNRank and optimizeTopNRankWithoutNative in tests
TestWindowFilterPushDown.java
AbstractTestNativeWindowQueries.java
TestPrestoNativeWindowQueries.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Refactor WindowFilterPushDown's visitLimit and visitFilter conditionals to extract the common logic for row_number/rank, reducing duplication and improving readability.
  • Merge convertToTopNRowNumber and convertToTopNRank into a single method parametrized by RankingFunction to avoid near-duplicate code paths.
  • Use NLOHMANN_JSON_SERIALIZE_ENUM for RankingFunction in C++ to eliminate manual to_json/from_json boilerplate.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Refactor WindowFilterPushDown's visitLimit and visitFilter conditionals to extract the common logic for row_number/rank, reducing duplication and improving readability.
- Merge convertToTopNRowNumber and convertToTopNRank into a single method parametrized by RankingFunction to avoid near-duplicate code paths.
- Use NLOHMANN_JSON_SERIALIZE_ENUM for RankingFunction in C++ to eliminate manual to_json/from_json boilerplate.

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java:158-159` </location>
<code_context>
                 source = rowNumberNode;
             }
-            else if (source instanceof WindowNode && canOptimizeWindowFunction((WindowNode) source, metadata.getFunctionAndTypeManager()) && isOptimizeTopNRowNumber(session)) {
+            else if (source instanceof WindowNode && canOptimizeRowNumberFunction((WindowNode) source, metadata.getFunctionAndTypeManager()) && isOptimizeTopNRowNumber(session) ||
+                    source instanceof WindowNode && canOptimizeRankFunction((WindowNode) source, metadata.getFunctionAndTypeManager()) && isOptimizeTopNRank(session) && isNativeExecutionEnabled(session)) {
                 WindowNode windowNode = (WindowNode) source;
                 // verify that unordered row_number window functions are replaced by RowNumberNode
</code_context>

<issue_to_address>
**issue:** Operator precedence may lead to unexpected logic in conditional statement.

Add parentheses to clarify how '||' and '&&' are grouped in the condition, ensuring the intended logic and correct short-circuit behavior.
</issue_to_address>

### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java:163` </location>
<code_context>
                 // verify that unordered row_number window functions are replaced by RowNumberNode
                 verify(windowNode.getOrderingScheme().isPresent());
-                TopNRowNumberNode topNRowNumberNode = convertToTopNRowNumber(windowNode, limit);
+                TopNRowNumberNode topNRowNumberNode = convertToTopNRank(windowNode, limit);
                 if (windowNode.getPartitionBy().isEmpty()) {
                     return topNRowNumberNode;
</code_context>

<issue_to_address>
**suggestion:** Method name 'convertToTopNRank' may be misleading for DENSE_RANK support.

Since the method handles both RANK and DENSE_RANK, a more general name like 'convertToTopNRanking' would improve clarity.

Suggested implementation:

```java
                TopNRowNumberNode topNRowNumberNode = convertToTopNRanking(windowNode, limit);

```

You will also need to:
1. Rename the actual method definition from `convertToTopNRank` to `convertToTopNRanking` elsewhere in the codebase (likely in this file or a related class).
2. Update all other usages of `convertToTopNRank` to `convertToTopNRanking` to maintain consistency.
</issue_to_address>

### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java:299-304` </location>
<code_context>
+        {
+
+            String windowFunctionName = Iterables.getOnlyElement(windowNode.getWindowFunctions().values()).getFunctionCall().getFunctionHandle().getName();
+            TopNRowNumberNode.RankingFunction rankingFunction =
+                    windowFunctionName.equals("row_number") ?
+                            TopNRowNumberNode.RankingFunction.ROW_NUMBER :
+                            windowFunctionName.equals("rank") ?
+                                    TopNRowNumberNode.RankingFunction.RANK : TopNRowNumberNode.RankingFunction.DENSE_RANK;
+
+            return new TopNRowNumberNode(
</code_context>

<issue_to_address>
**suggestion:** Ranking function selection logic may not handle unexpected function names robustly.

Explicitly handle or validate unexpected function names to prevent incorrect defaulting to DENSE_RANK.

```suggestion
            String windowFunctionName = Iterables.getOnlyElement(windowNode.getWindowFunctions().values()).getFunctionCall().getFunctionHandle().getName();
            TopNRowNumberNode.RankingFunction rankingFunction;
            switch (windowFunctionName) {
                case "row_number":
                    rankingFunction = TopNRowNumberNode.RankingFunction.ROW_NUMBER;
                    break;
                case "rank":
                    rankingFunction = TopNRowNumberNode.RankingFunction.RANK;
                    break;
                case "dense_rank":
                    rankingFunction = TopNRowNumberNode.RankingFunction.DENSE_RANK;
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported window function for TopNRowNumberNode: " + windowFunctionName);
            }
```
</issue_to_address>

### Comment 4
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java:332-338` </location>
<code_context>
             return isRowNumberMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle()));
         }

+        private static boolean canOptimizeRankFunction(WindowNode node, FunctionAndTypeManager functionAndTypeManager)
+        {
+            if (node.getWindowFunctions().size() != 1) {
+                return false;
+            }
+            VariableReferenceExpression rowNumberVariable = getOnlyElement(node.getWindowFunctions().keySet());
+            return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle()));
+        }
+
</code_context>

<issue_to_address>
**suggestion:** Method name 'canOptimizeRankFunction' may not reflect DENSE_RANK support.

Consider renaming the method to 'canOptimizeRankingFunction' to accurately reflect its support for both RANK and DENSE_RANK functions.

Suggested implementation:

```java
        private static boolean canOptimizeRankingFunction(WindowNode node, FunctionAndTypeManager functionAndTypeManager)
        {
            if (node.getWindowFunctions().size() != 1) {
                return false;
            }
            VariableReferenceExpression rowNumberVariable = getOnlyElement(node.getWindowFunctions().keySet());
            return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle()));
        }

```

If there are any calls to `canOptimizeRankFunction` elsewhere in this file (or in related files), they should also be renamed to `canOptimizeRankingFunction` to match the new method name.
</issue_to_address>

### Comment 5
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TopNRowNumberNode.java:38-43` </location>
<code_context>
 public final class TopNRowNumberNode
         extends InternalPlanNode
 {
+    public enum RankingFunction
+    {
+        ROW_NUMBER,
+        RANK,
+        DENSE_RANK
+    }
+
</code_context>

<issue_to_address>
**suggestion:** RankingFunction enum lacks explicit serialization values.

Explicitly assign string values to each enum constant to ensure stable serialization, especially if enum names change during refactoring.

```suggestion
    public enum RankingFunction
    {
        ROW_NUMBER("row_number"),
        RANK("rank"),
        DENSE_RANK("dense_rank");

        private final String serializationValue;

        RankingFunction(String serializationValue)
        {
            this.serializationValue = serializationValue;
        }

        public String getSerializationValue()
        {
            return serializationValue;
        }

        @Override
        public String toString()
        {
            return serializationValue;
        }

        public static RankingFunction fromSerializationValue(String value)
        {
            for (RankingFunction function : RankingFunction.values()) {
                if (function.serializationValue.equals(value)) {
                    return function;
                }
            }
            throw new IllegalArgumentException("Unknown serialization value: " + value);
        }
    }
```
</issue_to_address>

### Comment 6
<location> `presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp:11612` </location>
<code_context>
+// Loosly copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
</code_context>

<issue_to_address>
**nitpick (typo):** Typo in comment: 'Loosly' should be 'Loosely'.

```suggestion
// Loosely copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
```
</issue_to_address>

### Comment 7
<location> `presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp:1803` </location>
<code_context>
 }

+namespace {
+core::TopNRowNumberNode::RankFunction prestoToVeloxRankFunction(
+    protocol::RankingFunction rankingFunction) {
+  switch (rankingFunction) {
</code_context>

<issue_to_address>
**issue (review_instructions):** The function 'prestoToVeloxRankFunction' uses PascalCase, but functions should use camelCase naming.

Please rename 'prestoToVeloxRankFunction' to 'prestoToVeloxRankFunction' using camelCase (e.g., 'prestoToVeloxRankFunction' -> 'prestoToVeloxRankFunction'). Functions should use camelCase per the naming conventions.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `presto-native-execution/**/*.hpp,presto-native-execution/**/*.hpp,presto-native-execution/**/*.cpp`

**Instructions:**
Use PascalCase for types (classes, structs, enums, type aliases, type template parameters) and file names.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@aditi-pandit aditi-pandit force-pushed the topn branch 2 times, most recently from 3319161 to 75d0257 Compare November 8, 2025 20:23
@aditi-pandit aditi-pandit marked this pull request as draft November 8, 2025 21:24
@aditi-pandit aditi-pandit force-pushed the topn branch 3 times, most recently from 5013a95 to a3e9352 Compare November 8, 2025 23:57
@aditi-pandit aditi-pandit marked this pull request as ready for review November 10, 2025 06:42
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider refactoring the repeated source instanceof WindowNode and session flag checks in WindowFilterPushDown into a shared helper predicate to reduce duplication and improve readability.
  • Instead of splitting the function name string in convertToTopNRank, leverage the FunctionHandle or metadata API directly to determine the ranking type for better robustness.
  • Rename the boolean parameter rowOrRank in the test helper methods to a more descriptive name or switch to an enum to make the test logic clearer and self-documenting.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider refactoring the repeated `source instanceof WindowNode` and session flag checks in WindowFilterPushDown into a shared helper predicate to reduce duplication and improve readability.
- Instead of splitting the function name string in convertToTopNRank, leverage the FunctionHandle or metadata API directly to determine the ranking type for better robustness.
- Rename the boolean parameter `rowOrRank` in the test helper methods to a more descriptive name or switch to an enum to make the test logic clearer and self-documenting.

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TopNRowNumberNode.java:47` </location>
<code_context>
+
     private final PlanNode source;
     private final DataOrganizationSpecification specification;
+    private final RankingFunction rankingFunction;
     private final VariableReferenceExpression rowNumberVariable;
     private final int maxRowCountPerPartition;
</code_context>

<issue_to_address>
**suggestion (bug_risk):** rankingFunction is a required field but not validated.

Add a null check or validation for rankingFunction in the constructor to prevent runtime errors from accidental null assignments.

Suggested implementation:

```java
    private final RankingFunction rankingFunction;

```

```java
            @JsonProperty("rankingType") RankingFunction rankingFunction,

```

```java
            @JsonProperty("rowNumberVariable") VariableReferenceExpression rowNumberVariable,
            @JsonProperty("maxRowCountPerPartition") int maxRowCountPerPartition,
            @JsonProperty("partial") boolean partial,
            @JsonProperty("hashVariable") Optional<VariableReferenceExpression> hashVariable)
    {
        this.rankingFunction = java.util.Objects.requireNonNull(rankingFunction, "rankingFunction is null");

```
</issue_to_address>

### Comment 2
<location> `presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp:11612` </location>
<code_context>
 }
 } // namespace facebook::presto::protocol
 namespace facebook::presto::protocol {
+// Loosly copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
+
+// NOLINTNEXTLINE: cppcoreguidelines-avoid-c-arrays
</code_context>

<issue_to_address>
**nitpick (typo):** Typo in comment: 'Loosly' should be 'Loosely'.

```suggestion
// Loosely copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
```
</issue_to_address>

### Comment 3
<location> `presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp:1803` </location>
<code_context>
 }

+namespace {
+core::TopNRowNumberNode::RankFunction prestoToVeloxRankFunction(
+    protocol::RankingFunction rankingFunction) {
+  switch (rankingFunction) {
</code_context>

<issue_to_address>
**issue (review_instructions):** The function 'prestoToVeloxRankFunction' uses PascalCase, but functions should use camelCase naming.

Please rename 'prestoToVeloxRankFunction' to 'prestoToVeloxRankFunction' using camelCase (e.g., 'prestoToVeloxRankFunction' -> 'prestoToVeloxRankFunction'). This will ensure consistency with the function naming convention.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `presto-native-execution/**/*.hpp,presto-native-execution/**/*.hpp,presto-native-execution/**/*.cpp`

**Instructions:**
Use PascalCase for types (classes, structs, enums, type aliases, type template parameters) and file names.

</details>
</issue_to_address>

### Comment 4
<location> `presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp:11615` </location>
<code_context>
+// Loosly copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
+
+// NOLINTNEXTLINE: cppcoreguidelines-avoid-c-arrays
+static const std::pair<RankingFunction, json> RankingFunction_enum_table[] =
+    { // NOLINT: cert-err58-cpp
+        {RankingFunction::ROW_NUMBER, "ROW_NUMBER"},
</code_context>

<issue_to_address>
**issue (review_instructions):** RankingFunction_enum_table is a static constant and should use kPascalCase naming, not snake_case.

Per the instructions, static constants should use the kPascalCase naming convention, e.g., kRankingFunctionEnumTable.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `presto-native-execution/**/*.hpp,presto-native-execution/**/*.cpp`

**Instructions:**
Use UPPER_SNAKE_CASE for macros.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@aditi-pandit aditi-pandit force-pushed the topn branch 2 times, most recently from 18a6e17 to 8791f96 Compare November 12, 2025 08:08
@steveburnett
Copy link
Contributor

Thanks for the revised release note! Two things, the first is only a nit.

  1. The PR link ( :pr:24138 ) is now automatically added and we don't have to manually include it anymore.

  2. I don't find optimizer.optimize-top-n-rank in the Presto documentation. Could doc be added for it?

@aditi-pandit
Copy link
Contributor Author

Thanks for the revised release note! Two things, the first is only a nit.

  1. The PR link ( :pr:24138 ) is now automatically added and we don't have to manually include it anymore.
  2. I don't find optimizer.optimize-top-n-rank in the Presto documentation. Could doc be added for it?

@steveburnett : Thanks for your review. Yes, we should add documentation for optimizer.optimize-top-n-rank.

Though its an interesting one as its applicable only for native engine. So I was undecided if I should document it with th native engine session properties here or the optimizer ones which are usually applicable to both engines. I'm more inclined to add it with the native engine config. what is your suggestion ?

@steveburnett
Copy link
Contributor

@steveburnett : Thanks for your review. Yes, we should add documentation for optimizer.optimize-top-n-rank.

Though its an interesting one as its applicable only for native engine. So I was undecided if I should document it with th native engine session properties here or the optimizer ones which are usually applicable to both engines. I'm more inclined to add it with the native engine config. what is your suggestion ?

I agree with you that I'm more inclined to add it with the native engine config, and I think the best thing would be to add doc for it to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties-session.rst.

@aditi-pandit
Copy link
Contributor Author

I agree with you that I'm more inclined to add it with the native engine config, and I think the best thing would be to add doc for it to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties-session.rst.

@steveburnett : Added documentation. Please review.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the documentation! Just a couple of nits of formatting.

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit thanks for adding the optimizer part. I've left some initial comments for discussion.

Comment on lines 352 to 353
VariableReferenceExpression rowNumberVariable = getOnlyElement(node.getWindowFunctions().keySet());
return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VariableReferenceExpression rowNumberVariable = getOnlyElement(node.getWindowFunctions().keySet());
return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle()));
return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(getOnlyElement(node.getWindowFunctions().values()).getFunctionHandle()));

nit: it seems unnecessary to fetch the key first and then use it to get the value.

Copy link
Contributor Author

@aditi-pandit aditi-pandit Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had copied this from canOptimizeRowNumberFunction logic. Have modified this code as well as the above function.

if (windowNode.getPartitionBy().isEmpty()) {
return topNRowNumberNode;
if (canReplaceWithTopNRowNumber(windowNode)) {
TopNRowNumberNode topNRowNumberNode = convertToTopNRowNumber(windowNode, limit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TopNRowNumberNode topNRowNumberNode = convertToTopNRowNumber(windowNode, limit);
TopNRowNumberNode topNRowNumberNode = convertToTopNRank(windowNode, limit);

Should we call convertToTopNRank here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a good catch... I intended to remove the original convertToTopNRowNumber and change convertToTopNRank function to convertToTopNRowNumber to match the naming everywhere.

Have fixed that.

OptionalInt upperBound = extractUpperBound(tupleDomain, rowNumberVariable);

if (upperBound.isPresent()) {
source = convertToTopNRowNumber(windowNode, upperBound.getAsInt());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It seems that we can call convertToTopNRank here as well. And completely drop the method convertToTopNRowNumber?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Comment on lines 168 to 170
if (windowNode.getPartitionBy().isEmpty()) {
return topNRowNumberNode;
}
Copy link
Member

@hantangwangd hantangwangd Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very sure, but is it correct to simply remove the LimitNode for rank and dense_rank here? I'm not familiar with the actual implementation of TopNRowNumber for rank and dense_rank in Velox. But as I understand from the design doc, it's possible that they could return more rows than the specified limit. Is my understanding correct? PLMK if there are any errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think we should only return directly if the RankingFunction is ROW_NUMBER

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Yes, this is only applicable for row_number()

assertPlanWithSession(
sql,
optimizeTopNRowNumber(true),
rowNumber ? optimizeTopNRowNumber(true) : optimizeTopNRank(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also assert on the Ranking function for the TopNRowNumberNode

Copy link
Contributor Author

@aditi-pandit aditi-pandit Nov 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaneja : That's a good idea. Though do you have any suggestions on how to cleanly do so ? An option could be to enhance the matcher classes. wdyt ?

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! Just one nit of formatting, everything else looks good.

velox::core::NestedLoopJoinNode.

``optimizer.optimize_top_n_rank``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@aditi-pandit
Copy link
Contributor Author

@hantangwangd @aaneja : Thanks for your review comments. Have addressed them. PTAL.

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aditi-pandit, thanks for the fixes. I'm seeing some test failures that appear to be related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants