-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(optimizer): Native TopNRank optimization #24138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
b45c503 to
ff8a3b5
Compare
BryanCutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM just a couple questions
| @JsonProperty("rankingType") RankingFunction rankingFunction, | ||
| @JsonProperty("rowNumberVariable") VariableReferenceExpression rowNumberVariable, | ||
| @JsonProperty("maxRowCountPerPartition") int maxRowCountPerPartition, | ||
| @JsonProperty("maxRowCountPerPartition") int maxRankPerPartition, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the json property be changed too or is that not possible without breaking compat?
| // Write config file | ||
| String configProperties = format("discovery.uri=%s%n" + | ||
| "presto.version=testversion%n" + | ||
| "native-execution-enabled=true" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need a %n? I'm guessing that's to give it a newline but not sure
|
Thank you for the release note! Some nits of phrasing and formatting suggestions. |
a63a0e9 to
2118f4c
Compare
|
One more nit about the release note entry: we've automated the link to the PR so you don't have to add it manually anymore. |
35f18b2 to
4b96592
Compare
Summary: Design doc : https://docs.google.com/document/d/1WQfNigR9bVrbM-PqY7F0mswcetN_tdNahzD9ENye-Q0/edit?usp=sharing https://github.com/facebookincubator/velox/discussions/9404 e2e Presto PR (with changes in the Presto optimizer as well) prestodb/presto#24138 Latency for SF1K TPC-DS Q67 fell from 399s to 146s with this change. Pull Request resolved: #14141 Reviewed By: pratikpugalia Differential Revision: D85763891 Pulled By: kevinwilfong fbshipit-source-id: 122bc6d60a4cfbf90c7921ebb1df28ef6edcc0a0
Reviewer's GuideThis PR extends the existing TopNRowNumber optimization to support rank and dense_rank window functions when native execution is enabled. The window filter pushdown logic is generalized, a new RankingFunction enum is added to TopNRowNumberNode (Java and protocol layers), configuration properties and session flags for optimize-top-n-rank are introduced, and the Velox plan converter and serialization code are updated accordingly. New tests cover both limit- and filter-based pushdowns for rank and dense_rank. Sequence diagram for TopNRank optimization in WindowFilterPushDownsequenceDiagram
participant "WindowFilterPushDown"
participant "WindowNode"
participant "TopNRowNumberNode"
participant "Session"
"WindowFilterPushDown"->>"WindowNode": Check if canOptimizeRankFunction
"WindowFilterPushDown"->>"Session": Check isOptimizeTopNRank & isNativeExecutionEnabled
alt If conditions met
"WindowFilterPushDown"->>"TopNRowNumberNode": convertToTopNRank(windowNode, limit)
end
Class diagram for updated TopNRowNumberNode with RankingFunctionclassDiagram
class TopNRowNumberNode {
+PlanNode source
+DataOrganizationSpecification specification
+RankingFunction rankingFunction
+VariableReferenceExpression rowNumberVariable
+int maxRowCountPerPartition
+boolean partial
+Optional<VariableReferenceExpression> hashVariable
+RankingFunction getRankingFunction()
}
class RankingFunction {
<<enum>>
ROW_NUMBER
RANK
DENSE_RANK
}
TopNRowNumberNode --> RankingFunction
Class diagram for FeaturesConfig and SystemSessionProperties changesclassDiagram
class FeaturesConfig {
+boolean optimizeTopNRowNumber
+boolean optimizeTopNRank
+boolean isOptimizeTopNRowNumber()
+boolean isOptimizeTopNRank()
+FeaturesConfig setOptimizeTopNRowNumber(boolean)
+FeaturesConfig setOptimizeTopNRank(boolean)
}
class SystemSessionProperties {
+static final String OPTIMIZE_TOP_N_ROW_NUMBER
+static final String OPTIMIZE_TOP_N_RANK
+static boolean isOptimizeTopNRowNumber(Session)
+static boolean isOptimizeTopNRank(Session)
}
SystemSessionProperties ..> FeaturesConfig
Class diagram for protocol TopNRowNumberNode and RankingFunctionclassDiagram
class TopNRowNumberNode {
+shared_ptr<PlanNode> source
+DataOrganizationSpecification specification
+RankingFunction rankingType
+VariableReferenceExpression rowNumberVariable
+int maxRowCountPerPartition
+bool partial
}
class RankingFunction {
<<enum>>
ROW_NUMBER
RANK
DENSE_RANK
}
TopNRowNumberNode --> RankingFunction
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Refactor WindowFilterPushDown's visitLimit and visitFilter conditionals to extract the common logic for row_number/rank, reducing duplication and improving readability.
- Merge convertToTopNRowNumber and convertToTopNRank into a single method parametrized by RankingFunction to avoid near-duplicate code paths.
- Use NLOHMANN_JSON_SERIALIZE_ENUM for RankingFunction in C++ to eliminate manual to_json/from_json boilerplate.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Refactor WindowFilterPushDown's visitLimit and visitFilter conditionals to extract the common logic for row_number/rank, reducing duplication and improving readability.
- Merge convertToTopNRowNumber and convertToTopNRank into a single method parametrized by RankingFunction to avoid near-duplicate code paths.
- Use NLOHMANN_JSON_SERIALIZE_ENUM for RankingFunction in C++ to eliminate manual to_json/from_json boilerplate.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java:158-159` </location>
<code_context>
source = rowNumberNode;
}
- else if (source instanceof WindowNode && canOptimizeWindowFunction((WindowNode) source, metadata.getFunctionAndTypeManager()) && isOptimizeTopNRowNumber(session)) {
+ else if (source instanceof WindowNode && canOptimizeRowNumberFunction((WindowNode) source, metadata.getFunctionAndTypeManager()) && isOptimizeTopNRowNumber(session) ||
+ source instanceof WindowNode && canOptimizeRankFunction((WindowNode) source, metadata.getFunctionAndTypeManager()) && isOptimizeTopNRank(session) && isNativeExecutionEnabled(session)) {
WindowNode windowNode = (WindowNode) source;
// verify that unordered row_number window functions are replaced by RowNumberNode
</code_context>
<issue_to_address>
**issue:** Operator precedence may lead to unexpected logic in conditional statement.
Add parentheses to clarify how '||' and '&&' are grouped in the condition, ensuring the intended logic and correct short-circuit behavior.
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java:163` </location>
<code_context>
// verify that unordered row_number window functions are replaced by RowNumberNode
verify(windowNode.getOrderingScheme().isPresent());
- TopNRowNumberNode topNRowNumberNode = convertToTopNRowNumber(windowNode, limit);
+ TopNRowNumberNode topNRowNumberNode = convertToTopNRank(windowNode, limit);
if (windowNode.getPartitionBy().isEmpty()) {
return topNRowNumberNode;
</code_context>
<issue_to_address>
**suggestion:** Method name 'convertToTopNRank' may be misleading for DENSE_RANK support.
Since the method handles both RANK and DENSE_RANK, a more general name like 'convertToTopNRanking' would improve clarity.
Suggested implementation:
```java
TopNRowNumberNode topNRowNumberNode = convertToTopNRanking(windowNode, limit);
```
You will also need to:
1. Rename the actual method definition from `convertToTopNRank` to `convertToTopNRanking` elsewhere in the codebase (likely in this file or a related class).
2. Update all other usages of `convertToTopNRank` to `convertToTopNRanking` to maintain consistency.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java:299-304` </location>
<code_context>
+ {
+
+ String windowFunctionName = Iterables.getOnlyElement(windowNode.getWindowFunctions().values()).getFunctionCall().getFunctionHandle().getName();
+ TopNRowNumberNode.RankingFunction rankingFunction =
+ windowFunctionName.equals("row_number") ?
+ TopNRowNumberNode.RankingFunction.ROW_NUMBER :
+ windowFunctionName.equals("rank") ?
+ TopNRowNumberNode.RankingFunction.RANK : TopNRowNumberNode.RankingFunction.DENSE_RANK;
+
+ return new TopNRowNumberNode(
</code_context>
<issue_to_address>
**suggestion:** Ranking function selection logic may not handle unexpected function names robustly.
Explicitly handle or validate unexpected function names to prevent incorrect defaulting to DENSE_RANK.
```suggestion
String windowFunctionName = Iterables.getOnlyElement(windowNode.getWindowFunctions().values()).getFunctionCall().getFunctionHandle().getName();
TopNRowNumberNode.RankingFunction rankingFunction;
switch (windowFunctionName) {
case "row_number":
rankingFunction = TopNRowNumberNode.RankingFunction.ROW_NUMBER;
break;
case "rank":
rankingFunction = TopNRowNumberNode.RankingFunction.RANK;
break;
case "dense_rank":
rankingFunction = TopNRowNumberNode.RankingFunction.DENSE_RANK;
break;
default:
throw new IllegalArgumentException("Unsupported window function for TopNRowNumberNode: " + windowFunctionName);
}
```
</issue_to_address>
### Comment 4
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java:332-338` </location>
<code_context>
return isRowNumberMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle()));
}
+ private static boolean canOptimizeRankFunction(WindowNode node, FunctionAndTypeManager functionAndTypeManager)
+ {
+ if (node.getWindowFunctions().size() != 1) {
+ return false;
+ }
+ VariableReferenceExpression rowNumberVariable = getOnlyElement(node.getWindowFunctions().keySet());
+ return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle()));
+ }
+
</code_context>
<issue_to_address>
**suggestion:** Method name 'canOptimizeRankFunction' may not reflect DENSE_RANK support.
Consider renaming the method to 'canOptimizeRankingFunction' to accurately reflect its support for both RANK and DENSE_RANK functions.
Suggested implementation:
```java
private static boolean canOptimizeRankingFunction(WindowNode node, FunctionAndTypeManager functionAndTypeManager)
{
if (node.getWindowFunctions().size() != 1) {
return false;
}
VariableReferenceExpression rowNumberVariable = getOnlyElement(node.getWindowFunctions().keySet());
return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle()));
}
```
If there are any calls to `canOptimizeRankFunction` elsewhere in this file (or in related files), they should also be renamed to `canOptimizeRankingFunction` to match the new method name.
</issue_to_address>
### Comment 5
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TopNRowNumberNode.java:38-43` </location>
<code_context>
public final class TopNRowNumberNode
extends InternalPlanNode
{
+ public enum RankingFunction
+ {
+ ROW_NUMBER,
+ RANK,
+ DENSE_RANK
+ }
+
</code_context>
<issue_to_address>
**suggestion:** RankingFunction enum lacks explicit serialization values.
Explicitly assign string values to each enum constant to ensure stable serialization, especially if enum names change during refactoring.
```suggestion
public enum RankingFunction
{
ROW_NUMBER("row_number"),
RANK("rank"),
DENSE_RANK("dense_rank");
private final String serializationValue;
RankingFunction(String serializationValue)
{
this.serializationValue = serializationValue;
}
public String getSerializationValue()
{
return serializationValue;
}
@Override
public String toString()
{
return serializationValue;
}
public static RankingFunction fromSerializationValue(String value)
{
for (RankingFunction function : RankingFunction.values()) {
if (function.serializationValue.equals(value)) {
return function;
}
}
throw new IllegalArgumentException("Unknown serialization value: " + value);
}
}
```
</issue_to_address>
### Comment 6
<location> `presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp:11612` </location>
<code_context>
+// Loosly copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
</code_context>
<issue_to_address>
**nitpick (typo):** Typo in comment: 'Loosly' should be 'Loosely'.
```suggestion
// Loosely copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
```
</issue_to_address>
### Comment 7
<location> `presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp:1803` </location>
<code_context>
}
+namespace {
+core::TopNRowNumberNode::RankFunction prestoToVeloxRankFunction(
+ protocol::RankingFunction rankingFunction) {
+ switch (rankingFunction) {
</code_context>
<issue_to_address>
**issue (review_instructions):** The function 'prestoToVeloxRankFunction' uses PascalCase, but functions should use camelCase naming.
Please rename 'prestoToVeloxRankFunction' to 'prestoToVeloxRankFunction' using camelCase (e.g., 'prestoToVeloxRankFunction' -> 'prestoToVeloxRankFunction'). Functions should use camelCase per the naming conventions.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `presto-native-execution/**/*.hpp,presto-native-execution/**/*.hpp,presto-native-execution/**/*.cpp`
**Instructions:**
Use PascalCase for types (classes, structs, enums, type aliases, type template parameters) and file names.
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...n-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java
Outdated
Show resolved
Hide resolved
...n-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java
Outdated
Show resolved
Hide resolved
...n-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java
Outdated
Show resolved
Hide resolved
...n-base/src/main/java/com/facebook/presto/sql/planner/optimizations/WindowFilterPushDown.java
Outdated
Show resolved
Hide resolved
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TopNRowNumberNode.java
Show resolved
Hide resolved
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
Show resolved
Hide resolved
3319161 to
75d0257
Compare
5013a95 to
a3e9352
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Consider refactoring the repeated
source instanceof WindowNodeand session flag checks in WindowFilterPushDown into a shared helper predicate to reduce duplication and improve readability. - Instead of splitting the function name string in convertToTopNRank, leverage the FunctionHandle or metadata API directly to determine the ranking type for better robustness.
- Rename the boolean parameter
rowOrRankin the test helper methods to a more descriptive name or switch to an enum to make the test logic clearer and self-documenting.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the repeated `source instanceof WindowNode` and session flag checks in WindowFilterPushDown into a shared helper predicate to reduce duplication and improve readability.
- Instead of splitting the function name string in convertToTopNRank, leverage the FunctionHandle or metadata API directly to determine the ranking type for better robustness.
- Rename the boolean parameter `rowOrRank` in the test helper methods to a more descriptive name or switch to an enum to make the test logic clearer and self-documenting.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TopNRowNumberNode.java:47` </location>
<code_context>
+
private final PlanNode source;
private final DataOrganizationSpecification specification;
+ private final RankingFunction rankingFunction;
private final VariableReferenceExpression rowNumberVariable;
private final int maxRowCountPerPartition;
</code_context>
<issue_to_address>
**suggestion (bug_risk):** rankingFunction is a required field but not validated.
Add a null check or validation for rankingFunction in the constructor to prevent runtime errors from accidental null assignments.
Suggested implementation:
```java
private final RankingFunction rankingFunction;
```
```java
@JsonProperty("rankingType") RankingFunction rankingFunction,
```
```java
@JsonProperty("rowNumberVariable") VariableReferenceExpression rowNumberVariable,
@JsonProperty("maxRowCountPerPartition") int maxRowCountPerPartition,
@JsonProperty("partial") boolean partial,
@JsonProperty("hashVariable") Optional<VariableReferenceExpression> hashVariable)
{
this.rankingFunction = java.util.Objects.requireNonNull(rankingFunction, "rankingFunction is null");
```
</issue_to_address>
### Comment 2
<location> `presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp:11612` </location>
<code_context>
}
} // namespace facebook::presto::protocol
namespace facebook::presto::protocol {
+// Loosly copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
+
+// NOLINTNEXTLINE: cppcoreguidelines-avoid-c-arrays
</code_context>
<issue_to_address>
**nitpick (typo):** Typo in comment: 'Loosly' should be 'Loosely'.
```suggestion
// Loosely copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
```
</issue_to_address>
### Comment 3
<location> `presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp:1803` </location>
<code_context>
}
+namespace {
+core::TopNRowNumberNode::RankFunction prestoToVeloxRankFunction(
+ protocol::RankingFunction rankingFunction) {
+ switch (rankingFunction) {
</code_context>
<issue_to_address>
**issue (review_instructions):** The function 'prestoToVeloxRankFunction' uses PascalCase, but functions should use camelCase naming.
Please rename 'prestoToVeloxRankFunction' to 'prestoToVeloxRankFunction' using camelCase (e.g., 'prestoToVeloxRankFunction' -> 'prestoToVeloxRankFunction'). This will ensure consistency with the function naming convention.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `presto-native-execution/**/*.hpp,presto-native-execution/**/*.hpp,presto-native-execution/**/*.cpp`
**Instructions:**
Use PascalCase for types (classes, structs, enums, type aliases, type template parameters) and file names.
</details>
</issue_to_address>
### Comment 4
<location> `presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp:11615` </location>
<code_context>
+// Loosly copied this here from NLOHMANN_JSON_SERIALIZE_ENUM()
+
+// NOLINTNEXTLINE: cppcoreguidelines-avoid-c-arrays
+static const std::pair<RankingFunction, json> RankingFunction_enum_table[] =
+ { // NOLINT: cert-err58-cpp
+ {RankingFunction::ROW_NUMBER, "ROW_NUMBER"},
</code_context>
<issue_to_address>
**issue (review_instructions):** RankingFunction_enum_table is a static constant and should use kPascalCase naming, not snake_case.
Per the instructions, static constants should use the kPascalCase naming convention, e.g., kRankingFunctionEnumTable.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `presto-native-execution/**/*.hpp,presto-native-execution/**/*.cpp`
**Instructions:**
Use UPPER_SNAKE_CASE for macros.
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TopNRowNumberNode.java
Show resolved
Hide resolved
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
Show resolved
Hide resolved
18a6e17 to
8791f96
Compare
|
Thanks for the revised release note! Two things, the first is only a nit.
|
@steveburnett : Thanks for your review. Yes, we should add documentation for optimizer.optimize-top-n-rank. Though its an interesting one as its applicable only for native engine. So I was undecided if I should document it with th native engine session properties here or the optimizer ones which are usually applicable to both engines. I'm more inclined to add it with the native engine config. what is your suggestion ? |
I agree with you that I'm more inclined to add it with the native engine config, and I think the best thing would be to add doc for it to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties-session.rst. |
@steveburnett : Added documentation. Please review. |
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the documentation! Just a couple of nits of formatting.
hantangwangd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit thanks for adding the optimizer part. I've left some initial comments for discussion.
| VariableReferenceExpression rowNumberVariable = getOnlyElement(node.getWindowFunctions().keySet()); | ||
| return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| VariableReferenceExpression rowNumberVariable = getOnlyElement(node.getWindowFunctions().keySet()); | |
| return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(node.getWindowFunctions().get(rowNumberVariable).getFunctionHandle())); | |
| return isRankMetadata(functionAndTypeManager, functionAndTypeManager.getFunctionMetadata(getOnlyElement(node.getWindowFunctions().values()).getFunctionHandle())); |
nit: it seems unnecessary to fetch the key first and then use it to get the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had copied this from canOptimizeRowNumberFunction logic. Have modified this code as well as the above function.
| if (windowNode.getPartitionBy().isEmpty()) { | ||
| return topNRowNumberNode; | ||
| if (canReplaceWithTopNRowNumber(windowNode)) { | ||
| TopNRowNumberNode topNRowNumberNode = convertToTopNRowNumber(windowNode, limit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| TopNRowNumberNode topNRowNumberNode = convertToTopNRowNumber(windowNode, limit); | |
| TopNRowNumberNode topNRowNumberNode = convertToTopNRank(windowNode, limit); |
Should we call convertToTopNRank here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a good catch... I intended to remove the original convertToTopNRowNumber and change convertToTopNRank function to convertToTopNRowNumber to match the naming everywhere.
Have fixed that.
| OptionalInt upperBound = extractUpperBound(tupleDomain, rowNumberVariable); | ||
|
|
||
| if (upperBound.isPresent()) { | ||
| source = convertToTopNRowNumber(windowNode, upperBound.getAsInt()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It seems that we can call convertToTopNRank here as well. And completely drop the method convertToTopNRowNumber?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
| if (windowNode.getPartitionBy().isEmpty()) { | ||
| return topNRowNumberNode; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very sure, but is it correct to simply remove the LimitNode for rank and dense_rank here? I'm not familiar with the actual implementation of TopNRowNumber for rank and dense_rank in Velox. But as I understand from the design doc, it's possible that they could return more rows than the specified limit. Is my understanding correct? PLMK if there are any errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I think we should only return directly if the RankingFunction is ROW_NUMBER
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. Yes, this is only applicable for row_number()
| assertPlanWithSession( | ||
| sql, | ||
| optimizeTopNRowNumber(true), | ||
| rowNumber ? optimizeTopNRowNumber(true) : optimizeTopNRank(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also assert on the Ranking function for the TopNRowNumberNode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaneja : That's a good idea. Though do you have any suggestions on how to cleanly do so ? An option could be to enhance the matcher classes. wdyt ?
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! Just one nit of formatting, everything else looks good.
| velox::core::NestedLoopJoinNode. | ||
|
|
||
| ``optimizer.optimize_top_n_rank`` | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
@hantangwangd @aaneja : Thanks for your review comments. Have addressed them. PTAL. |
hantangwangd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aditi-pandit, thanks for the fixes. I'm seeing some test failures that appear to be related.
Description
https://docs.google.com/document/d/1WQfNigR9bVrbM-PqY7F0mswcetN_tdNahzD9ENye-Q0/edit?tab=t.0
Motivation and Context
TPC-DS Q67