-
Notifications
You must be signed in to change notification settings - Fork 5.5k
fix: Add P4HyperLogLog to NativeTypeManager #26444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideRegisters the new P4HyperLogLog native type in the type manager and verifies its support through an end-to-end test using approx_set and cardinality. Class diagram for updated NativeTypeManager with P4HyperLogLog supportclassDiagram
class NativeTypeManager {
+DOUBLE
+SMALLINT
+HYPER_LOG_LOG
+P4_HYPER_LOG_LOG
+JSON
+TIMESTAMP_WITH_TIME_ZONE
+UUID
...
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@pramodsatya : Do we have any tests in presto-native-tests for this ? Would be great to have some. |
@aditi-pandit, yes we will be adding |
35ef83d to
754d347
Compare
|
@aditi-pandit, @pdabre12, please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java:629-641` </location>
<code_context>
".*Scalar function name not registered: native.default.key_sampling_percent.*");
}
+ @Test
+ public void testP4HyperLogLogWithApproxSet()
+ {
+ ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
+ ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
+ for (Type type : types) {
+ MaterializedResult actual = computeActual(String.format("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", type.getTypeSignature().toString()));
+ MaterializedResult expectedResult = resultBuilder(getSession(), BIGINT)
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding negative and edge case tests for P4HyperLogLog support.
Please add tests for unsupported type casts, null values, empty sets, and very large inputs to cover error handling and edge cases.
```suggestion
@Test
public void testP4HyperLogLogWithApproxSet()
{
ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
for (Type type : types) {
MaterializedResult actual = computeActual(String.format("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", type.getTypeSignature().toString()));
MaterializedResult expectedResult = resultBuilder(getSession(), BIGINT)
.row(expected.get(types.indexOf(type)))
.build();
assertEquals(actual.getMaterializedRows(), expectedResult.getMaterializedRows());
}
}
@Test
public void testP4HyperLogLogUnsupportedType()
{
// Try to use an unsupported type (e.g., ARRAY)
try {
computeActual("SELECT cardinality(cast(approx_set(cast(custkey AS ARRAY<BIGINT>)) AS P4HYPERLOGLOG)) FROM orders");
assertFalse(true, "Expected failure for unsupported type cast");
}
catch (Exception e) {
// Expected: function not supported for ARRAY
assertFalse(e.getMessage().isEmpty());
}
}
@Test
public void testP4HyperLogLogWithNullValues()
{
// Create a table with some nulls
computeActual("CREATE TABLE tmp_nulls AS SELECT custkey, IF (custkey % 10 = 0, NULL, custkey) AS nullable_custkey FROM orders");
MaterializedResult actual = computeActual("SELECT cardinality(cast(approx_set(nullable_custkey) AS P4HYPERLOGLOG)) FROM tmp_nulls");
// Should ignore nulls, cardinality should be less than or equal to original
MaterializedResult expected = computeActual("SELECT cardinality(cast(approx_set(custkey) AS P4HYPERLOGLOG)) FROM orders");
assertFalse(actual.getMaterializedRows().isEmpty());
assertFalse(expected.getMaterializedRows().isEmpty());
Long actualValue = (Long) actual.getMaterializedRows().get(0).getField(0);
Long expectedValue = (Long) expected.getMaterializedRows().get(0).getField(0);
assertFalse(actualValue > expectedValue);
computeActual("DROP TABLE tmp_nulls");
}
@Test
public void testP4HyperLogLogWithEmptySet()
{
// Create an empty table
computeActual("CREATE TABLE tmp_empty AS SELECT custkey FROM orders WHERE 1=0");
MaterializedResult actual = computeActual("SELECT cardinality(cast(approx_set(custkey) AS P4HYPERLOGLOG)) FROM tmp_empty");
MaterializedResult expected = resultBuilder(getSession(), BIGINT)
.row(0L)
.build();
assertEquals(actual.getMaterializedRows(), expected.getMaterializedRows());
computeActual("DROP TABLE tmp_empty");
}
@Test
public void testP4HyperLogLogWithLargeInput()
{
// Create a large table (simulate by duplicating orders)
computeActual("CREATE TABLE tmp_large AS SELECT custkey FROM orders UNION ALL SELECT custkey FROM orders UNION ALL SELECT custkey FROM orders");
MaterializedResult actual = computeActual("SELECT cardinality(cast(approx_set(custkey) AS P4HYPERLOGLOG)) FROM tmp_large");
// Cardinality should be similar to original orders, since keys are duplicated
MaterializedResult expected = computeActual("SELECT cardinality(cast(approx_set(custkey) AS P4HYPERLOGLOG)) FROM orders");
Long actualValue = (Long) actual.getMaterializedRows().get(0).getField(0);
Long expectedValue = (Long) expected.getMaterializedRows().get(0).getField(0);
// Should be equal since all keys are duplicated
assertEquals(actualValue, expectedValue);
computeActual("DROP TABLE tmp_large");
}
```
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java:639` </location>
<code_context>
+ {
+ ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
+ ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
+ for (Type type : types) {
+ MaterializedResult actual = computeActual(String.format("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", type.getTypeSignature().toString()));
+ MaterializedResult expectedResult = resultBuilder(getSession(), BIGINT)
+ .row(expected.get(types.indexOf(type)))
+ .build();
+ assertEquals(actual.getMaterializedRows(), expectedResult.getMaterializedRows());
+ }
+ }
</code_context>
<issue_to_address>
**suggestion (testing):** Test assertions could be more descriptive for easier debugging.
Including the type in the assertion message will make it easier to identify which type fails during testing.
```suggestion
assertEquals(
actual.getMaterializedRows(),
expectedResult.getMaterializedRows(),
String.format("Failed for type: %s", type.getTypeSignature().toString())
);
```
</issue_to_address>
### Comment 3
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java:632-633` </location>
<code_context>
+ @Test
+ public void testP4HyperLogLogWithApproxSet()
+ {
+ ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
+ ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
+ for (Type type : types) {
+ MaterializedResult actual = computeActual(String.format("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", type.getTypeSignature().toString()));
</code_context>
<issue_to_address>
**nitpick:** Clarify how expected cardinality values are determined.
Please document how the hardcoded expected values are determined, or consider computing them dynamically to reduce maintenance overhead if the data changes.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java
Outdated
Show resolved
Hide resolved
|
@kevintang2022 can you help to review this? |
|
Yes I just checked it with my team and internal use case, and it should be handled |
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pramodsatya
| { | ||
| ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE); | ||
| ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L); | ||
| for (Type type : types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be preferable to use
assertQuery("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", "select count(*) from orders");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion @aditi-pandit, but such a comparison fails:
assertQuery("SELECT cardinality(cast(approx_set(cast(custkey AS BIGINT)) AS P4HYPERLOGLOG)) FROM orders", "SELECT count(DISTINCT custkey) FROM orders");
actual column types:
[bigint]
expected column types:
[bigint]
not equal
Actual rows (1 of 1 extra rows shown, 1 rows in total):
[1002]
Expected rows (1 of 1 missing rows shown, 1 rows in total):
[1000]
I suspect this is because cardinality(approx_set(P4HLL)) returns an approximate NDV with an error margin. The expected values in this PR are borrowed from existing presto-tests, so this validates the function returns the same value as Presto Java in Presto C++.
...native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java
Outdated
Show resolved
Hide resolved
754d347 to
f9df24f
Compare
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pramodsatya
f9df24f to
8566231
Compare
|
Thanks @aditi-pandit, @tdcmeehan, there were some checkstyle errors before which are fixed now. Could you please review/approve again? |
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pramodsatya
pdabre12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pramodsatya
Description
P4HyperLogLog type was added to Velox recently in facebookincubator/velox@75ec5ba, and this type should be supported by
NativeTypeManager.Motivation and Context
Support P4HLL type in
NativeTypeManager, uncovered bynative-testsin #23671.Impact
Support P4HLL in Presto C++ clusters with sidecar.
Test Plan
Added tests.