Skip to content

Conversation

@pramodsatya
Copy link
Contributor

@pramodsatya pramodsatya commented Oct 27, 2025

Description

P4HyperLogLog type was added to Velox recently in facebookincubator/velox@75ec5ba, and this type should be supported by NativeTypeManager.

Motivation and Context

Support P4HLL type in NativeTypeManager, uncovered by native-tests in #23671.

Impact

Support P4HLL in Presto C++ clusters with sidecar.

Test Plan

Added tests.

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 27, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 27, 2025

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Registers the new P4HyperLogLog native type in the type manager and verifies its support through an end-to-end test using approx_set and cardinality.

Class diagram for updated NativeTypeManager with P4HyperLogLog support

classDiagram
    class NativeTypeManager {
        +DOUBLE
        +SMALLINT
        +HYPER_LOG_LOG
        +P4_HYPER_LOG_LOG
        +JSON
        +TIMESTAMP_WITH_TIME_ZONE
        +UUID
        ...
    }
Loading

File-Level Changes

Change Details Files
Enable P4HyperLogLog in NativeTypeManager
  • Added P4_HYPER_LOG_LOG constant to supported type list
presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/typemanager/NativeTypeManager.java
Add integration test for P4HyperLogLog support
  • Introduced testP4HyperLogLogWithApproxSet method
  • Iterate BIGINT, VARCHAR, DOUBLE input types
  • Assert cardinality of approx_set cast to P4HYPERLOGLOG matches expected
presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@aditi-pandit
Copy link
Contributor

@pramodsatya : Do we have any tests in presto-native-tests for this ? Would be great to have some.

@pramodsatya
Copy link
Contributor Author

pramodsatya commented Oct 27, 2025

@pramodsatya : Do we have any tests in presto-native-tests for this ? Would be great to have some.

@aditi-pandit, yes we will be adding native-tests for this and the P4HLL support added to Velox recently was found out while running the tests from #23671. Will update the PR description accordingly and open for review soon, and also rebase the final native-tests PR and open it for review.

@pramodsatya pramodsatya marked this pull request as ready for review October 28, 2025 04:31
@pramodsatya pramodsatya requested review from a team and pdabre12 as code owners October 28, 2025 04:31
@prestodb-ci prestodb-ci requested review from a team and pratyakshsharma and removed request for a team October 28, 2025 04:31
@pramodsatya
Copy link
Contributor Author

@aditi-pandit, @pdabre12, please take a look.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java:629-641` </location>
<code_context>
                 ".*Scalar function name not registered: native.default.key_sampling_percent.*");
     }

+    @Test
+    public void testP4HyperLogLogWithApproxSet()
+    {
+        ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
+        ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
+        for (Type type : types) {
+            MaterializedResult actual = computeActual(String.format("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", type.getTypeSignature().toString()));
+            MaterializedResult expectedResult = resultBuilder(getSession(), BIGINT)
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding negative and edge case tests for P4HyperLogLog support.

Please add tests for unsupported type casts, null values, empty sets, and very large inputs to cover error handling and edge cases.

```suggestion
    @Test
    public void testP4HyperLogLogWithApproxSet()
    {
        ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
        ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
        for (Type type : types) {
            MaterializedResult actual = computeActual(String.format("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", type.getTypeSignature().toString()));
            MaterializedResult expectedResult = resultBuilder(getSession(), BIGINT)
                    .row(expected.get(types.indexOf(type)))
                    .build();
            assertEquals(actual.getMaterializedRows(), expectedResult.getMaterializedRows());
        }
    }

    @Test
    public void testP4HyperLogLogUnsupportedType()
    {
        // Try to use an unsupported type (e.g., ARRAY)
        try {
            computeActual("SELECT cardinality(cast(approx_set(cast(custkey AS ARRAY<BIGINT>)) AS P4HYPERLOGLOG)) FROM orders");
            assertFalse(true, "Expected failure for unsupported type cast");
        }
        catch (Exception e) {
            // Expected: function not supported for ARRAY
            assertFalse(e.getMessage().isEmpty());
        }
    }

    @Test
    public void testP4HyperLogLogWithNullValues()
    {
        // Create a table with some nulls
        computeActual("CREATE TABLE tmp_nulls AS SELECT custkey, IF (custkey % 10 = 0, NULL, custkey) AS nullable_custkey FROM orders");
        MaterializedResult actual = computeActual("SELECT cardinality(cast(approx_set(nullable_custkey) AS P4HYPERLOGLOG)) FROM tmp_nulls");
        // Should ignore nulls, cardinality should be less than or equal to original
        MaterializedResult expected = computeActual("SELECT cardinality(cast(approx_set(custkey) AS P4HYPERLOGLOG)) FROM orders");
        assertFalse(actual.getMaterializedRows().isEmpty());
        assertFalse(expected.getMaterializedRows().isEmpty());
        Long actualValue = (Long) actual.getMaterializedRows().get(0).getField(0);
        Long expectedValue = (Long) expected.getMaterializedRows().get(0).getField(0);
        assertFalse(actualValue > expectedValue);
        computeActual("DROP TABLE tmp_nulls");
    }

    @Test
    public void testP4HyperLogLogWithEmptySet()
    {
        // Create an empty table
        computeActual("CREATE TABLE tmp_empty AS SELECT custkey FROM orders WHERE 1=0");
        MaterializedResult actual = computeActual("SELECT cardinality(cast(approx_set(custkey) AS P4HYPERLOGLOG)) FROM tmp_empty");
        MaterializedResult expected = resultBuilder(getSession(), BIGINT)
                .row(0L)
                .build();
        assertEquals(actual.getMaterializedRows(), expected.getMaterializedRows());
        computeActual("DROP TABLE tmp_empty");
    }

    @Test
    public void testP4HyperLogLogWithLargeInput()
    {
        // Create a large table (simulate by duplicating orders)
        computeActual("CREATE TABLE tmp_large AS SELECT custkey FROM orders UNION ALL SELECT custkey FROM orders UNION ALL SELECT custkey FROM orders");
        MaterializedResult actual = computeActual("SELECT cardinality(cast(approx_set(custkey) AS P4HYPERLOGLOG)) FROM tmp_large");
        // Cardinality should be similar to original orders, since keys are duplicated
        MaterializedResult expected = computeActual("SELECT cardinality(cast(approx_set(custkey) AS P4HYPERLOGLOG)) FROM orders");
        Long actualValue = (Long) actual.getMaterializedRows().get(0).getField(0);
        Long expectedValue = (Long) expected.getMaterializedRows().get(0).getField(0);
        // Should be equal since all keys are duplicated
        assertEquals(actualValue, expectedValue);
        computeActual("DROP TABLE tmp_large");
    }
```
</issue_to_address>

### Comment 2
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java:639` </location>
<code_context>
+    {
+        ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
+        ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
+        for (Type type : types) {
+            MaterializedResult actual = computeActual(String.format("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", type.getTypeSignature().toString()));
+            MaterializedResult expectedResult = resultBuilder(getSession(), BIGINT)
+                    .row(expected.get(types.indexOf(type)))
+                    .build();
+            assertEquals(actual.getMaterializedRows(), expectedResult.getMaterializedRows());
+        }
+    }
</code_context>

<issue_to_address>
**suggestion (testing):** Test assertions could be more descriptive for easier debugging.

Including the type in the assertion message will make it easier to identify which type fails during testing.

```suggestion
            assertEquals(
                actual.getMaterializedRows(),
                expectedResult.getMaterializedRows(),
                String.format("Failed for type: %s", type.getTypeSignature().toString())
            );
```
</issue_to_address>

### Comment 3
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java:632-633` </location>
<code_context>
+    @Test
+    public void testP4HyperLogLogWithApproxSet()
+    {
+        ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
+        ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
+        for (Type type : types) {
+            MaterializedResult actual = computeActual(String.format("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", type.getTypeSignature().toString()));
</code_context>

<issue_to_address>
**nitpick:** Clarify how expected cardinality values are determined.

Please document how the hardcoded expected values are determined, or consider computing them dynamically to reduce maintenance overhead if the data changes.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@tdcmeehan
Copy link
Contributor

@kevintang2022 can you help to review this?

@kevintang2022
Copy link
Contributor

Yes I just checked it with my team and internal use case, and it should be handled

kevintang2022
kevintang2022 previously approved these changes Oct 28, 2025
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya

{
ImmutableList<Type> types = ImmutableList.of(BIGINT, VARCHAR, DOUBLE);
ImmutableList<Long> expected = ImmutableList.of(1002L, 1024L, 1014L);
for (Type type : types) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be preferable to use

assertQuery("SELECT cardinality(cast(approx_set(cast(custkey AS %s)) AS P4HYPERLOGLOG)) FROM orders", "select count(*) from orders");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion @aditi-pandit, but such a comparison fails:

assertQuery("SELECT cardinality(cast(approx_set(cast(custkey AS BIGINT)) AS P4HYPERLOGLOG)) FROM orders", "SELECT count(DISTINCT custkey) FROM orders");
 actual column types:
 [bigint]
expected column types:
[bigint]

not equal
Actual rows (1 of 1 extra rows shown, 1 rows in total):
    [1002]
Expected rows (1 of 1 missing rows shown, 1 rows in total):
    [1000]

I suspect this is because cardinality(approx_set(P4HLL)) returns an approximate NDV with an error margin. The expected values in this PR are borrowed from existing presto-tests, so this validates the function returns the same value as Presto Java in Presto C++.

aditi-pandit
aditi-pandit previously approved these changes Oct 30, 2025
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya

tdcmeehan
tdcmeehan previously approved these changes Oct 30, 2025
@pramodsatya
Copy link
Contributor Author

Thanks @aditi-pandit, @tdcmeehan, there were some checkstyle errors before which are fixed now. Could you please review/approve again?

Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya

Copy link
Contributor

@pdabre12 pdabre12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya

@pramodsatya pramodsatya merged commit 4781438 into prestodb:master Oct 31, 2025
117 of 119 checks passed
@pramodsatya pramodsatya deleted the p4_hll_cpp branch October 31, 2025 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants