Skip to content

Conversation

@tdcmeehan
Copy link
Contributor

Description

Add support for custom connector-provided serialization codecs

Motivation and Context

This will allow plugin connectors to be written in C++.

RFC: prestodb/rfcs#49
End to end changes migrating TPCH to the new framework: #26026

Impact

No immediate impact

Test Plan

Included tests

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 9, 2025
@prestodb-ci prestodb-ci requested review from a team and nishithakbhaskaran and removed request for a team October 9, 2025 02:37
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 9, 2025

Reviewer's Guide

This PR refactors the core Jackson serialization framework to support optional binary serialization via connector-provided codecs, controlled by a new feature flag, updates all connector handle Jackson modules to propagate codec settings, extends the SPI and configuration to expose codec providers, and introduces comprehensive tests and build configuration changes.

Class diagram for ConnectorCodecProvider SPI changes

classDiagram
    class ConnectorCodecProvider {
        + getConnectorSplitCodec() : Optional<ConnectorCodec<ConnectorSplit>>
        + getConnectorTableHandleCodec() : Optional<ConnectorCodec<ConnectorTableHandle>>
        + getColumnHandleCodec() : Optional<ConnectorCodec<ColumnHandle>>
        + getConnectorPartitioningHandleCodec() : Optional<ConnectorCodec<ConnectorPartitioningHandle>>
        + getConnectorIndexHandleCodec() : Optional<ConnectorCodec<ConnectorIndexHandle>>
        + getConnectorDeleteTableHandleCodec() : Optional<ConnectorCodec<ConnectorDeleteTableHandle>>
        + getConnectorInsertTableHandleCodec() : Optional<ConnectorCodec<ConnectorInsertTableHandle>>
        + getConnectorOutputTableHandleCodec() : Optional<ConnectorCodec<ConnectorOutputTableHandle>>
        + getConnectorTransactionHandleCodec() : Optional<ConnectorCodec<ConnectorTransactionHandle>>
    }
    ConnectorCodecProvider <|.. ConnectorCodec
Loading

File-Level Changes

Change Details Files
Refactor AbstractTypedJacksonModule to support optional binary serialization using connector-provided codecs
  • Add binarySerializationEnabled flag and codecExtractor parameter to the constructor
  • Implement nested CodecSerializer and CodecDeserializer that write/read a customSerializedValue field with Base64 encoding
  • Fallback to legacy InternalTypeSerializer/Deserializer when codec unavailable, disabled, or for internal handles
presto-main-base/src/main/java/com/facebook/presto/metadata/AbstractTypedJacksonModule.java
Update all connector handle Jackson modules to pass codec settings from FeaturesConfig and ConnectorManager
  • Inject FeaturesConfig and Provider into each module
  • Forward binarySerializationEnabled and codecExtractor to super constructor
  • Add test-friendly constructors accepting a codecExtractor directly
presto-main-base/src/main/java/com/facebook/presto/metadata/SplitJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/index/IndexHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/ColumnHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/DeleteTableHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/InsertTableHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/OutputTableHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/PartitioningHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/TableHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/TableLayoutHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/TransactionHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/FunctionHandleJacksonModule.java
Extend SPI and configuration to expose connector codec providers
  • Add useConnectorProvidedSerializationCodecs flag with @config in FeaturesConfig
  • Implement getConnectorCodecProvider in ConnectorManager
  • Add default handle codec methods in ConnectorCodecProvider for column, index, partitioning, etc.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
presto-main-base/src/main/java/com/facebook/presto/connector/ConnectorManager.java
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorCodecProvider.java
Introduce new and updated tests to validate binary codec logic and replace HandleJsonModule
  • Add TestHttpRemoteTaskConnectorCodec and TestAbstractTypedJacksonModule to cover end-to-end codec scenarios
  • Create TestingHandleJsonModule and update existing tests to use it instead of HandleJsonModule
  • Bind and exercise connector-provided codecs in various test scenarios
presto-main-base/src/test/java/com/facebook/presto/server/remotetask/TestHttpRemoteTaskConnectorCodec.java
presto-main-base/src/test/java/com/facebook/presto/metadata/TestAbstractTypedJacksonModule.java
presto-main-base/src/test/java/com/facebook/presto/metadata/TestingHandleJsonModule.java
multiple existing test classes under src/test/java updated to use TestingHandleJsonModule
Add git-commit-id-plugin to build configuration
  • Include pl.project13.maven:git-commit-id-plugin in pom.xml
  • Configure includeOnlyProperties and dateFormat for plugin
pom.xml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@tdcmeehan tdcmeehan requested review from shangm2 and removed request for nishithakbhaskaran October 9, 2025 02:37
@tdcmeehan tdcmeehan force-pushed the dyncon1 branch 2 times, most recently from 322a2e3 to b309402 Compare October 9, 2025 02:39
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The AbstractTypedJacksonModule constructor and its nested CodecSerializer/CodecDeserializer classes have grown very large; consider extracting these into separate top-level classes to improve readability and maintainability.
  • CodecSerializer’s serializerCache is unbounded; you may want to configure a maximum size or eviction policy to avoid potential memory leaks when serializing many different types.
  • The TestHttpRemoteTaskConnectorCodec class exceeds 1000 lines and tests multiple scenarios in one file; splitting it into smaller, focused test classes will make the tests easier to navigate and maintain.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The AbstractTypedJacksonModule constructor and its nested CodecSerializer/CodecDeserializer classes have grown very large; consider extracting these into separate top-level classes to improve readability and maintainability.
- CodecSerializer’s serializerCache is unbounded; you may want to configure a maximum size or eviction policy to avoid potential memory leaks when serializing many different types.
- The TestHttpRemoteTaskConnectorCodec class exceeds 1000 lines and tests multiple scenarios in one file; splitting it into smaller, focused test classes will make the tests easier to navigate and maintain.

## Individual Comments

### Comment 1
<location> `presto-main-base/src/test/java/com/facebook/presto/metadata/TestAbstractTypedJacksonModule.java:269-139` </location>
<code_context>
+    public void testDirectBinaryDataDeserialization()
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test for deserialization error handling when binary data is malformed.

Add a test that deserializes invalid or corrupted base64 data and asserts the correct exception is thrown to verify error handling.

Suggested implementation:

```java
    @Test
    public void testDirectBinaryDataDeserialization()
            throws Exception
    {
        // Test deserialization of manually crafted binary data JSON
        ConnectorCodec<TestHandle> codec = new SimpleCodec();
        ConnectorCodecProvider codecProvider = new ConnectorCodecProvider()
        {
            @Override
            public Optional<ConnectorCodec<com.facebook.presto.spi.ConnectorTableHandle>> getConnectorTableHandleCodec()
            {
                return Optional.of((ConnectorCodec) codec);
            }
        };

        // Additional test: deserialization error handling for malformed binary data
        @Test
        public void testMalformedBinaryDataDeserialization()
        {
            String malformedBase64Json = "{\"customSerializedValue\": \"!!not_base64!!\"}";
            ConnectorCodec<TestHandle> codec = new SimpleCodec();
            ObjectMapper objectMapper = new ObjectMapper();
            objectMapper.registerModule(new AbstractTypedJacksonModule(codecProvider));

            Exception exception = assertThrows(Exception.class, () -> {
                objectMapper.readValue(malformedBase64Json, TestHandle.class);
            });

            // Optionally, check the exception type or message
            // assertTrue(exception instanceof IllegalArgumentException || exception instanceof JsonMappingException);
        }

```

- Ensure that `SimpleCodec`, `TestHandle`, and `AbstractTypedJacksonModule` are accessible in the test context.
- If your test framework does not support `assertThrows`, use a try-catch block and fail the test if no exception is thrown.
- You may need to import `org.junit.jupiter.api.Assertions.assertThrows` or similar, depending on your test framework.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit, looks good!

network has high latency or if there are many nodes in the cluster.

``use-connector-provided-serialization-codecs``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@tdcmeehan
Copy link
Contributor Author

tdcmeehan commented Oct 9, 2025

The AbstractTypedJacksonModule constructor and its nested CodecSerializer/CodecDeserializer classes have grown very large; consider extracting these into separate top-level classes to improve readability and maintainability.

Did this.

CodecSerializer’s serializerCache is unbounded; you may want to configure a maximum size or eviction policy to avoid potential memory leaks when serializing many different types.

This cache is unchanged from before, it's just refactored.

The TestHttpRemoteTaskConnectorCodec class exceeds 1000 lines and tests multiple scenarios in one file; splitting it into smaller, focused test classes will make the tests easier to navigate and maintain.

Splitting this up would net more LOCs due to the extensive setup required for this test, so I am keeping it as it is.

@tdcmeehan tdcmeehan changed the title Add support for custom connector-provided serialization codecs feat(connector): Add support for custom connector-provided serialization codecs Oct 9, 2025
@tdcmeehan tdcmeehan force-pushed the dyncon1 branch 2 times, most recently from 40fb721 to 80228ac Compare October 10, 2025 01:30
steveburnett
steveburnett previously approved these changes Oct 10, 2025
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Copy link
Contributor

@pdabre12 pdabre12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tdcmeehan. LGTM, just one nit.

pdabre12
pdabre12 previously approved these changes Oct 24, 2025
yingsu00
yingsu00 previously approved these changes Oct 28, 2025
@tdcmeehan tdcmeehan dismissed stale reviews from yingsu00 and pdabre12 via 43e73d9 October 29, 2025 19:10
yingsu00
yingsu00 previously approved these changes Oct 29, 2025
Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan Could you please squash the commits when merging? Thanks!

Copy link
Contributor

@wraymo wraymo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Just a few minor comments.

Comment on lines +671 to +675
jsonBinder(binder).addModuleBinding().toInstance(new com.facebook.presto.index.IndexHandleJacksonModule(handleResolver, featuresConfig, noOpIndexCodec));
jsonBinder(binder).addModuleBinding().toInstance(new TransactionHandleJacksonModule(handleResolver, featuresConfig, noOpTransactionCodec));
jsonBinder(binder).addModuleBinding().toInstance(new PartitioningHandleJacksonModule(handleResolver, featuresConfig, noOpPartitioningCodec));
jsonBinder(binder).addModuleBinding().toInstance(new FunctionHandleJacksonModule(handleResolver));
jsonBinder(binder).addModuleBinding().toInstance(new SplitJacksonModule(handleResolver, featuresConfig, splitCodecExtractor));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we didn't test IndexHandle, TransactionHandle, PartitionHandle and SplitHandle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had trouble crafting real tests for the first three in a connector-neutral way without overcomplicating the test even more. But note splits should be handled in this test. Personally I'm OK with the coverage because there's really just two types of things to serialize and deserialize, the structures found in TaskUpdateRequest, and the structures found in TaskUpdateRequest.fragment, and we test both types in this test.

@tdcmeehan tdcmeehan merged commit 3d99a78 into prestodb:master Nov 5, 2025
83 of 84 checks passed
acarpente-denodo added a commit to acarpente-denodo/presto that referenced this pull request Nov 5, 2025
feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
acarpente-denodo added a commit to acarpente-denodo/presto that referenced this pull request Nov 6, 2025
feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
acarpente-denodo added a commit to acarpente-denodo/presto that referenced this pull request Nov 7, 2025
feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
acarpente-denodo added a commit to acarpente-denodo/presto that referenced this pull request Nov 11, 2025
feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
acarpente-denodo added a commit to acarpente-denodo/presto that referenced this pull request Nov 12, 2025
Changes to include the code necessary for:
- feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
- Add support to provide thrift codec for connector specific fields (prestodb#25242)
acarpente-denodo added a commit to acarpente-denodo/presto that referenced this pull request Nov 13, 2025
Changes to include the code necessary for:
- feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
- Add support to provide thrift codec for connector specific fields (prestodb#25242)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants