-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(connector): Add support for custom connector-provided serialization codecs #26257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideThis PR refactors the core Jackson serialization framework to support optional binary serialization via connector-provided codecs, controlled by a new feature flag, updates all connector handle Jackson modules to propagate codec settings, extends the SPI and configuration to expose codec providers, and introduces comprehensive tests and build configuration changes. Class diagram for ConnectorCodecProvider SPI changesclassDiagram
class ConnectorCodecProvider {
+ getConnectorSplitCodec() : Optional<ConnectorCodec<ConnectorSplit>>
+ getConnectorTableHandleCodec() : Optional<ConnectorCodec<ConnectorTableHandle>>
+ getColumnHandleCodec() : Optional<ConnectorCodec<ColumnHandle>>
+ getConnectorPartitioningHandleCodec() : Optional<ConnectorCodec<ConnectorPartitioningHandle>>
+ getConnectorIndexHandleCodec() : Optional<ConnectorCodec<ConnectorIndexHandle>>
+ getConnectorDeleteTableHandleCodec() : Optional<ConnectorCodec<ConnectorDeleteTableHandle>>
+ getConnectorInsertTableHandleCodec() : Optional<ConnectorCodec<ConnectorInsertTableHandle>>
+ getConnectorOutputTableHandleCodec() : Optional<ConnectorCodec<ConnectorOutputTableHandle>>
+ getConnectorTransactionHandleCodec() : Optional<ConnectorCodec<ConnectorTransactionHandle>>
}
ConnectorCodecProvider <|.. ConnectorCodec
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
322a2e3 to
b309402
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- The AbstractTypedJacksonModule constructor and its nested CodecSerializer/CodecDeserializer classes have grown very large; consider extracting these into separate top-level classes to improve readability and maintainability.
- CodecSerializer’s serializerCache is unbounded; you may want to configure a maximum size or eviction policy to avoid potential memory leaks when serializing many different types.
- The TestHttpRemoteTaskConnectorCodec class exceeds 1000 lines and tests multiple scenarios in one file; splitting it into smaller, focused test classes will make the tests easier to navigate and maintain.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The AbstractTypedJacksonModule constructor and its nested CodecSerializer/CodecDeserializer classes have grown very large; consider extracting these into separate top-level classes to improve readability and maintainability.
- CodecSerializer’s serializerCache is unbounded; you may want to configure a maximum size or eviction policy to avoid potential memory leaks when serializing many different types.
- The TestHttpRemoteTaskConnectorCodec class exceeds 1000 lines and tests multiple scenarios in one file; splitting it into smaller, focused test classes will make the tests easier to navigate and maintain.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/test/java/com/facebook/presto/metadata/TestAbstractTypedJacksonModule.java:269-139` </location>
<code_context>
+ public void testDirectBinaryDataDeserialization()
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for deserialization error handling when binary data is malformed.
Add a test that deserializes invalid or corrupted base64 data and asserts the correct exception is thrown to verify error handling.
Suggested implementation:
```java
@Test
public void testDirectBinaryDataDeserialization()
throws Exception
{
// Test deserialization of manually crafted binary data JSON
ConnectorCodec<TestHandle> codec = new SimpleCodec();
ConnectorCodecProvider codecProvider = new ConnectorCodecProvider()
{
@Override
public Optional<ConnectorCodec<com.facebook.presto.spi.ConnectorTableHandle>> getConnectorTableHandleCodec()
{
return Optional.of((ConnectorCodec) codec);
}
};
// Additional test: deserialization error handling for malformed binary data
@Test
public void testMalformedBinaryDataDeserialization()
{
String malformedBase64Json = "{\"customSerializedValue\": \"!!not_base64!!\"}";
ConnectorCodec<TestHandle> codec = new SimpleCodec();
ObjectMapper objectMapper = new ObjectMapper();
objectMapper.registerModule(new AbstractTypedJacksonModule(codecProvider));
Exception exception = assertThrows(Exception.class, () -> {
objectMapper.readValue(malformedBase64Json, TestHandle.class);
});
// Optionally, check the exception type or message
// assertTrue(exception instanceof IllegalArgumentException || exception instanceof JsonMappingException);
}
```
- Ensure that `SimpleCodec`, `TestHandle`, and `AbstractTypedJacksonModule` are accessible in the test context.
- If your test framework does not support `assertThrows`, use a try-catch block and fail the test if no exception is thrown.
- You may need to import `org.junit.jupiter.api.Assertions.assertThrows` or similar, depending on your test framework.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one nit, looks good!
| network has high latency or if there are many nodes in the cluster. | ||
|
|
||
| ``use-connector-provided-serialization-codecs`` | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Did this.
This cache is unchanged from before, it's just refactored.
Splitting this up would net more LOCs due to the extensive setup required for this test, so I am keeping it as it is. |
40fb721 to
80228ac
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
39f0505 to
7a68100
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tdcmeehan. LGTM, just one nit.
presto-main-base/src/test/java/com/facebook/presto/metadata/TestAbstractTypedJacksonModule.java
Outdated
Show resolved
Hide resolved
yingsu00
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan Could you please squash the commits when merging? Thanks!
wraymo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Just a few minor comments.
...in/src/test/java/com/facebook/presto/server/remotetask/TestHttpRemoteTaskConnectorCodec.java
Outdated
Show resolved
Hide resolved
| jsonBinder(binder).addModuleBinding().toInstance(new com.facebook.presto.index.IndexHandleJacksonModule(handleResolver, featuresConfig, noOpIndexCodec)); | ||
| jsonBinder(binder).addModuleBinding().toInstance(new TransactionHandleJacksonModule(handleResolver, featuresConfig, noOpTransactionCodec)); | ||
| jsonBinder(binder).addModuleBinding().toInstance(new PartitioningHandleJacksonModule(handleResolver, featuresConfig, noOpPartitioningCodec)); | ||
| jsonBinder(binder).addModuleBinding().toInstance(new FunctionHandleJacksonModule(handleResolver)); | ||
| jsonBinder(binder).addModuleBinding().toInstance(new SplitJacksonModule(handleResolver, featuresConfig, splitCodecExtractor)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we didn't test IndexHandle, TransactionHandle, PartitionHandle and SplitHandle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had trouble crafting real tests for the first three in a connector-neutral way without overcomplicating the test even more. But note splits should be handled in this test. Personally I'm OK with the coverage because there's really just two types of things to serialize and deserialize, the structures found in TaskUpdateRequest, and the structures found in TaskUpdateRequest.fragment, and we test both types in this test.
presto-main-base/src/main/java/com/facebook/presto/metadata/CodecDeserializer.java
Outdated
Show resolved
Hide resolved
...in/src/test/java/com/facebook/presto/server/remotetask/TestHttpRemoteTaskConnectorCodec.java
Outdated
Show resolved
Hide resolved
...in/src/test/java/com/facebook/presto/server/remotetask/TestHttpRemoteTaskConnectorCodec.java
Outdated
Show resolved
Hide resolved
feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257)
Changes to include the code necessary for: - feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257) - Add support to provide thrift codec for connector specific fields (prestodb#25242)
Changes to include the code necessary for: - feat(connector): Add support for custom connector-provided serialization codecs (prestodb#26257) - Add support to provide thrift codec for connector specific fields (prestodb#25242)
Description
Add support for custom connector-provided serialization codecs
Motivation and Context
This will allow plugin connectors to be written in C++.
RFC: prestodb/rfcs#49
End to end changes migrating TPCH to the new framework: #26026
Impact
No immediate impact
Test Plan
Included tests
Contributor checklist
Release Notes