-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(client): Add serialized response format option to Presto client #26654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR adds support in the Java Presto client for requesting and processing serialized (binary) query results from the server by introducing a new binaryResults flag, updating StatementClientV1 to send the new query parameter and decode Base64-encoded pages using newly added ClientTypeManager and BinaryDataDeserializer, and adds corresponding tests and build dependencies. Sequence diagram for requesting and processing serialized results in StatementClientV1sequenceDiagram
participant "Client (StatementClientV1)"
participant "Presto Server"
participant "BinaryDataDeserializer"
"Client (StatementClientV1)"->>"Presto Server": POST /v1/statement?binaryResults=true
"Presto Server"-->>"Client (StatementClientV1)": Response with binaryData (Base64 pages)
"Client (StatementClientV1)"->>"BinaryDataDeserializer": deserialize(columns, binaryData)
"BinaryDataDeserializer"-->>"Client (StatementClientV1)": Iterable<List<Object>> (decoded rows)
"Client (StatementClientV1)"->>"Client (StatementClientV1)": set processedResults
Class diagram for new and updated Presto client types (serialized response support)classDiagram
class StatementClientV1 {
- boolean binaryResults
- BinaryDataDeserializer binaryDataDeserializer
+ StatementClientV1(...)
+ deserializeBinaryResults(QueryResults): QueryResults
}
class ClientSession {
- boolean binaryResults
+ isBinaryResults(): boolean
}
class Builder {
- boolean binaryResults
+ withBinaryResults(boolean): Builder
}
class BinaryDataDeserializer {
- PagesSerde pagesSerde
- TypeManager typeManager
- SqlFunctionProperties sqlFunctionProperties
+ BinaryDataDeserializer(BlockEncodingSerde, TypeManager, ClientSession)
+ deserialize(List<Column>, Iterable<String>): Iterable<List<Object>>
}
class ClientTypeManager {
- Map<TypeSignature, Type> typeCache
+ getType(TypeSignature): Type
+ getParameterizedType(String, List<TypeSignatureParameter>): Type
+ hasType(TypeSignature): boolean
}
class QueryParameters {
+ BINARY_RESULTS: String
}
StatementClientV1 --> BinaryDataDeserializer
StatementClientV1 --> ClientSession
BinaryDataDeserializer --> ClientTypeManager
BinaryDataDeserializer --> ClientSession
ClientSession <|-- Builder
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-client/src/main/java/com/facebook/presto/client/BinaryDataDeserializer.java:69-79` </location>
<code_context>
+ List<Type> columnTypes = extractTypesFromColumns(columns);
+ ImmutableList.Builder<List<Object>> allRows = ImmutableList.builder();
+
+ for (String encodedPage : binaryData) {
+ byte[] pageBytes = Base64.getDecoder().decode(encodedPage);
+ Slice slice = Slices.wrappedBuffer(pageBytes);
+
+ BasicSliceInput sliceInput = slice.getInput();
+ SerializedPage serializedPage = readSerializedPage(sliceInput);
+
+ Page page = pagesSerde.deserialize(serializedPage);
+
+ allRows.addAll(convertPageToRows(page, columnTypes));
+ }
+
</code_context>
<issue_to_address>
**suggestion:** Consider handling invalid Base64 input more gracefully.
Catching the exception and including the page identifier in the error message will improve error reporting and debugging.
```suggestion
for (String encodedPage : binaryData) {
byte[] pageBytes;
try {
pageBytes = Base64.getDecoder().decode(encodedPage);
}
catch (IllegalArgumentException e) {
throw new IllegalArgumentException("Failed to decode Base64 for page: " + encodedPage, e);
}
Slice slice = Slices.wrappedBuffer(pageBytes);
BasicSliceInput sliceInput = slice.getInput();
SerializedPage serializedPage = readSerializedPage(sliceInput);
Page page = pagesSerde.deserialize(serializedPage);
allRows.addAll(convertPageToRows(page, columnTypes));
}
```
</issue_to_address>
### Comment 2
<location> `presto-client/src/main/java/com/facebook/presto/client/BinaryDataDeserializer.java:111-122` </location>
<code_context>
+ {
+ ImmutableList.Builder<Type> types = ImmutableList.builder();
+
+ for (Column column : columns) {
+ ClientTypeSignature clientTypeSignature = column.getTypeSignature();
+ TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature);
+ Type type = typeManager.getType(typeSignature);
+
+ if (type == null) {
+ throw new IllegalArgumentException("Unknown type: " + typeSignature);
+ }
</code_context>
<issue_to_address>
**suggestion:** Consider logging or reporting unknown type errors with column context.
Including the column name or index in the exception message will help identify which column caused the error.
```suggestion
private List<Type> extractTypesFromColumns(List<Column> columns)
{
ImmutableList.Builder<Type> types = ImmutableList.builder();
for (int i = 0; i < columns.size(); i++) {
Column column = columns.get(i);
ClientTypeSignature clientTypeSignature = column.getTypeSignature();
TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature);
Type type = typeManager.getType(typeSignature);
if (type == null) {
String columnName = column.getName();
throw new IllegalArgumentException(
String.format("Unknown type: %s for column '%s' (index %d)", typeSignature, columnName, i)
);
}
```
</issue_to_address>
### Comment 3
<location> `presto-client/src/test/java/com/facebook/presto/client/TestQueryResults.java:66-77` </location>
<code_context>
assertTrue(console.clientOptions.toClientSession().validateNextUriSource());
}
+
+ @Test
+ public void testSerializedFormat()
+ {
</code_context>
<issue_to_address>
**suggestion (testing):** Binary data deserialization test does not verify actual deserialization of data.
Please add a test that exercises the complete deserialization process to confirm that binary data is correctly converted into rows and values.
```suggestion
@Test
public void testBinaryDataDeserialization()
{
String jsonWithBinaryData = "{\n" +
" \"id\" : \"test_binary_query\",\n" +
" \"infoUri\" : \"http://localhost:8080/query.html?test_binary_query\",\n" +
" \"binaryData\" : [ \"YmluYXJ5X2RhdGFfMQ==\", \"YmluYXJ5X2RhdGFfMg==\" ],\n" +
" \"stats\" : {\n" +
" \"state\" : \"FINISHED\",\n" +
" \"queued\" : false,\n" +
" \"scheduled\" : false,\n" +
" \"nodes\" : 0\n" +
" }\n" +
"}";
QueryResults results = QUERY_RESULTS_CODEC.fromJson(jsonWithBinaryData);
assertEquals(results.getId(), "test_binary_query");
assertNotNull(results.getBinaryData());
assertEquals(results.getBinaryData().size(), 2);
// Verify actual deserialization of binary data
byte[] expected1 = "binary_data_1".getBytes();
byte[] expected2 = "binary_data_2".getBytes();
assertArrayEquals(expected1, results.getBinaryData().get(0));
assertArrayEquals(expected2, results.getBinaryData().get(1));
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| private List<Type> extractTypesFromColumns(List<Column> columns) | ||
| { | ||
| ImmutableList.Builder<Type> types = ImmutableList.builder(); | ||
|
|
||
| for (Column column : columns) { | ||
| ClientTypeSignature clientTypeSignature = column.getTypeSignature(); | ||
| TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature); | ||
| Type type = typeManager.getType(typeSignature); | ||
|
|
||
| if (type == null) { | ||
| throw new IllegalArgumentException("Unknown type: " + typeSignature); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider logging or reporting unknown type errors with column context.
Including the column name or index in the exception message will help identify which column caused the error.
| private List<Type> extractTypesFromColumns(List<Column> columns) | |
| { | |
| ImmutableList.Builder<Type> types = ImmutableList.builder(); | |
| for (Column column : columns) { | |
| ClientTypeSignature clientTypeSignature = column.getTypeSignature(); | |
| TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature); | |
| Type type = typeManager.getType(typeSignature); | |
| if (type == null) { | |
| throw new IllegalArgumentException("Unknown type: " + typeSignature); | |
| } | |
| private List<Type> extractTypesFromColumns(List<Column> columns) | |
| { | |
| ImmutableList.Builder<Type> types = ImmutableList.builder(); | |
| for (int i = 0; i < columns.size(); i++) { | |
| Column column = columns.get(i); | |
| ClientTypeSignature clientTypeSignature = column.getTypeSignature(); | |
| TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature); | |
| Type type = typeManager.getType(typeSignature); | |
| if (type == null) { | |
| String columnName = column.getName(); | |
| throw new IllegalArgumentException( | |
| String.format("Unknown type: %s for column '%s' (index %d)", typeSignature, columnName, i) | |
| ); | |
| } |
…restodb#26654) Summary: The capability exists for Presto server to return serialized responses, but is currently not enabled in the client nor piped through to the CLI. As far as I know, there is no OSS client which can read using the serialized interface. It makes it kind of annoying to test out any changes in this area Adding support in the client to request and process serialized responses from the server To support the new client processing, I created a TypeManager and binary deserializer for use in the client NOTE: right now I only handle the types located in presto-common on the serialized path; there are some additional types declared in presto-main-base which the client does not have access to. Those could either be moved or redefined in presto-client, but since serialized APIs are opt-in and fails fast, it seemed acceptable to defer support for those types Differential Revision: D87268210
aa06998 to
f699c09
Compare
|
When the binary response field was added, it was deliberately excluded from clients because this wasn't intended to be a public API. If we want to expose a binary API to users, I would prefer for it to be well-known format and something we don't support, such as Arrow. Given the motivation for this PR is for testing, I'm wondering if using the REST API directly would satisfy that need, or perhaps enabling binary results only in the low-level Presto client, but not in the user-facing CLI. |
…restodb#26654) Summary: The capability exists for Presto server to return serialized responses, but is currently not enabled in the client nor piped through to the CLI. As far as I know, there is no OSS client which can read using the serialized interface. It makes it kind of annoying to test out any changes in this area Adding support in the client to request and process serialized responses from the server To support the new client processing, I created a TypeManager and binary deserializer for use in the client NOTE: right now I only handle the types located in presto-common on the serialized path; there are some additional types declared in presto-main-base which the client does not have access to. Those could either be moved or redefined in presto-client, but since serialized APIs are opt-in and fails fast, it seemed acceptable to defer support for those types Differential Revision: D87268210
f699c09 to
9c42bcb
Compare
@tdcmeehan I wasn't aware of that motivation, thanks. I was wondering why the server would make this option available but there were no available open source clients to use it. What was the rationale behind not making this a public API?
Given how widely used the binary response formatting is internally, I was hoping for more coverage/support in the Presto repository generally (for the Presto verifier as well, though I suppose I just need serialized support in the StatementClient for that, not the CLI) My preference is still to have some way to enable this via the CLI if possible. We could hide this CLI option if we want to discourage usage (https://github.com/airlift/airline/blob/master/src/main/java/io/airlift/airline/Option.java) but perhaps that's bad practice |
It's simply because the motivation behind the binary results flag was for ease of fuzzer testing. C++ code doesn't need to parse JSON data structures, it can just treat the binary blob as a Presto Page. A real binary protocol would not be designed like that. I expect:
Can you share how you are using it internally, and why you need it? It will help me understand better. |
Several internal clients are capable of requesting and processing binary-serialized pages. Typically, this is enabled for performance reasons, since PrestoPage -> JSON serialization can bottleneck some query patterns Typically the client would use Velox deserialization libraries to unpack the binary data into Velox RowVectors and return those vectors to the caller, or translate them into some other structure
Perhaps you have context whether others are using this serialized interface similarly (outside Meta). I agree for the most part with the points you're making about the shortcomings of the binary protocol as currently written For our part, there are some important workloads which are depending on the correctness of the existing binary interface and would like this existing interface better supported/covered by automation. That being said I take your point if our intention is to deter public usage of the existing serialized interface What do you think about registering this as a hidden CLI option? (https://github.com/airlift/airline/blob/master/src/main/java/io/airlift/airline/Option.java) |
Description
The capability exists for Presto server to return serialized responses, but is currently not enabled in the client nor piped through to the CLI. This change adds support for the client to request and process serialized responses via the
--serializedflagNOTE: right now I only handle the types located in presto-common on the serialized path; there are some additional types declared in presto-main-base which the client does not have access to. Those could either be moved or redefined in presto-client, but since serialized APIs are opt-in and fails fast, it seemed acceptable to defer support for those types
Motivation and Context
As far as I know, there is no OSS client which can read using the serialized interface. This can make it difficult to test changes which may affect serialized query responses.
Impact
Callers can now read from the server using the binary serialized interface using the Presto CLI.
Test Plan
Test manually against tpch:
I also added test coverage to the following components:
And some E2E test coverage using both the binary serialized and JSON-serialized response formats
Contributor checklist
Release Notes
Differential Revision: D87268210