Skip to content

Conversation

@hdikeman
Copy link
Contributor

@hdikeman hdikeman commented Nov 19, 2025

Description

The capability exists for Presto server to return serialized responses, but is currently not enabled in the client nor piped through to the CLI. This change adds support for the client to request and process serialized responses via the --serialized flag

NOTE: right now I only handle the types located in presto-common on the serialized path; there are some additional types declared in presto-main-base which the client does not have access to. Those could either be moved or redefined in presto-client, but since serialized APIs are opt-in and fails fast, it seemed acceptable to defer support for those types

Motivation and Context

As far as I know, there is no OSS client which can read using the serialized interface. This can make it difficult to test changes which may affect serialized query responses.

Impact

Callers can now read from the server using the binary serialized interface using the Presto CLI.

Test Plan

Test manually against tpch:

$ ./target/presto-cli-0.296-SNAPSHOT-executable.jar --server localhost:8081 --catalog tpch --schema tiny --serialized
presto:tiny> select * from lineitem limit 10;
 l_orderkey | l_partkey | l_suppkey | l_linenumber | l_quantity | l_extendedprice | l_discount | l_tax | l_returnflag | l_linestatus | l_shipdate | l_commitdate | l_receiptdate |  l_shipinstruct   | l_shipmode >
------------+-----------+-----------+--------------+------------+-----------------+------------+-------+--------------+--------------+------------+--------------+---------------+-------------------+------------>
          1 |    155190 |      7706 |            1 |       17.0 |        21168.23 |       0.04 |  0.02 | N            | O            | 1996-03-13 | 1996-02-12   | 1996-03-22    | DELIVER IN PERSON | TRUCK      >
          1 |     67310 |      7311 |            2 |       36.0 |        45983.16 |       0.09 |  0.06 | N            | O            | 1996-04-12 | 1996-02-28   | 1996-04-20    | TAKE BACK RETURN  | MAIL       >
          1 |     63700 |      3701 |            3 |        8.0 |         13309.6 |        0.1 |  0.02 | N            | O            | 1996-01-29 | 1996-03-05   | 1996-01-31    | TAKE BACK RETURN  | REG AIR    >
          1 |      2132 |      4633 |            4 |       28.0 |        28955.64 |       0.09 |  0.06 | N            | O            | 1996-04-21 | 1996-03-30   | 1996-05-16    | NONE              | AIR        >
          1 |     24027 |      1534 |            5 |       24.0 |        22824.48 |        0.1 |  0.04 | N            | O            | 1996-03-30 | 1996-03-14   | 1996-04-01    | NONE              | FOB        >
          1 |     15635 |       638 |            6 |       32.0 |        49620.16 |       0.07 |  0.02 | N            | O            | 1996-01-30 | 1996-02-07   | 1996-02-03    | DELIVER IN PERSON | MAIL       >
          2 |    106170 |      1191 |            1 |       38.0 |        44694.46 |        0.0 |  0.05 | N            | O            | 1997-01-28 | 1997-01-14   | 1997-02-02    | TAKE BACK RETURN  | RAIL       >
          3 |      4297 |      1798 |            1 |       45.0 |        54058.05 |       0.06 |   0.0 | R            | F            | 1994-02-02 | 1994-01-04   | 1994-02-23    | NONE              | AIR        >
          3 |     19036 |      6540 |            2 |       49.0 |        46796.47 |        0.1 |   0.0 | R            | F            | 1993-11-09 | 1993-12-20   | 1993-11-24    | TAKE BACK RETURN  | RAIL       >
          3 |    128449 |      3474 |            3 |       27.0 |        39890.88 |       0.06 |  0.07 | A            | F            | 1994-01-16 | 1993-11-22   | 1994-01-23    | DELIVER IN PERSON | SHIP       >
(10 rows)
Query 20251119_002325_00013_gdtfy, FINISHED, 1 node
Splits: 1 total, 0 done (0.00%)
[Latency: client-side: 120ms, server-side: 49ms] [1.01K rows, 398KB] [20.6K rows/s, 7.94MB/s]

I also added test coverage to the following components:

  • ClientTypeManager
  • BinaryResultDeserializer

And some E2E test coverage using both the binary serialized and JSON-serialized response formats

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • [] CI passed.

Release Notes

General changes
* Add client support for binary serialized responses

Differential Revision: D87268210

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Nov 19, 2025

Reviewer's Guide

This PR adds support in the Java Presto client for requesting and processing serialized (binary) query results from the server by introducing a new binaryResults flag, updating StatementClientV1 to send the new query parameter and decode Base64-encoded pages using newly added ClientTypeManager and BinaryDataDeserializer, and adds corresponding tests and build dependencies.

Sequence diagram for requesting and processing serialized results in StatementClientV1

sequenceDiagram
    participant "Client (StatementClientV1)"
    participant "Presto Server"
    participant "BinaryDataDeserializer"
    "Client (StatementClientV1)"->>"Presto Server": POST /v1/statement?binaryResults=true
    "Presto Server"-->>"Client (StatementClientV1)": Response with binaryData (Base64 pages)
    "Client (StatementClientV1)"->>"BinaryDataDeserializer": deserialize(columns, binaryData)
    "BinaryDataDeserializer"-->>"Client (StatementClientV1)": Iterable<List<Object>> (decoded rows)
    "Client (StatementClientV1)"->>"Client (StatementClientV1)": set processedResults
Loading

Class diagram for new and updated Presto client types (serialized response support)

classDiagram
    class StatementClientV1 {
        - boolean binaryResults
        - BinaryDataDeserializer binaryDataDeserializer
        + StatementClientV1(...)
        + deserializeBinaryResults(QueryResults): QueryResults
    }
    class ClientSession {
        - boolean binaryResults
        + isBinaryResults(): boolean
    }
    class Builder {
        - boolean binaryResults
        + withBinaryResults(boolean): Builder
    }
    class BinaryDataDeserializer {
        - PagesSerde pagesSerde
        - TypeManager typeManager
        - SqlFunctionProperties sqlFunctionProperties
        + BinaryDataDeserializer(BlockEncodingSerde, TypeManager, ClientSession)
        + deserialize(List<Column>, Iterable<String>): Iterable<List<Object>>
    }
    class ClientTypeManager {
        - Map<TypeSignature, Type> typeCache
        + getType(TypeSignature): Type
        + getParameterizedType(String, List<TypeSignatureParameter>): Type
        + hasType(TypeSignature): boolean
    }
    class QueryParameters {
        + BINARY_RESULTS: String
    }
    StatementClientV1 --> BinaryDataDeserializer
    StatementClientV1 --> ClientSession
    BinaryDataDeserializer --> ClientTypeManager
    BinaryDataDeserializer --> ClientSession
    ClientSession <|-- Builder
Loading

File-Level Changes

Change Details Files
Expose a serialized response option and propagate it through client configuration
  • Add binaryResults field, getter, and builder methods to ClientSession
  • Add --serialized option in ClientOptions and map it to binaryResults
  • Pass binaryResults flag in JDBC PrestoConnection and other call sites
  • Update CLI and plugin tests to include serialized flag defaults
presto-client/src/main/java/com/facebook/presto/client/ClientSession.java
presto-cli/src/main/java/com/facebook/presto/cli/ClientOptions.java
presto-cli/src/test/java/com/facebook/presto/cli/TestClientOptions.java
presto-jdbc/src/main/java/com/facebook/presto/jdbc/PrestoConnection.java
presto-plan-checker-router-plugin/src/main/java/com/facebook/presto/router/scheduler/PlanCheckerRouterPluginPrestoClient.java
presto-benchmark-driver/src/main/java/com/facebook/presto/benchmark/driver/BenchmarkDriverOptions.java
presto-cli/src/test/java/com/facebook/presto/cli/AbstractCliTest.java
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestingPrestoClient.java
presto-tests/src/test/java/com/facebook/presto/execution/TestFinalQueryInfo.java
Enhance StatementClientV1 to request and process binary results
  • Add BLOCK_ENCODING_MANAGER and BinaryDataDeserializer fields
  • Append binaryResults query parameter when building /v1/statement URL
  • Wrap QueryResults in deserializeBinaryResults when binaryResults is set
  • Implement deserializeBinaryResults to decode Base64 pages into rows
presto-client/src/main/java/com/facebook/presto/client/StatementClientV1.java
presto-client/src/main/java/com/facebook/presto/client/QueryParameters.java
presto-client/src/test/java/com/facebook/presto/client/TestQueryResults.java
Introduce binary deserialization framework with type resolution
  • Implement ClientTypeManager to map TypeSignatures to Type instances
  • Implement BinaryDataDeserializer to Base64-decode pages and convert to row lists
  • Wire block encoding serde and SqlFunctionProperties into deserializer
  • Add QueryParameters constant for binaryResults key
presto-client/src/main/java/com/facebook/presto/client/ClientTypeManager.java
presto-client/src/main/java/com/facebook/presto/client/BinaryDataDeserializer.java
presto-client/src/main/java/com/facebook/presto/client/QueryParameters.java
Add comprehensive unit and integration tests and update dependencies
  • Add TestClientTypeManager and TestBinaryDataDeserializer for core logic
  • Extend TestQueryResults and TestClientOptions to cover binary paths
  • Add end-to-end TestPrestoClient in presto-tests
  • Include io.airlift:slice and test JAR dependency in pom.xml
presto-client/pom.xml
presto-client/src/test/java/com/facebook/presto/client/TestClientTypeManager.java
presto-client/src/test/java/com/facebook/presto/client/TestBinaryDataDeserializer.java
presto-tests/src/test/java/com/facebook/presto/execution/TestPrestoClient.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-client/src/main/java/com/facebook/presto/client/BinaryDataDeserializer.java:69-79` </location>
<code_context>
+        List<Type> columnTypes = extractTypesFromColumns(columns);
+        ImmutableList.Builder<List<Object>> allRows = ImmutableList.builder();
+
+        for (String encodedPage : binaryData) {
+            byte[] pageBytes = Base64.getDecoder().decode(encodedPage);
+            Slice slice = Slices.wrappedBuffer(pageBytes);
+
+            BasicSliceInput sliceInput = slice.getInput();
+            SerializedPage serializedPage = readSerializedPage(sliceInput);
+
+            Page page = pagesSerde.deserialize(serializedPage);
+
+            allRows.addAll(convertPageToRows(page, columnTypes));
+        }
+
</code_context>

<issue_to_address>
**suggestion:** Consider handling invalid Base64 input more gracefully.

Catching the exception and including the page identifier in the error message will improve error reporting and debugging.

```suggestion
        for (String encodedPage : binaryData) {
            byte[] pageBytes;
            try {
                pageBytes = Base64.getDecoder().decode(encodedPage);
            }
            catch (IllegalArgumentException e) {
                throw new IllegalArgumentException("Failed to decode Base64 for page: " + encodedPage, e);
            }
            Slice slice = Slices.wrappedBuffer(pageBytes);

            BasicSliceInput sliceInput = slice.getInput();
            SerializedPage serializedPage = readSerializedPage(sliceInput);

            Page page = pagesSerde.deserialize(serializedPage);

            allRows.addAll(convertPageToRows(page, columnTypes));
        }
```
</issue_to_address>

### Comment 2
<location> `presto-client/src/main/java/com/facebook/presto/client/BinaryDataDeserializer.java:111-122` </location>
<code_context>
+    {
+        ImmutableList.Builder<Type> types = ImmutableList.builder();
+
+        for (Column column : columns) {
+            ClientTypeSignature clientTypeSignature = column.getTypeSignature();
+            TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature);
+            Type type = typeManager.getType(typeSignature);
+
+            if (type == null) {
+                throw new IllegalArgumentException("Unknown type: " + typeSignature);
+            }
</code_context>

<issue_to_address>
**suggestion:** Consider logging or reporting unknown type errors with column context.

Including the column name or index in the exception message will help identify which column caused the error.

```suggestion
    private List<Type> extractTypesFromColumns(List<Column> columns)
    {
        ImmutableList.Builder<Type> types = ImmutableList.builder();

        for (int i = 0; i < columns.size(); i++) {
            Column column = columns.get(i);
            ClientTypeSignature clientTypeSignature = column.getTypeSignature();
            TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature);
            Type type = typeManager.getType(typeSignature);

            if (type == null) {
                String columnName = column.getName();
                throw new IllegalArgumentException(
                    String.format("Unknown type: %s for column '%s' (index %d)", typeSignature, columnName, i)
                );
            }
```
</issue_to_address>

### Comment 3
<location> `presto-client/src/test/java/com/facebook/presto/client/TestQueryResults.java:66-77` </location>
<code_context>
         assertTrue(console.clientOptions.toClientSession().validateNextUriSource());
     }
+
+    @Test
+    public void testSerializedFormat()
+    {
</code_context>

<issue_to_address>
**suggestion (testing):** Binary data deserialization test does not verify actual deserialization of data.

Please add a test that exercises the complete deserialization process to confirm that binary data is correctly converted into rows and values.

```suggestion
    @Test
    public void testBinaryDataDeserialization()
    {
        String jsonWithBinaryData = "{\n" +
                "  \"id\" : \"test_binary_query\",\n" +
                "  \"infoUri\" : \"http://localhost:8080/query.html?test_binary_query\",\n" +
                "  \"binaryData\" : [ \"YmluYXJ5X2RhdGFfMQ==\", \"YmluYXJ5X2RhdGFfMg==\" ],\n" +
                "  \"stats\" : {\n" +
                "    \"state\" : \"FINISHED\",\n" +
                "    \"queued\" : false,\n" +
                "    \"scheduled\" : false,\n" +
                "    \"nodes\" : 0\n" +
                "  }\n" +
                "}";

        QueryResults results = QUERY_RESULTS_CODEC.fromJson(jsonWithBinaryData);
        assertEquals(results.getId(), "test_binary_query");
        assertNotNull(results.getBinaryData());
        assertEquals(results.getBinaryData().size(), 2);

        // Verify actual deserialization of binary data
        byte[] expected1 = "binary_data_1".getBytes();
        byte[] expected2 = "binary_data_2".getBytes();

        assertArrayEquals(expected1, results.getBinaryData().get(0));
        assertArrayEquals(expected2, results.getBinaryData().get(1));
    }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +111 to +122
private List<Type> extractTypesFromColumns(List<Column> columns)
{
ImmutableList.Builder<Type> types = ImmutableList.builder();

for (Column column : columns) {
ClientTypeSignature clientTypeSignature = column.getTypeSignature();
TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature);
Type type = typeManager.getType(typeSignature);

if (type == null) {
throw new IllegalArgumentException("Unknown type: " + typeSignature);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider logging or reporting unknown type errors with column context.

Including the column name or index in the exception message will help identify which column caused the error.

Suggested change
private List<Type> extractTypesFromColumns(List<Column> columns)
{
ImmutableList.Builder<Type> types = ImmutableList.builder();
for (Column column : columns) {
ClientTypeSignature clientTypeSignature = column.getTypeSignature();
TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature);
Type type = typeManager.getType(typeSignature);
if (type == null) {
throw new IllegalArgumentException("Unknown type: " + typeSignature);
}
private List<Type> extractTypesFromColumns(List<Column> columns)
{
ImmutableList.Builder<Type> types = ImmutableList.builder();
for (int i = 0; i < columns.size(); i++) {
Column column = columns.get(i);
ClientTypeSignature clientTypeSignature = column.getTypeSignature();
TypeSignature typeSignature = convertClientTypeSignatureToTypeSignature(clientTypeSignature);
Type type = typeManager.getType(typeSignature);
if (type == null) {
String columnName = column.getName();
throw new IllegalArgumentException(
String.format("Unknown type: %s for column '%s' (index %d)", typeSignature, columnName, i)
);
}

@hdikeman hdikeman changed the title add serialized response format option to Presto client feat(client): add serialized response format option to Presto client Nov 19, 2025
@hdikeman hdikeman changed the title feat(client): add serialized response format option to Presto client feat(client): Add serialized response format option to Presto client Nov 19, 2025
hdikeman added a commit to hdikeman/presto that referenced this pull request Nov 19, 2025
…restodb#26654)

Summary:

The capability exists for Presto server to return serialized responses, but is currently not enabled in the client nor piped through to the CLI. As far as I know, there is no OSS client which can read using the serialized interface. It makes it kind of annoying to test out any changes in this area

Adding support in the client to request and process serialized responses from the server

To support the new client processing, I created a TypeManager and binary deserializer for use in the client

NOTE: right now I only handle the types located in presto-common on the serialized path; there are some additional types declared in presto-main-base which the client does not have access to. Those could either be moved or redefined in presto-client, but since serialized APIs are opt-in and fails fast, it seemed acceptable to defer support for those types

Differential Revision: D87268210
@tdcmeehan
Copy link
Contributor

When the binary response field was added, it was deliberately excluded from clients because this wasn't intended to be a public API. If we want to expose a binary API to users, I would prefer for it to be well-known format and something we don't support, such as Arrow. Given the motivation for this PR is for testing, I'm wondering if using the REST API directly would satisfy that need, or perhaps enabling binary results only in the low-level Presto client, but not in the user-facing CLI.

…restodb#26654)

Summary:

The capability exists for Presto server to return serialized responses, but is currently not enabled in the client nor piped through to the CLI. As far as I know, there is no OSS client which can read using the serialized interface. It makes it kind of annoying to test out any changes in this area

Adding support in the client to request and process serialized responses from the server

To support the new client processing, I created a TypeManager and binary deserializer for use in the client

NOTE: right now I only handle the types located in presto-common on the serialized path; there are some additional types declared in presto-main-base which the client does not have access to. Those could either be moved or redefined in presto-client, but since serialized APIs are opt-in and fails fast, it seemed acceptable to defer support for those types

Differential Revision: D87268210
@hdikeman
Copy link
Contributor Author

When the binary response field was added, it was deliberately excluded from clients because this wasn't intended to be a public API. If we want to expose a binary API to users, I would prefer for it to be well-known format and something we don't support, such as Arrow.

@tdcmeehan I wasn't aware of that motivation, thanks. I was wondering why the server would make this option available but there were no available open source clients to use it. What was the rationale behind not making this a public API?

Given the motivation for this PR is for testing, I'm wondering if using the REST API directly would satisfy that need, or perhaps enabling binary results only in the low-level Presto client, but not in the user-facing CLI.

Given how widely used the binary response formatting is internally, I was hoping for more coverage/support in the Presto repository generally (for the Presto verifier as well, though I suppose I just need serialized support in the StatementClient for that, not the CLI)

My preference is still to have some way to enable this via the CLI if possible. We could hide this CLI option if we want to discourage usage (https://github.com/airlift/airline/blob/master/src/main/java/io/airlift/airline/Option.java) but perhaps that's bad practice

@tdcmeehan
Copy link
Contributor

tdcmeehan commented Nov 19, 2025

@tdcmeehan I wasn't aware of that motivation, thanks. I was wondering why the server would make this option available but there were no available open source clients to use it. What was the rationale behind not making this a public API?

It's simply because the motivation behind the binary results flag was for ease of fuzzer testing. C++ code doesn't need to parse JSON data structures, it can just treat the binary blob as a Presto Page. A real binary protocol would not be designed like that. I expect:

  1. Metadata would be decoupled from data paths in the protocol to enable better parallelism (for example, by spooling, or by retrieving results from the worker instead of coordinator)
  2. It would not be marshalled into JSON as it is now, it would be a real binary protocol with no unmarshalling required on the data path.
  3. It would be far more ideal to use a standard such as Arrow for broader ecosystem compatibility, rather than treating Presto's Page format as a public API.

Given how widely used the binary response formatting is internally, I was hoping for more coverage/support in the Presto repository generally (for the Presto verifier as well, though I suppose I just need serialized support in the StatementClient for that, not the CLI)

Can you share how you are using it internally, and why you need it? It will help me understand better.

@hdikeman
Copy link
Contributor Author

hdikeman commented Nov 19, 2025

Can you share how you are using it internally, and why you need it? It will help me understand better.

Several internal clients are capable of requesting and processing binary-serialized pages. Typically, this is enabled for performance reasons, since PrestoPage -> JSON serialization can bottleneck some query patterns

Typically the client would use Velox deserialization libraries to unpack the binary data into Velox RowVectors and return those vectors to the caller, or translate them into some other structure

It's simply because the motivation behind the binary results flag was for ease of fuzzer testing. C++ code doesn't need to parse JSON data structures, it can just treat the binary blob as a Presto Page. A real binary protocol would not be designed like that.

Perhaps you have context whether others are using this serialized interface similarly (outside Meta). I agree for the most part with the points you're making about the shortcomings of the binary protocol as currently written

For our part, there are some important workloads which are depending on the correctness of the existing binary interface and would like this existing interface better supported/covered by automation. That being said I take your point if our intention is to deter public usage of the existing serialized interface

What do you think about registering this as a hidden CLI option? (https://github.com/airlift/airline/blob/master/src/main/java/io/airlift/airline/Option.java)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants