fix(plugin-arrow): Handle restricted output columns in Arrow Page Source #26175

elbinpallimalilibm · 2025-09-29T01:53:41Z

Restricted columns need to be handled when table-valued functions are used in a query.

Description

When a table-valued function (TVF) is used in a query such as the one below:

SELECT id from TABLE(system.query_function(
              'SELECT name, id FROM tpch.member WHERE id = 1', 
              'name VARCHAR, id INTEGER'))

ArrowPageSource can fail to map the correct FieldVector to the column. This happens because a TVF like the one above executes the query natively on the Flight server and returns a result with two columns, while ArrowPageSource is expected to return results for only one column, i.e. id in the query above. The column name is restricted from the TVF result.
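The mismatch described above can be illustrated with a minimal sketch. It assumes plain Java collections as a stand-in for the Arrow FieldVector list, and the row value `alice` is hypothetical; it is not the code in ArrowPageSource.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the scenario: a LinkedHashMap stands in for the ordered
// field vectors of the Flight stream (assumption made to keep this self-contained).
class RestrictedColumnLookupSketch
{
    // Columns returned by the Flight server for 'SELECT name, id ...', in stream order.
    static Map<String, Object> flightResult()
    {
        Map<String, Object> vectors = new LinkedHashMap<>();
        vectors.put("name", "alice"); // hypothetical row value
        vectors.put("id", 1);
        return vectors;
    }

    // Positional mapping: the i-th requested column reads the i-th stream column.
    static Object lookupByIndex(Map<String, Object> vectors, int i)
    {
        return new ArrayList<>(vectors.values()).get(i);
    }

    // Name-based mapping: the i-th requested column is matched on its name.
    static Object lookupByName(Map<String, Object> vectors, List<String> requested, int i)
    {
        return vectors.get(requested.get(i));
    }

    public static void main(String[] args)
    {
        List<String> requested = List.of("id"); // the outer query is SELECT id
        System.out.println(lookupByIndex(flightResult(), 0)); // prints alice, the wrong column
        System.out.println(lookupByName(flightResult(), requested, 0)); // prints 1
    }
}
```

With only `id` requested, position 0 of the stream is still `name`, which is why positional mapping picks the wrong vector.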

Motivation and Context

This change is required when using TVFs against a catalog based on the presto-base-arrow-flight module.

Impact

This fixes failures when TVFs are used as shown above against Arrow Flight based catalogs.

Test Plan

A new unit test covers this change. This PR also includes a TVF implementation called query_function that exercises it.

This change is backward compatible, so existing test cases will also pass with this change.

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

sourcery-ai · 2025-09-29T01:53:47Z

Reviewer's Guide

This PR enhances ArrowPageSource to correctly map columns when using TVFs with restricted output by switching to name-based vector lookup, and introduces a testing framework for the new query_function TVF—including handler, provider, metadata, connector classes, and unit tests.

ER diagram for TVF output column mapping in ArrowPageSource

erDiagram
    FLIGHT_RESULT {
        string name
        int id
    }
    TVF_OUTPUT {
        int id
    }
    FLIGHT_RESULT ||--o| TVF_OUTPUT : restricts
    TVF_OUTPUT {
        int id
    }

Class diagram for ArrowPageSource column mapping update

classDiagram
    class ArrowPageSource {
        +getNextPage()
        -getVectorByColumnName(vectors: List<FieldVector>, name: String): FieldVector
        +close()
    }
    ArrowPageSource --> ArrowBlockBuilder
    ArrowPageSource --> FlightStreamAndClient
    ArrowPageSource --> FieldVector
    ArrowPageSource --> ArrowException

Class diagram for new TVF testing framework classes

classDiagram
    class QueryFunctionProvider {
    }
    class TestingArrowConnector {
    }
    class TestingArrowMetadata {
    }
    class TestingQueryArrowTableHandle {
    }
    class PrimitiveToPrestoTypeMappings {
    }
    QueryFunctionProvider --> TestingArrowConnector
    TestingArrowConnector --> TestingArrowMetadata
    TestingArrowMetadata --> TestingQueryArrowTableHandle
    TestingArrowConnector --> PrimitiveToPrestoTypeMappings

File-Level Changes

Change	Details	Files
Switch vector lookup from index to name-based to handle restricted columns	Add getVectorByColumnName method for name-based retrieval; Modify getNextPage to use getVectorByColumnName instead of positional index; Import ARROW_INTERNAL_ERROR and throw when a column name is missing	`src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java`
Add query_function TVF provider parsing arguments and producing table handles	Implement QueryFunctionProvider with analyze() building TestingQueryArrowTableHandle; Define QueryFunction and QueryFunctionHandle classes; Parse DATATYPES argument via regex to create ArrowColumnHandle list	`src/test/java/com/facebook/plugin/arrow/testingConnector/tvf/QueryFunctionProvider.java`
Introduce testing-specific connector components to support TVFs	Update TestingArrowModule to bind ConnectorTableFunction and ArrowConnector; Add TestingArrowMetadata overriding table function application and metadata methods; Create TestingQueryArrowTableHandle for encapsulating query and columns; Implement TestingArrowConnector exposing table functions; Add PrimitiveToPrestoTypeMappings utility	`src/test/java/com/facebook/plugin/arrow/testingConnector/TestingArrowModule.java` `src/test/java/com/facebook/plugin/arrow/testingConnector/TestingArrowMetadata.java` `src/test/java/com/facebook/plugin/arrow/testingConnector/TestingQueryArrowTableHandle.java` `src/test/java/com/facebook/plugin/arrow/testingConnector/TestingArrowConnector.java` `src/test/java/com/facebook/plugin/arrow/testingConnector/PrimitiveToPrestoTypeMappings.java`
Update flight client handler and request codec to support query-function handles	Branch in TestingArrowFlightClientHandler for TestingQueryArrowTableHandle to extract raw query; Allow nullable schema and table in TestingArrowFlightRequest.createQueryRequest	`src/test/java/com/facebook/plugin/arrow/testingConnector/TestingArrowFlightClientHandler.java` `src/test/java/com/facebook/plugin/arrow/testingServer/TestingArrowFlightRequest.java`
Add unit tests for restricted and unrestricted TVF column selection	Add testQueryFunctionWithRestrictedColumns in TestArrowFlightNativeQueries and TestArrowFlightQueries; Add testQueryFunctionWithoutRestrictedColumns in both test classes	`src/test/java/com/facebook/plugin/arrow/TestArrowFlightNativeQueries.java` `src/test/java/com/facebook/plugin/arrow/TestArrowFlightQueries.java`


elbinpallimalilibm · 2025-09-29T06:44:50Z

@sourcery-ai dismiss
@sourcery-ai review

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

In ArrowPageSource#getVectorIndexForColumnHandleIndex, consider building and caching a mapping from column handle to vector index once per split instead of scanning the list on every page to simplify the logic and improve performance.
The logger.debug call in the ArrowPageSource constructor uses a %s placeholder but the Airlift logger expects {} syntax; either correct the placeholder or remove this debug statement if it’s no longer needed.
The regex-based type parsing in QueryFunctionProvider.extractColumnParameters is brittle for more complex SQL types; consider leveraging Presto’s built-in SQL parser or a more robust parsing approach to support a wider range of data types.
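One possible shape for the "more robust parsing approach" mentioned in the last point is sketched below. It assumes a declaration string of the form `'<name> <type>, ...'` and splits only on commas outside parentheses; the class and method names are hypothetical, the `price DECIMAL(10, 2)` column is an invented example, and this is not Presto's SQL parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "name VARCHAR, price DECIMAL(10, 2)" style declarations.
class ColumnDeclarationParserSketch
{
    // Splits on commas that are not nested inside parentheses, so a
    // parameterized type such as DECIMAL(10, 2) stays in one piece.
    static List<String> splitTopLevel(String declarations)
    {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : declarations.toCharArray()) {
            if (c == '(') {
                depth++;
            }
            else if (c == ')') {
                depth--;
            }
            if (c == ',' && depth == 0) {
                parts.add(current.toString().trim());
                current.setLength(0);
            }
            else {
                current.append(c);
            }
        }
        if (!current.toString().trim().isEmpty()) {
            parts.add(current.toString().trim());
        }
        return parts;
    }

    // Maps column name to type text; the first run of whitespace separates the two.
    static Map<String, String> parse(String declarations)
    {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String part : splitTopLevel(declarations)) {
            String[] nameAndType = part.split("\\s+", 2);
            if (nameAndType.length != 2) {
                throw new IllegalArgumentException("Expected '<name> <type>' but got: " + part);
            }
            columns.put(nameAndType[0], nameAndType[1]);
        }
        return columns;
    }

    public static void main(String[] args)
    {
        // prints {name=VARCHAR, id=INTEGER, price=DECIMAL(10, 2)}
        System.out.println(parse("name VARCHAR, id INTEGER, price DECIMAL(10, 2)"));
    }
}
```

Tracking parenthesis depth covers parameterized types but not quoted identifiers or nested row types, which would still call for a real type parser.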

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java:57-58` </location>
<code_context>
         this.arrowBlockBuilder = requireNonNull(arrowBlockBuilder, "arrowBlockBuilder is null");
         this.flightStreamAndClient = clientHandler.getFlightStream(connectorSession, split);
+        this.columnHandlesInSplit = split.getColumns();
+        logger.debug("columnHandlesInSplit %s", columnHandlesInSplit);
     }

</code_context>

<issue_to_address>
**suggestion:** Consider removing or adjusting debug logging for columnHandlesInSplit.

Debug logging in production may cause excessive logs or expose sensitive data. Use a less verbose log level or remove before merging if not needed.

```suggestion
        this.columnHandlesInSplit = split.getColumns();
```
</issue_to_address>

### Comment 2
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java:147` </location>
<code_context>
+                vectorIndex = index.getAsInt();
+            }
+            else {
+                throw new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find column " + columnHandles.get(columnHandleIndex).getColumnName() + " in the column handles given in split");
+            }
+        }
</code_context>

<issue_to_address>
**suggestion:** Consider including more context in the error message for missing column.

Including the available columns in the error message will make it easier to identify the issue during debugging.

```suggestion
                String availableColumns = columnHandlesInSplit.get().stream()
                        .map(handle -> handle.getColumnName())
                        .collect(Collectors.joining(", "));
                throw new ArrowException(
                        ARROW_INTERNAL_ERROR,
                        "Unable to find column " + columnHandles.get(columnHandleIndex).getColumnName() +
                        " in the column handles given in split. Available columns: [" + availableColumns + "]"
                );
```
</issue_to_address>

### Comment 3
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowSplit.java:34` </location>
<code_context>
     private final String schemaName;
     private final String tableName;
     private final byte[] flightEndpointBytes;
+    private final Optional<List<ArrowColumnHandle>> columns;

     @JsonCreator
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider validating that columns Optional is not null in ArrowSplit constructor.

Passing null for columns may cause a NullPointerException. Add a requireNonNull check in the constructor to ensure columns is not null.

Suggested implementation:

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Objects;

```

```java
    {
        this.schemaName = schemaName;
        this.tableName = tableName;
        this.flightEndpointBytes = flightEndpointBytes;
        this.columns = Objects.requireNonNull(columns, "columns is null");

```
</issue_to_address>

### Comment 4
<location> `presto-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowFlightQueriesWithTVF.java:38` </location>
<code_context>
+    @Test
+    public void testQueryFunction()
+    {
+        MaterializedResult actualRow = computeActual("SELECT id from TABLE(system.query_function('SELECT name, id FROM tpch.member WHERE id = 1', 'name VARCHAR, id INTEGER'))");
+        MaterializedResult expectedRow = resultBuilder(getSession(), INTEGER)
+                .row(1)
</code_context>

<issue_to_address>
**suggestion (testing):** Test does not cover error handling for mismatched column names or types.

Add tests for cases where the TVF signature's column names or types do not match the query result, such as missing columns or incorrect types, to verify proper error handling.
</issue_to_address>


presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowSplit.java

elbinpallimalilibm · 2025-09-29T10:33:28Z

@sourcery-ai dismiss
@sourcery-ai review

elbinpallimalilibm · 2025-09-29T12:34:41Z

@sourcery-ai dismiss

elbinpallimalilibm · 2025-09-29T13:27:10Z

@sourcery-ai review

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider replacing Optional<List<ArrowColumnHandle>> in ArrowSplit with an empty-list default to simplify the vector‐index mapping logic and avoid pervasive Optional checks.
The regex-based parsing of the DATATYPES string in QueryFunctionProvider is brittle; it would be more robust to leverage Presto’s SQL parser or a dedicated type‐declaration parser.
In ArrowPageSource, add an explicit sanity check that the split’s column list size matches the incoming FieldVector count so any mismatches are caught with a clear error early on.

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowFlightQueriesWithTVF.java:35-36` </location>
<code_context>
+        return ArrowFlightQueryRunner.createQueryRunner(serverPort, true);
+    }
+
+    @Test
+    public void testQueryFunction()
+    {
+        MaterializedResult actualRow = computeActual("SELECT id from TABLE(system.query_function('SELECT name, id FROM tpch.member WHERE id = 1', 'name VARCHAR, id INTEGER'))");
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding negative and edge case tests for the query_function TVF.

Please add tests for scenarios like non-existent columns, unsupported data types, malformed SQL, and restricting all columns to improve error handling and edge case coverage.
</issue_to_address>


elbinpallimalilibm · 2025-09-29T14:03:15Z

@sourcery-ai dismiss

elbinpallimalilibm · 2025-09-29T14:04:17Z

@sourcery-ai review

elbinpallimalilibm · 2025-09-29T14:05:46Z

@sourcery-ai guide

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider replacing Optional<List<ArrowColumnHandle>> with a non‐null empty list to simplify the split‐column logic and avoid Optional checks everywhere.
The check that vectors.size() == columnHandlesInSplit.size() may be too strict if the flight stream returns extra metadata or dictionary vectors—consider relaxing this to a >= check or filtering only the data vectors.
You can simplify the index mapping by building a name→index map from columnHandlesInSplit once (e.g. in the constructor) instead of scanning the list for each column in getVectorIndexForColumnHandleIndex.

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowSplit.java:34` </location>
<code_context>
     private final String schemaName;
     private final String tableName;
     private final byte[] flightEndpointBytes;
+    private final Optional<List<ArrowColumnHandle>> columns;

     @JsonCreator
</code_context>

<issue_to_address>
**suggestion:** Consider using an immutable list for columns to prevent accidental modification.

Wrapping columns in an immutable collection will enforce read-only access and enhance thread safety.

Suggested implementation:

```java
    // columns is always an immutable (unmodifiable) list if present
    private final Optional<List<ArrowColumnHandle>> columns;

```

```java
        this.schemaName = schemaName;
        this.tableName = tableName;
        this.flightEndpointBytes = flightEndpointBytes;
        this.columns = columns.map(list -> Collections.unmodifiableList(list));

```
</issue_to_address>

### Comment 2
<location> `presto-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowSplit.java:53` </location>
<code_context>
         flightEndpoint = new FlightEndpoint(ticket, location);
         // Instantiate ArrowSplit with mock data
-        arrowSplit = new ArrowSplit(schemaName, tableName, flightEndpoint.serialize().array());
+        arrowSplit = new ArrowSplit(schemaName, tableName, flightEndpoint.serialize().array(), Optional.empty());
     }

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for ArrowSplit with non-empty columns.

Please add a test with populated columns to verify ArrowSplit handles restricted columns correctly during serialization and deserialization.

```suggestion
    }

    @Test
    public void testArrowSplitWithRestrictedColumns() {
        String schemaName = "test_schema";
        String tableName = "test_table";
        Location location = new Location("http://localhost:8080");
        Ticket ticket = new Ticket("test_ticket".getBytes());
        FlightEndpoint flightEndpoint = new FlightEndpoint(ticket, location);

        List<String> restrictedColumns = List.of("col1", "col2", "col3");
        ArrowSplit arrowSplit = new ArrowSplit(
                schemaName,
                tableName,
                flightEndpoint.serialize().array(),
                Optional.of(restrictedColumns)
        );

        // Serialize and deserialize
        ByteBuffer serialized = arrowSplit.serialize();
        ArrowSplit deserialized = ArrowSplit.deserialize(serialized);

        // Assert restricted columns are preserved
        assertNotNull(deserialized.getRestrictedColumns());
        assertEquals(deserialized.getRestrictedColumns().get(), restrictedColumns);
        assertEquals(deserialized.getSchemaName(), schemaName);
        assertEquals(deserialized.getTableName(), tableName);
    }
```
</issue_to_address>


elbinpallimalilibm

Refactored the tests to compare with results from expected query instead of comparing with explicit values.

elbinpallimalilibm · 2025-10-26T09:14:08Z

...e-execution/presto_cpp/presto_protocol/connector/arrow_flight/presto_protocol_arrow_flight.h

 struct ArrowTableHandle : public ConnectorTableHandle {
  String schema = {};
  String table = {};
+  std::shared_ptr<List<ArrowColumnHandle>> columns = {};


Added tests in TestArrowFlightNativeQueries to test scenarios with restricted columns. Seems the native connector doesn't even need to read these values, since the tests pass successfully.

elbinpallimalilibm

Redesigned the whole PR since only a simple fix was needed in ArrowPageSource. I have kept the sample implementation of TVF so that TVF with Arrow Flight connector can be tested including scenarios where output columns are restricted.

elbinpallimalilibm · 2025-10-26T10:39:38Z

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java


+    private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
+    {
+        return vectors.stream().filter(v -> v.getName().equals(name)).findFirst().orElseThrow(() -> new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find field for column " + name + " in the fields returned from Arrow Flight stream"));


Revisited the pull request and identified that only this fix was needed to handle restricted output columns when using TVF.

elbinpallimalilibm · 2025-10-26T10:44:41Z

@SourceryAI guide

elbinpallimalilibm · 2025-10-26T10:44:47Z

@SourceryAI review

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java:116-119` </location>
<code_context>

+    private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
+    {
+        return vectors.stream().filter(v -> v.getName().equals(name)).findFirst().orElseThrow(() -> new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find field for column " + name + " in the fields returned from Arrow Flight stream"));
+    }
+
</code_context>

<issue_to_address>
**suggestion:** ArrowException message could include available field names for easier debugging.

Consider adding the available field names to the ArrowException message to help identify missing or misnamed columns.

```suggestion
    private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
    {
        String availableFields = vectors.stream()
            .map(FieldVector::getName)
            .reduce((a, b) -> a + ", " + b)
            .orElse("(none)");
        return vectors.stream()
            .filter(v -> v.getName().equals(name))
            .findFirst()
            .orElseThrow(() -> new ArrowException(
                ARROW_INTERNAL_ERROR,
                "Unable to find field for column " + name +
                " in the fields returned from Arrow Flight stream. " +
                "Available fields: " + availableFields
            ));
    }
```
</issue_to_address>


BryanCutler · 2025-10-29T00:41:57Z

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java

        List<FieldVector> vectors = flightStreamAndClient.getRoot().getFieldVectors();
        for (int columnIndex = 0; columnIndex < columnHandles.size(); columnIndex++) {
-            FieldVector vector = vectors.get(columnIndex);
+            FieldVector vector = getVectorByColumnName(vectors, columnHandles.get(columnIndex).getColumnName());


My only concern with doing it this way is that it's a full search over the fields for each column, so if there is a very wide record batch, it's not very efficient. Also this would affect all cases, not just when using TVFs.

Since this does simplify quite a bit, I think it's ok for now. Could you add a note to explain the reason the columns are matched by name and not just by index?

Added comments explaining why fields are looked up by name. BTW, the native Arrow Flight connector also looks up the field by column name.

True, I think what you have now is close to what is in the native connector
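For the wide-record-batch concern raised in this thread, one option is to build a name-to-position map once and reuse it for every column. The sketch below is an assumption about how that could look: it uses plain strings in place of FieldVector, the class name is hypothetical, and IllegalStateException stands in for the connector's ArrowException.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves requested columns to stream positions with one pass
// over the field names instead of one scan per requested column.
class FieldIndexCacheSketch
{
    // Builds the name -> position map in a single pass over the stream's field names.
    static Map<String, Integer> indexByName(List<String> fieldNames)
    {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            index.putIfAbsent(fieldNames.get(i), i); // keep the first occurrence on duplicates
        }
        return index;
    }

    // Resolves each requested column to its position in the stream.
    static int[] resolve(List<String> fieldNames, List<String> requestedColumns)
    {
        Map<String, Integer> index = indexByName(fieldNames);
        int[] positions = new int[requestedColumns.size()];
        for (int i = 0; i < positions.length; i++) {
            Integer position = index.get(requestedColumns.get(i));
            if (position == null) {
                throw new IllegalStateException("Unable to find field for column " + requestedColumns.get(i));
            }
            positions[i] = position;
        }
        return positions;
    }

    public static void main(String[] args)
    {
        // Stream order is (nationkey, name); the query asks for (name, nationkey).
        int[] positions = resolve(List.of("nationkey", "name"), List.of("name", "nationkey"));
        System.out.println(Arrays.toString(positions)); // prints [1, 0]
    }
}
```

The map costs one pass over the fields, after which each column lookup is a hash lookup rather than a scan.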

BryanCutler · 2025-10-29T00:44:24Z

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java


+    private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
+    {
+        return vectors.stream().filter(v -> v.getName().equals(name)).findFirst().orElseThrow(() -> new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find field for column " + name + " in the fields returned from Arrow Flight stream"));


I think you could make this a little more efficient. I think stream().filter() will continue iterating even if a match is found. The Arrow VectorSchemaRoot also has a method to look up a vector by name.

Changed to use the method in VectorSchemaRoot to get the field by name.
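On the iteration question: for a sequential Java stream, findFirst() is a short-circuiting terminal operation, so the filter is not applied to elements after the first match. A small check of this, using hypothetical field names and a counter inside the predicate:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Counts how many elements the filter predicate examines before findFirst() returns.
class FindFirstShortCircuitSketch
{
    static int elementsVisited(List<String> names, String target)
    {
        AtomicInteger visited = new AtomicInteger();
        names.stream()
                .filter(n -> {
                    visited.incrementAndGet();
                    return n.equals(target);
                })
                .findFirst();
        return visited.get();
    }

    public static void main(String[] args)
    {
        // "id" is the second of three fields, so the predicate runs twice.
        System.out.println(elementsVisited(List.of("name", "id", "address"), "id")); // prints 2
    }
}
```

The by-name lookup on VectorSchemaRoot is still the simpler call, since it avoids building a stream for every column.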

...-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowFlightNativeQueries.java

elbinpallimalilibm

Addressed suggestions.

...-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowFlightNativeQueries.java

elbinpallimalilibm · 2025-10-29T02:03:52Z

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java


+    private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
+    {
+        return vectors.stream().filter(v -> v.getName().equals(name)).findFirst().orElseThrow(() -> new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find field for column " + name + " in the fields returned from Arrow Flight stream"));


Changed to use the method in VectorSchemaRoot to get the field by name.

elbinpallimalilibm · 2025-10-29T02:04:59Z

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java

        List<FieldVector> vectors = flightStreamAndClient.getRoot().getFieldVectors();
        for (int columnIndex = 0; columnIndex < columnHandles.size(); columnIndex++) {
-            FieldVector vector = vectors.get(columnIndex);
+            FieldVector vector = getVectorByColumnName(vectors, columnHandles.get(columnIndex).getColumnName());


Added comments explaining why fields are looked up by name. BTW, the native Arrow Flight connector also looks up the field by column name.

BryanCutler · 2025-10-29T18:23:42Z

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java

+            // In scenarios where the user query contains a Table Valued Function, the output columns could be in a
+            // different order or could be a subset of the columns in the flight stream. So we are fetching the requested
+            // field vector by matching the column name instead of fetching by column index.
+            FieldVector vector = vectorSchemaRoot.getVector(columnHandle.getColumnName());


getVector will return a null if not found, need to check for that

added check
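The kind of check requested here can be sketched as follows. This is an illustration only: a Map stands in for the by-name lookup that returns null on a miss, strings stand in for FieldVector, and IllegalStateException stands in for ArrowException with ARROW_INTERNAL_ERROR. The actual check added to ArrowPageSource is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around a lookup that returns null when the name is absent.
class NullCheckedLookupSketch
{
    static String requireVector(Map<String, String> vectorsByName, String columnName)
    {
        String vector = vectorsByName.get(columnName); // null when the field is missing
        if (vector == null) {
            throw new IllegalStateException(
                    "Unable to find field for column " + columnName + " in the fields returned from Arrow Flight stream");
        }
        return vector;
    }

    public static void main(String[] args)
    {
        Map<String, String> vectors = new HashMap<>();
        vectors.put("id", "id-vector");
        System.out.println(requireVector(vectors, "id")); // prints id-vector
        try {
            requireVector(vectors, "foo");
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at the lookup gives a message that names the missing column, instead of a NullPointerException later when the block is built.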

BryanCutler · 2025-10-29T18:26:22Z

presto-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowFlightQueries.java

+    @Test
+    public void testQueryFunctionWithoutRestrictedColumns() throws InterruptedException
+    {
+        assertQuery("SELECT NAME, NATIONKEY FROM TABLE(system.query_function('SELECT NATIONKEY, NAME FROM tpch.nation WHERE NATIONKEY = 4','NATIONKEY BIGINT, NAME VARCHAR'))", "SELECT NAME, NATIONKEY FROM nation WHERE NATIONKEY = 4");


Can you add a test that would reverse the order of the columns? e.g. SELECT NATIONKEY, NAME ...

Also a negative test where the output column is not present in the TVF? like SELECT FOO ..
This would fail during analysis right?

added negative test

BryanCutler

LGTM

tdcmeehan · 2025-11-05T02:48:40Z

Thanks a lot for carrying this through @elbinpallimalilibm. Thanks also @BryanCutler for the detailed and thorough reviews.

prestodb-ci added the from:IBM label Sep 29, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch 2 times, most recently from f6ce553 to f5f55bc Compare September 29, 2025 03:48

elbinpallimalilibm closed this Sep 29, 2025

elbinpallimalilibm reopened this Sep 29, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch 2 times, most recently from 0efd5c1 to 9ada46d Compare September 29, 2025 06:43

elbinpallimalilibm commented Sep 29, 2025

elbinpallimalilibm changed the title ~~Handle restriced output columns in Arrow Page Source~~ Handle restricted output columns in Arrow Page Source Sep 29, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 9ada46d to 2b4de89 Compare September 29, 2025 06:55

elbinpallimalilibm marked this pull request as ready for review September 29, 2025 09:18

elbinpallimalilibm requested a review from a team as a code owner September 29, 2025 09:18

prestodb-ci requested review from a team, auden-woolfson and imsayari404 and removed request for a team September 29, 2025 09:18

sourcery-ai bot reviewed Sep 29, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 2b4de89 to c9efdfc Compare September 29, 2025 10:32

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from c9efdfc to 20e2a14 Compare September 29, 2025 13:26

sourcery-ai bot reviewed Sep 29, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 20e2a14 to 48a35cb Compare September 29, 2025 14:01

sourcery-ai bot reviewed Sep 29, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from daec4f0 to 96a9bf7 Compare October 25, 2025 00:17

elbinpallimalilibm requested review from BryanCutler and tdcmeehan October 25, 2025 00:20

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch 4 times, most recently from 028f6fc to efc25da Compare October 26, 2025 09:17

elbinpallimalilibm commented Oct 26, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from efc25da to a6bb015 Compare October 26, 2025 10:38

elbinpallimalilibm commented Oct 26, 2025

sourcery-ai bot reviewed Oct 26, 2025

BryanCutler requested changes Oct 29, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from a6bb015 to 9bea9e7 Compare October 29, 2025 02:00

elbinpallimalilibm commented Oct 29, 2025

elbinpallimalilibm requested a review from BryanCutler October 29, 2025 02:05

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch 2 times, most recently from 6fd82cf to 1b01b32 Compare October 29, 2025 15:54

BryanCutler requested changes Oct 29, 2025

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 1b01b32 to 46ef954 Compare October 30, 2025 02:54

fix: Handle restricted output columns in Arrow Page Source

934eac8

elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 46ef954 to 934eac8 Compare October 30, 2025 05:54

elbinpallimalilibm requested a review from BryanCutler October 30, 2025 16:32

BryanCutler approved these changes Oct 30, 2025

tdcmeehan approved these changes Nov 5, 2025

tdcmeehan merged commit 7c80949 into prestodb:master Nov 5, 2025
80 checks passed

elbinpallimalilibm deleted the arrow_handle_restriced_columns branch November 5, 2025 02:51

unidevel mentioned this pull request Nov 14, 2025

Add release notes for 0.296 unix280/presto#47

Merged

15 tasks
