Skip to content

Conversation

@elbinpallimalilibm
Copy link
Contributor

@elbinpallimalilibm elbinpallimalilibm commented Sep 29, 2025

Restriced columns need to be handled when using table valued functions in query.

Description

When using table valued functions (TVF) in queries like given below

SELECT id from TABLE(system.query_function(
              'SELECT name, id FROM tpch.member WHERE id = 1', 
              'name VARCHAR, id INTEGER'))

ArrowPageSource can fail to map the correct FieldVector for the column. This happens because a TVF like above executes the query natively in Flight server and gives a result with two columns but ArrowPageSource is expected to only return results for 1 column, ie id from above query. The column name is restricted from the TVF result

Motivation and Context

This change is required when using TVF against a catalog based on presto-base-arrow-flight module.

Impact

This fixes problems when using TVF in a way given above against Arrow Flight based catalogs.

Test Plan

New unit test added that will test this change. This PR also includes an implementation of TVF called query_function that will test this change.

This change is backward compatible, so existing test cases will also pass with this change.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Sep 29, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Sep 29, 2025

Reviewer's Guide

This PR enhances ArrowPageSource to correctly map columns when using TVFs with restricted output by switching to name-based vector lookup, and introduces a testing framework for the new query_function TVF—including handler, provider, metadata, connector classes, and unit tests.

ER diagram for TVF output column mapping in ArrowPageSource

erDiagram
    FLIGHT_RESULT {
        string name
        int id
    }
    TVF_OUTPUT {
        int id
    }
    FLIGHT_RESULT ||--o| TVF_OUTPUT : restricts
    TVF_OUTPUT {
        int id
    }
Loading

Class diagram for ArrowPageSource column mapping update

classDiagram
    class ArrowPageSource {
        +getNextPage()
        -getVectorByColumnName(vectors: List<FieldVector>, name: String): FieldVector
        +close()
    }
    ArrowPageSource --> ArrowBlockBuilder
    ArrowPageSource --> FlightStreamAndClient
    ArrowPageSource --> FieldVector
    ArrowPageSource --> ArrowException
Loading

Class diagram for new TVF testing framework classes

classDiagram
    class QueryFunctionProvider {
    }
    class TestingArrowConnector {
    }
    class TestingArrowMetadata {
    }
    class TestingQueryArrowTableHandle {
    }
    class PrimitiveToPrestoTypeMappings {
    }
    QueryFunctionProvider --> TestingArrowConnector
    TestingArrowConnector --> TestingArrowMetadata
    TestingArrowMetadata --> TestingQueryArrowTableHandle
    TestingArrowConnector --> PrimitiveToPrestoTypeMappings
Loading

File-Level Changes

Change Details Files
Switch vector lookup from index to name-based to handle restricted columns
  • Add getVectorByColumnName method for name-based retrieval
  • Modify getNextPage to use getVectorByColumnName instead of positional index
  • Import ARROW_INTERNAL_ERROR and throw when a column name is missing
src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java
Add query_function TVF provider parsing arguments and producing table handles
  • Implement QueryFunctionProvider with analyze() building TestingQueryArrowTableHandle
  • Define QueryFunction and QueryFunctionHandle classes
  • Parse DATATYPES argument via regex to create ArrowColumnHandle list
src/test/java/com/facebook/plugin/arrow/testingConnector/tvf/QueryFunctionProvider.java
Introduce testing-specific connector components to support TVFs
  • Update TestingArrowModule to bind ConnectorTableFunction and ArrowConnector
  • Add TestingArrowMetadata overriding table function application and metadata methods
  • Create TestingQueryArrowTableHandle for encapsulating query and columns
  • Implement TestingArrowConnector exposing table functions
  • Add PrimitiveToPrestoTypeMappings utility
src/test/java/com/facebook/plugin/arrow/testingConnector/TestingArrowModule.java
src/test/java/com/facebook/plugin/arrow/testingConnector/TestingArrowMetadata.java
src/test/java/com/facebook/plugin/arrow/testingConnector/TestingQueryArrowTableHandle.java
src/test/java/com/facebook/plugin/arrow/testingConnector/TestingArrowConnector.java
src/test/java/com/facebook/plugin/arrow/testingConnector/PrimitiveToPrestoTypeMappings.java
Update flight client handler and request codec to support query-function handles
  • Branch in TestingArrowFlightClientHandler for TestingQueryArrowTableHandle to extract raw query
  • Allow nullable schema and table in TestingArrowFlightRequest.createQueryRequest
src/test/java/com/facebook/plugin/arrow/testingConnector/TestingArrowFlightClientHandler.java
src/test/java/com/facebook/plugin/arrow/testingServer/TestingArrowFlightRequest.java
Add unit tests for restricted and unrestricted TVF column selection
  • Add testQueryFunctionWithRestrictedColumns in TestArrowFlightNativeQueries and TestArrowFlightQueries
  • Add testQueryFunctionWithoutRestrictedColumns in both test classes
src/test/java/com/facebook/plugin/arrow/TestArrowFlightNativeQueries.java
src/test/java/com/facebook/plugin/arrow/TestArrowFlightQueries.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch 2 times, most recently from f6ce553 to f5f55bc Compare September 29, 2025 03:48
@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch 2 times, most recently from 0efd5c1 to 9ada46d Compare September 29, 2025 06:43
@elbinpallimalilibm
Copy link
Contributor Author

@sourcery-ai dismiss
@sourcery-ai review

@elbinpallimalilibm elbinpallimalilibm changed the title Handle restriced output columns in Arrow Page Source Handle restricted output columns in Arrow Page Source Sep 29, 2025
@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 9ada46d to 2b4de89 Compare September 29, 2025 06:55
@elbinpallimalilibm elbinpallimalilibm marked this pull request as ready for review September 29, 2025 09:18
@elbinpallimalilibm elbinpallimalilibm requested a review from a team as a code owner September 29, 2025 09:18
@prestodb-ci prestodb-ci requested review from a team, auden-woolfson and imsayari404 and removed request for a team September 29, 2025 09:18
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • In ArrowPageSource#getVectorIndexForColumnHandleIndex, consider building and caching a mapping from column handle to vector index once per split instead of scanning the list on every page to simplify the logic and improve performance.
  • The logger.debug call in the ArrowPageSource constructor uses a %s placeholder but the Airlift logger expects {} syntax; either correct the placeholder or remove this debug statement if it’s no longer needed.
  • The regex-based type parsing in QueryFunctionProvider.extractColumnParameters is brittle for more complex SQL types; consider leveraging Presto’s built-in SQL parser or a more robust parsing approach to support a wider range of data types.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In ArrowPageSource#getVectorIndexForColumnHandleIndex, consider building and caching a mapping from column handle to vector index once per split instead of scanning the list on every page to simplify the logic and improve performance.
- The logger.debug call in the ArrowPageSource constructor uses a `%s` placeholder but the Airlift logger expects `{}` syntax; either correct the placeholder or remove this debug statement if it’s no longer needed.
- The regex-based type parsing in QueryFunctionProvider.extractColumnParameters is brittle for more complex SQL types; consider leveraging Presto’s built-in SQL parser or a more robust parsing approach to support a wider range of data types.

## Individual Comments

### Comment 1
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java:57-58` </location>
<code_context>
         this.arrowBlockBuilder = requireNonNull(arrowBlockBuilder, "arrowBlockBuilder is null");
         this.flightStreamAndClient = clientHandler.getFlightStream(connectorSession, split);
+        this.columnHandlesInSplit = split.getColumns();
+        logger.debug("columnHandlesInSplit %s", columnHandlesInSplit);
     }

</code_context>

<issue_to_address>
**suggestion:** Consider removing or adjusting debug logging for columnHandlesInSplit.

Debug logging in production may cause excessive logs or expose sensitive data. Use a less verbose log level or remove before merging if not needed.

```suggestion
        this.columnHandlesInSplit = split.getColumns();
```
</issue_to_address>

### Comment 2
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java:147` </location>
<code_context>
+                vectorIndex = index.getAsInt();
+            }
+            else {
+                throw new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find column " + columnHandles.get(columnHandleIndex).getColumnName() + " in the column handles given in split");
+            }
+        }
</code_context>

<issue_to_address>
**suggestion:** Consider including more context in the error message for missing column.

Including the available columns in the error message will make it easier to identify the issue during debugging.

```suggestion
                String availableColumns = columnHandlesInSplit.get().stream()
                        .map(handle -> handle.getColumnName())
                        .collect(Collectors.joining(", "));
                throw new ArrowException(
                        ARROW_INTERNAL_ERROR,
                        "Unable to find column " + columnHandles.get(columnHandleIndex).getColumnName() +
                        " in the column handles given in split. Available columns: [" + availableColumns + "]"
                );
```
</issue_to_address>

### Comment 3
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowSplit.java:34` </location>
<code_context>
     private final String schemaName;
     private final String tableName;
     private final byte[] flightEndpointBytes;
+    private final Optional<List<ArrowColumnHandle>> columns;

     @JsonCreator
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider validating that columns Optional is not null in ArrowSplit constructor.

Passing null for columns may cause a NullPointerException. Add a requireNonNull check in the constructor to ensure columns is not null.

Suggested implementation:

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Objects;

```

```java
    {
        this.schemaName = schemaName;
        this.tableName = tableName;
        this.flightEndpointBytes = flightEndpointBytes;
        this.columns = Objects.requireNonNull(columns, "columns is null");

```
</issue_to_address>

### Comment 4
<location> `presto-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowFlightQueriesWithTVF.java:38` </location>
<code_context>
+    @Test
+    public void testQueryFunction()
+    {
+        MaterializedResult actualRow = computeActual("SELECT id from TABLE(system.query_function('SELECT name, id FROM tpch.member WHERE id = 1', 'name VARCHAR, id INTEGER'))");
+        MaterializedResult expectedRow = resultBuilder(getSession(), INTEGER)
+                .row(1)
</code_context>

<issue_to_address>
**suggestion (testing):** Test does not cover error handling for mismatched column names or types.

Add tests for cases where the TVF signature's column names or types do not match the query result, such as missing columns or incorrect types, to verify proper error handling.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 2b4de89 to c9efdfc Compare September 29, 2025 10:32
@elbinpallimalilibm
Copy link
Contributor Author

@sourcery-ai dismiss
@sourcery-ai review

@elbinpallimalilibm
Copy link
Contributor Author

@sourcery-ai dismiss

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from c9efdfc to 20e2a14 Compare September 29, 2025 13:26
@elbinpallimalilibm
Copy link
Contributor Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider replacing Optional<List> in ArrowSplit with an empty-list default to simplify the vector‐index mapping logic and avoid pervasive Optional checks.
  • The regex-based parsing of the DATATYPES string in QueryFunctionProvider is brittle; it would be more robust to leverage Presto’s SQL parser or a dedicated type‐declaration parser.
  • In ArrowPageSource, add an explicit sanity check that the split’s column list size matches the incoming FieldVector count so any mismatches are caught with a clear error early on.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider replacing Optional<List<ArrowColumnHandle>> in ArrowSplit with an empty-list default to simplify the vector‐index mapping logic and avoid pervasive Optional checks.
- The regex-based parsing of the DATATYPES string in QueryFunctionProvider is brittle; it would be more robust to leverage Presto’s SQL parser or a dedicated type‐declaration parser.
- In ArrowPageSource, add an explicit sanity check that the split’s column list size matches the incoming FieldVector count so any mismatches are caught with a clear error early on.

## Individual Comments

### Comment 1
<location> `presto-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowFlightQueriesWithTVF.java:35-36` </location>
<code_context>
+        return ArrowFlightQueryRunner.createQueryRunner(serverPort, true);
+    }
+
+    @Test
+    public void testQueryFunction()
+    {
+        MaterializedResult actualRow = computeActual("SELECT id from TABLE(system.query_function('SELECT name, id FROM tpch.member WHERE id = 1', 'name VARCHAR, id INTEGER'))");
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding negative and edge case tests for the query_function TVF.

Please add tests for scenarios like non-existent columns, unsupported data types, malformed SQL, and restricting all columns to improve error handling and edge case coverage.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 20e2a14 to 48a35cb Compare September 29, 2025 14:01
@elbinpallimalilibm
Copy link
Contributor Author

@sourcery-ai dismiss

@elbinpallimalilibm
Copy link
Contributor Author

@sourcery-ai review

@elbinpallimalilibm
Copy link
Contributor Author

@sourcery-ai guide

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider replacing Optional<List> with a non‐null empty list to simplify the split‐column logic and avoid Optional checks everywhere.
  • The check that vectors.size() == columnHandlesInSplit.size() may be too strict if the flight stream returns extra metadata or dictionary vectors—consider relaxing this to a >= check or filtering only the data vectors.
  • You can simplify the index mapping by building a name→index map from columnHandlesInSplit once (e.g. in the constructor) instead of scanning the list for each column in getVectorIndexForColumnHandleIndex.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider replacing Optional<List<ArrowColumnHandle>> with a non‐null empty list to simplify the split‐column logic and avoid Optional checks everywhere.
- The check that vectors.size() == columnHandlesInSplit.size() may be too strict if the flight stream returns extra metadata or dictionary vectors—consider relaxing this to a >= check or filtering only the data vectors.
- You can simplify the index mapping by building a name→index map from columnHandlesInSplit once (e.g. in the constructor) instead of scanning the list for each column in getVectorIndexForColumnHandleIndex.

## Individual Comments

### Comment 1
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowSplit.java:34` </location>
<code_context>
     private final String schemaName;
     private final String tableName;
     private final byte[] flightEndpointBytes;
+    private final Optional<List<ArrowColumnHandle>> columns;

     @JsonCreator
</code_context>

<issue_to_address>
**suggestion:** Consider using an immutable list for columns to prevent accidental modification.

Wrapping columns in an immutable collection will enforce read-only access and enhance thread safety.

Suggested implementation:

```java
    // columns is always an immutable (unmodifiable) list if present
    private final Optional<List<ArrowColumnHandle>> columns;

```

```java
        this.schemaName = schemaName;
        this.tableName = tableName;
        this.flightEndpointBytes = flightEndpointBytes;
        this.columns = columns.map(list -> Collections.unmodifiableList(list));

```
</issue_to_address>

### Comment 2
<location> `presto-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowSplit.java:53` </location>
<code_context>
         flightEndpoint = new FlightEndpoint(ticket, location);
         // Instantiate ArrowSplit with mock data
-        arrowSplit = new ArrowSplit(schemaName, tableName, flightEndpoint.serialize().array());
+        arrowSplit = new ArrowSplit(schemaName, tableName, flightEndpoint.serialize().array(), Optional.empty());
     }

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for ArrowSplit with non-empty columns.

Please add a test with populated columns to verify ArrowSplit handles restricted columns correctly during serialization and deserialization.

```suggestion
    }

    @Test
    public void testArrowSplitWithRestrictedColumns() {
        String schemaName = "test_schema";
        String tableName = "test_table";
        Location location = new Location("http://localhost:8080");
        Ticket ticket = new Ticket("test_ticket".getBytes());
        FlightEndpoint flightEndpoint = new FlightEndpoint(ticket, location);

        List<String> restrictedColumns = List.of("col1", "col2", "col3");
        ArrowSplit arrowSplit = new ArrowSplit(
                schemaName,
                tableName,
                flightEndpoint.serialize().array(),
                Optional.of(restrictedColumns)
        );

        // Serialize and deserialize
        ByteBuffer serialized = arrowSplit.serialize();
        ArrowSplit deserialized = ArrowSplit.deserialize(serialized);

        // Assert restricted columns are preserved
        assertNotNull(deserialized.getRestrictedColumns());
        assertEquals(deserialized.getRestrictedColumns().get(), restrictedColumns);
        assertEquals(deserialized.getSchemaName(), schemaName);
        assertEquals(deserialized.getTableName(), tableName);
    }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from daec4f0 to 96a9bf7 Compare October 25, 2025 00:17
@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch 4 times, most recently from 028f6fc to efc25da Compare October 26, 2025 09:17
Copy link
Contributor Author

@elbinpallimalilibm elbinpallimalilibm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the tests to compare with results from expected query instead of comparing with explicit values.

struct ArrowTableHandle : public ConnectorTableHandle {
String schema = {};
String table = {};
std::shared_ptr<List<ArrowColumnHandle>> columns = {};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests in TestArrowFlightNativeQueries to test scenarios with restricted columns. Seems the native connector doesn't even need to read these values, since the tests pass successfully.

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from efc25da to a6bb015 Compare October 26, 2025 10:38
Copy link
Contributor Author

@elbinpallimalilibm elbinpallimalilibm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redesigned the whole PR since only a simple fix was needed in ArrowPageSource. I have kept the sample implementation of TVF so that TVF with Arrow Flight connector can be tested including scenarios where output columns are restricted.


private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
{
return vectors.stream().filter(v -> v.getName().equals(name)).findFirst().orElseThrow(() -> new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find field for column " + name + " in the fields returned from Arrow Flight stream"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revisited the pull request and identified that only this fix was needed to handle restricted output columns when using TVF.

@elbinpallimalilibm
Copy link
Contributor Author

@SourceryAI guide

@elbinpallimalilibm
Copy link
Contributor Author

@SourceryAI review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowPageSource.java:116-119` </location>
<code_context>

+    private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
+    {
+        return vectors.stream().filter(v -> v.getName().equals(name)).findFirst().orElseThrow(() -> new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find field for column " + name + " in the fields returned from Arrow Flight stream"));
+    }
+
</code_context>

<issue_to_address>
**suggestion:** ArrowException message could include available field names for easier debugging.

Consider adding the available field names to the ArrowException message to help identify missing or misnamed columns.

```suggestion
    private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
    {
        String availableFields = vectors.stream()
            .map(FieldVector::getName)
            .reduce((a, b) -> a + ", " + b)
            .orElse("(none)");
        return vectors.stream()
            .filter(v -> v.getName().equals(name))
            .findFirst()
            .orElseThrow(() -> new ArrowException(
                ARROW_INTERNAL_ERROR,
                "Unable to find field for column " + name +
                " in the fields returned from Arrow Flight stream. " +
                "Available fields: " + availableFields
            ));
    }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

List<FieldVector> vectors = flightStreamAndClient.getRoot().getFieldVectors();
for (int columnIndex = 0; columnIndex < columnHandles.size(); columnIndex++) {
FieldVector vector = vectors.get(columnIndex);
FieldVector vector = getVectorByColumnName(vectors, columnHandles.get(columnIndex).getColumnName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern with doing it this way is that it's a full search over the fields for each column, so if there is a very wide record batch, it's not very efficient. Also this would affect all cases, not just when using TVFs.

Since this does simplify quite a bit, I think it's ok for now. Could you add a note to explain the reason the columns are matched by name and not just by index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments explaining why fields are looked up by name. BTW, the native arrow flight connector, also looks up the field by column name.

Copy link
Contributor

@BryanCutler BryanCutler Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I think what you have now is close to what is in the native connector


private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
{
return vectors.stream().filter(v -> v.getName().equals(name)).findFirst().orElseThrow(() -> new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find field for column " + name + " in the fields returned from Arrow Flight stream"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could make this a little more efficient. I think stream().filter() will continue iterating even if a match is found. The Arrow VectorSchemaRoot also has a method to lookup a vector by name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use the method in VectorSchemaRoot to get the field by name.

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from a6bb015 to 9bea9e7 Compare October 29, 2025 02:00
Copy link
Contributor Author

@elbinpallimalilibm elbinpallimalilibm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed suggestions.


private FieldVector getVectorByColumnName(List<FieldVector> vectors, String name)
{
return vectors.stream().filter(v -> v.getName().equals(name)).findFirst().orElseThrow(() -> new ArrowException(ARROW_INTERNAL_ERROR, "Unable to find field for column " + name + " in the fields returned from Arrow Flight stream"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use the method in VectorSchemaRoot to get the field by name.

List<FieldVector> vectors = flightStreamAndClient.getRoot().getFieldVectors();
for (int columnIndex = 0; columnIndex < columnHandles.size(); columnIndex++) {
FieldVector vector = vectors.get(columnIndex);
FieldVector vector = getVectorByColumnName(vectors, columnHandles.get(columnIndex).getColumnName());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments explaining why fields are looked up by name. BTW, the native arrow flight connector, also looks up the field by column name.

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch 2 times, most recently from 6fd82cf to 1b01b32 Compare October 29, 2025 15:54
// In scenarios where the user query contains a Table Valued Function, the output columns could be in a
// different order or could be a subset of the columns in the flight stream. So we are fetching the requested
// field vector by matching the column name instead of fetching by column index.
FieldVector vector = vectorSchemaRoot.getVector(columnHandle.getColumnName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getVector will return a null if not found, need to check for that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added check

@Test
public void testQueryFunctionWithoutRestrictedColumns() throws InterruptedException
{
assertQuery("SELECT NAME, NATIONKEY FROM TABLE(system.query_function('SELECT NATIONKEY, NAME FROM tpch.nation WHERE NATIONKEY = 4','NATIONKEY BIGINT, NAME VARCHAR'))", "SELECT NAME, NATIONKEY FROM nation WHERE NATIONKEY = 4");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that would reverse the order of the columns? e.g. SELECT NATIONKEY, NAME ...

Also a negative test where the output column is not present in the TVF? like SELECT FOO ..
This would fail during analysis right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added negative test

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow_handle_restriced_columns branch from 1b01b32 to 46ef954 Compare October 30, 2025 02:54
Copy link
Contributor

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tdcmeehan
Copy link
Contributor

Thanks a lot for carrying this through @elbinpallimalilibm. Thanks also @BryanCutler for the detailed and thorough reviews.

@tdcmeehan tdcmeehan merged commit 7c80949 into prestodb:master Nov 5, 2025
80 checks passed
@elbinpallimalilibm elbinpallimalilibm deleted the arrow_handle_restriced_columns branch November 5, 2025 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants