@ScrapCodes
Contributor

@ScrapCodes ScrapCodes commented Sep 22, 2025

Description

There is a separate PR that enables tests; once it is merged, we can work on the tests.

The following problems remain unsolved:

  1. Histogram implementation.

Motivation and Context

Statistics can improve query plans that involve the Oracle connector.

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
== RELEASE NOTES ==
Oracle Connector changes
* Add implementation to fetch table statistics from source tables.

@ScrapCodes ScrapCodes requested a review from a team as a code owner September 22, 2025 14:23
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Sep 22, 2025
@prestodb-ci prestodb-ci requested review from a team, NivinCS and jp-sivaprasad and removed request for a team September 22, 2025 14:23
@sourcery-ai
Contributor

sourcery-ai bot commented Sep 22, 2025

Reviewer's Guide

This PR extends OracleClient by overriding getTableStatistics to fetch and parse table-level and column-level statistics from Oracle’s DBA_TAB_STATISTICS and ALL_TAB_COLUMNS views, converting them into Presto’s TableStatistics and ColumnStatistics through helper methods, and adds more detailed logging.

Sequence diagram for getTableStatistics data retrieval and conversion

```mermaid
sequenceDiagram
    participant OC as OracleClient
    participant CF as connectionFactory
    participant DB as OracleDB
    participant TS as TableStatistics
    participant CS as ColumnStatistics
    OC->>CF: openConnection(JdbcIdentity)
    CF->>DB: Connect
    OC->>DB: Execute SQL for table stats (DBA_TAB_STATISTICS)
    DB-->>OC: Return NUM_ROWS, AVG_ROW_LEN, LAST_ANALYZED
    OC->>DB: Execute SQL for column stats (ALL_TAB_COLUMNS)
    DB-->>OC: Return column stats
    OC->>CS: Build ColumnStatistics for each column
    OC->>TS: Build TableStatistics with column stats and row count
    OC-->>Caller: Return TableStatistics
```
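The last steps of this flow (turn the table-level row count plus one row per column into a single result, or return an empty result when table stats are missing) can be sketched in plain Java. This is a minimal sketch, not the PR's code: `ColumnRow` and `TableStatsSketch` are hypothetical stand-ins for the column-stats rows and for Presto's `TableStatistics`, and it assumes a missing `NUM_ROWS` (for example, a table that was never analyzed) should yield an empty result.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical stand-in for one row of the column-statistics query (not a real Presto type).
record ColumnRow(String columnName, long numDistinct) {}

// Hypothetical stand-in for Presto's TableStatistics; rowCount is NaN when unknown.
record TableStatsSketch(double rowCount, Map<String, Double> distinctCounts)
{
    static TableStatsSketch empty()
    {
        return new TableStatsSketch(Double.NaN, Map.of());
    }
}

class StatsFlow
{
    // numRows is empty when the table-level query returned no NUM_ROWS value.
    static TableStatsSketch build(OptionalLong numRows, List<ColumnRow> columnRows)
    {
        if (numRows.isEmpty()) {
            // No table-level stats: report "unknown" rather than guessing.
            return TableStatsSketch.empty();
        }
        Map<String, Double> distinctCounts = new LinkedHashMap<>();
        for (ColumnRow row : columnRows) {
            distinctCounts.put(row.columnName(), (double) row.numDistinct());
        }
        return new TableStatsSketch(numRows.getAsLong(), distinctCounts);
    }
}
```

Returning an explicit "empty" value keeps the optimizer from treating a never-analyzed table as having zero rows.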

Class diagram for updated OracleClient with getTableStatistics

```mermaid
classDiagram
    class OracleClient {
      +getTableStatistics(session, handle, columnHandles, tupleDomain) TableStatistics
      -getColumnStaticsSql(handle) String
      -toDouble(number) double
    }
    OracleClient --|> BaseJdbcClient
    class TableStatistics {
      +builder()
      +empty()
      +setColumnStatistics(columnStatisticsMap)
      +setRowCount(rowCount)
    }
    class ColumnStatistics {
      +builder()
      +setDataSize(dataSize)
      +setNullsFraction(nullsFraction)
      +setDistinctValuesCount(distinctValuesCount)
      +setRange(DoubleRange)
    }
    class DoubleRange {
      +DoubleRange(low, high)
    }
    OracleClient --> TableStatistics
    OracleClient --> ColumnStatistics
    ColumnStatistics --> DoubleRange
```

File-Level Changes

| Change | Details | Files |
|---|---|---|
| Override `getTableStatistics` in `OracleClient` to fetch and build statistics | Require non-null schema and table names; execute SQL against `DBA_TAB_STATISTICS` for `NUM_ROWS`, `AVG_ROW_LEN`, `LAST_ANALYZED`; loop through the column stats query and map results to `ColumnStatistics`; build and return `TableStatistics` with row count and column-level stats | `presto-oracle/src/main/java/com/facebook/presto/plugin/oracle/OracleClient.java` |
| Add helper methods for column stats SQL generation and value parsing | Implemented `getColumnStaticsSql` to produce SQL for `ALL_TAB_COLUMNS`; added `toDouble` to safely parse string-encoded numeric values with a `NaN` fallback | `presto-oracle/src/main/java/com/facebook/presto/plugin/oracle/OracleClient.java` |
| Introduce structured logging for diagnostics | Added `Logger LOG` field; logged at debug level when stats are missing or parsing errors occur; logged at info level on successful `getTableStatistics` invocation | `presto-oracle/src/main/java/com/facebook/presto/plugin/oracle/OracleClient.java` |
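A parse-with-`NaN`-fallback helper like the `toDouble` described above might look roughly like this. It is a sketch under assumptions, not the PR's code: the helper name `parseOrNaN` is hypothetical, and it assumes the input is a possibly-`null` `String` and that any unparsable value (such as a VARCHAR low/high value) should become `Double.NaN`, meaning "unknown".

```java
class NumericParsing
{
    // Hypothetical helper: parse a string-encoded number, returning NaN ("unknown")
    // for null, blank, or non-numeric input instead of throwing.
    static double parseOrNaN(String value)
    {
        if (value == null || value.isBlank()) {
            return Double.NaN;
        }
        try {
            // parseDouble ignores leading and trailing whitespace itself.
            return Double.parseDouble(value);
        }
        catch (NumberFormatException e) {
            // e.g. a VARCHAR value such as "abc"
            return Double.NaN;
        }
    }
}
```

Mapping failures to `NaN` rather than `0` matters because a zero would be read as a real (and misleading) estimate.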


@ScrapCodes ScrapCodes requested a review from aaneja September 22, 2025 14:23
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • Wrap the PreparedStatement and ResultSet objects in try-with-resources (not just the Connection) to ensure all JDBC resources are properly closed.
  • Use SLF4J’s {} placeholders instead of %s in log messages so parameters are substituted correctly.
  • Avoid building SQL via String.format with unescaped identifiers—use bind parameters or proper identifier quoting to prevent SQL injection or quoting errors.
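A minimal JDBC sketch of the first and third points: the owner and table name are bind parameters instead of being formatted into the SQL, and both the statement and the result set are opened in try-with-resources. The class, constant, and method names are hypothetical and this is not the PR's query; it only assumes the standard `ALL_TAB_COLUMNS` columns `OWNER`, `TABLE_NAME`, `COLUMN_NAME`, and `NUM_DISTINCT`, and that the caller opens and closes the `Connection`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

class ColumnStatsQuery
{
    // Values are bound with setString, so a quote inside a name cannot change the statement.
    static final String COLUMN_STATS_SQL =
            "SELECT COLUMN_NAME, NUM_DISTINCT " +
            "FROM ALL_TAB_COLUMNS " +
            "WHERE OWNER = ? AND TABLE_NAME = ?";

    // Returns NUM_DISTINCT per column; the connection is owned (and closed) by the caller.
    static Map<String, Long> readDistinctCounts(Connection connection, String owner, String table)
            throws SQLException
    {
        Map<String, Long> result = new LinkedHashMap<>();
        try (PreparedStatement statement = connection.prepareStatement(COLUMN_STATS_SQL)) {
            statement.setString(1, owner);
            statement.setString(2, table);
            // The ResultSet is closed before the statement that produced it.
            try (ResultSet rs = statement.executeQuery()) {
                while (rs.next()) {
                    long distinct = rs.getLong("NUM_DISTINCT");
                    // getLong returns 0 for SQL NULL; wasNull distinguishes "not analyzed".
                    if (!rs.wasNull()) {
                        result.put(rs.getString("COLUMN_NAME"), distinct);
                    }
                }
            }
        }
        return result;
    }
}
```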

## Individual Comments

### Comment 1
<location> `presto-oracle/src/main/java/com/facebook/presto/plugin/oracle/OracleClient.java:152-154` </location>
<code_context>
     }

+    @Override
+    public TableStatistics getTableStatistics(ConnectorSession session, JdbcTableHandle handle, List<JdbcColumnHandle> columnHandles, TupleDomain<ColumnHandle> tupleDomain)
+    {
+        try {
+            Preconditions.checkNotNullOrEmpty(handle.getSchemaName(), "schema name");
+            Preconditions.checkNotNullOrEmpty(handle.getTableName(), "table name");
</code_context>

<issue_to_address>
**issue:** Consider handling the case where NUM_ROWS is zero to avoid division by zero.

Guard against numRows being zero when computing nullsFraction to prevent runtime errors and ensure accurate statistics.
</issue_to_address>
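A guard along these lines keeps the fraction defined. This is a sketch assuming the nulls fraction is computed as `NUM_NULLS / NUM_ROWS` and that `NaN` signals "unknown"; the helper name is hypothetical. In Java, integer division by zero throws `ArithmeticException`, while floating-point division by zero yields `NaN` or `Infinity`, so neither is a usable statistic without the check.

```java
class NullsFraction
{
    // Hypothetical helper: NUM_NULLS / NUM_ROWS, or NaN ("unknown") when there are
    // no rows, instead of dividing by zero.
    static double nullsFraction(long numNulls, long numRows)
    {
        if (numRows <= 0) {
            return Double.NaN;
        }
        // Defensive clamp to [0, 1], in case separately gathered or sampled counts disagree.
        return Math.min(1.0, Math.max(0.0, (double) numNulls / numRows));
    }
}
```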


@ScrapCodes ScrapCodes marked this pull request as draft September 23, 2025 07:57
```java
public TableStatistics getTableStatistics(ConnectorSession session, JdbcTableHandle handle, List<JdbcColumnHandle> columnHandles, TupleDomain<ColumnHandle> tupleDomain)
{
    try {
        Preconditions.checkNotNullOrEmpty(handle.getSchemaName(), "schema name");
```

I think a requireNonNull should suffice for this
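One nuance when weighing this suggestion: `java.util.Objects.requireNonNull` rejects only `null`, so an empty schema or table name would pass it. A minimal sketch contrasting the two checks (both helper names are hypothetical; the `Preconditions.checkNotNullOrEmpty` call from the excerpt is not reproduced here):

```java
import java.util.Objects;

class NameChecks
{
    // requireNonNull only guards against null; "" is accepted.
    static String requireName(String name, String label)
    {
        return Objects.requireNonNull(name, label + " is null");
    }

    // Hypothetical stricter check that also rejects empty strings.
    static String checkNotEmpty(String name, String label)
    {
        Objects.requireNonNull(name, label + " is null");
        if (name.isEmpty()) {
            throw new IllegalArgumentException(label + " is empty");
        }
        return name;
    }
}
```

If a JDBC table handle can never carry an empty name, the simpler `requireNonNull` is enough.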

```java
"FROM ALL_TAB_COLUMNS\n" +
"WHERE OWNER = '%s'\n" +
" AND TABLE_NAME = '%s' AND COLUMN_NAME = '%s') ");
StringBuffer sqlQueryBuffer = new StringBuffer();
```

nit: Why not StringBuilder?
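For context on the nit: `StringBuffer` synchronizes every method call, which buys nothing for a local variable, while `StringBuilder` offers the same API without locking. A minimal sketch of assembling the column-stats query with it; the helper name and the `COLUMN_NAME IN (?, ...)` shape are assumptions for illustration, with names left as bind parameters rather than formatted in:

```java
import java.util.List;

class InClauseSql
{
    // Hypothetical helper: appends one placeholder per requested column so the
    // names can be bound later with PreparedStatement.setString.
    static String columnStatsSql(List<String> columnNames)
    {
        StringBuilder sql = new StringBuilder(
                "SELECT COLUMN_NAME, NUM_DISTINCT FROM ALL_TAB_COLUMNS WHERE OWNER = ? AND TABLE_NAME = ?");
        if (!columnNames.isEmpty()) {
            sql.append(" AND COLUMN_NAME IN (");
            for (int i = 0; i < columnNames.size(); i++) {
                sql.append(i == 0 ? "?" : ", ?");
            }
            sql.append(')');
        }
        return sql.toString();
    }
}
```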

@agrawalreetika
Member

agrawalreetika commented Sep 23, 2025

@ScrapCodes fyi, there is an effort to re-enable Oracle tests here - #25762

@ScrapCodes ScrapCodes changed the title [WIP] Implemented getTableStatistics for oracle connector. Implemented getTableStatistics for oracle connector. Sep 25, 2025
@ScrapCodes ScrapCodes force-pushed the oracle_get_stats branch 2 times, most recently from 314094b to dca512e October 14, 2025 08:22
@ScrapCodes ScrapCodes marked this pull request as ready for review October 14, 2025 08:24
@prestodb-ci prestodb-ci requested a review from a team October 14, 2025 08:24
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • Consider using try-with-resources for PreparedStatements and ResultSets to ensure they are always closed and avoid resource leaks.
  • Build SQL queries using parameterized statements or proper identifier quoting instead of String.format to prevent SQL injection and handle edge cases.
  • Use '{}' placeholders in Logger calls instead of '%s' for consistent formatting with the logging framework.

## Individual Comments

### Comment 1
<location> `presto-oracle/src/main/java/com/facebook/presto/plugin/oracle/OracleClient.java:150` </location>
<code_context>
     }

+    @Override
+    public TableStatistics getTableStatistics(ConnectorSession session, JdbcTableHandle handle, List<JdbcColumnHandle> columnHandles, TupleDomain<ColumnHandle> tupleDomain)
+    {
+        try {
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Consider closing PreparedStatements explicitly to avoid resource leaks.

PreparedStatements (preparedStatement and preparedStatementCol) should also be closed explicitly to ensure all JDBC resources are released, especially under high query loads.

Suggested implementation:

```java
            try (Connection connection = connectionFactory.openConnection(JdbcIdentity.from(session));
                 PreparedStatement preparedStatement = getPreparedStatement(session, connection, sql)) {

```

If there is another PreparedStatement (e.g., `preparedStatementCol`) created in this method, you should also include it in the try-with-resources statement:

```java
try (Connection connection = ...;
     PreparedStatement preparedStatement = ...;
     PreparedStatement preparedStatementCol = ...) {
    // use preparedStatement and preparedStatementCol
}
```

If the method creates PreparedStatements in different scopes, ensure each is closed in a try-with-resources or explicitly in a finally block.
</issue_to_address>
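To see why declaring every resource in one try-with-resources header is enough, here is a small runnable sketch that uses hypothetical `AutoCloseable` stand-ins instead of real JDBC objects. It shows that resources are closed in reverse order of declaration, and that they are closed even when the body throws, before any `catch` clause runs.

```java
import java.util.ArrayList;
import java.util.List;

class CloseOrderDemo
{
    // Hypothetical stand-in for Connection / PreparedStatement that records when it is closed.
    record Tracked(String name, List<String> log) implements AutoCloseable
    {
        @Override
        public void close()
        {
            log.add("closed " + name);
        }
    }

    static List<String> run(boolean failInBody)
    {
        List<String> log = new ArrayList<>();
        try (Tracked connection = new Tracked("connection", log);
                Tracked tableStats = new Tracked("tableStatsStatement", log);
                Tracked columnStats = new Tracked("columnStatsStatement", log)) {
            log.add("body");
            if (failInBody) {
                throw new IllegalStateException("query failed");
            }
        }
        catch (IllegalStateException e) {
            // By the time this runs, all three resources have already been closed.
            log.add("caught " + e.getMessage());
        }
        return log;
    }
}
```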

### Comment 2
<location> `presto-oracle/src/main/java/com/facebook/presto/plugin/oracle/OracleClient.java:174` </location>
<code_context>
+                PreparedStatement preparedStatementCol = getPreparedStatement(session, connection, getColumnStaticsSql(handle));
+                resultSet = preparedStatementCol.executeQuery();
+                Map<ColumnHandle, ColumnStatistics> columnStatisticsMap = new HashMap<>();
+                Map<String, JdbcColumnHandle> columnHandleMap = Maps.uniqueIndex(columnHandles, JdbcColumnHandle::getColumnName);
+                while (resultSet.next() && numRows > 0) {
+                    String columnName = resultSet.getString("COLUMN_NAME");
</code_context>

<issue_to_address>
**issue:** Handle missing column names gracefully when populating columnStatisticsMap.

If columnHandleMap.get(columnName) returns null, this null key will be added to columnStatisticsMap, which may cause errors later. Please add a null check and handle such cases appropriately.
</issue_to_address>
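One way to follow this suggestion, sketched with plain strings standing in for `JdbcColumnHandle` (all names below are hypothetical): look up the handle first and skip rows whose column was not requested, so a `null` key never reaches the statistics map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ColumnMatching
{
    // Hypothetical stand-in for a row of column statistics read from Oracle.
    record StatRow(String columnName, long numDistinct) {}

    // handlesByName plays the role of columnHandleMap (column name -> handle).
    static Map<String, Long> match(Map<String, String> handlesByName, List<StatRow> rows)
    {
        Map<String, Long> statsByHandle = new HashMap<>();
        for (StatRow row : rows) {
            String handle = handlesByName.get(row.columnName());
            if (handle == null) {
                // Column was not requested (or, for example, its name differs in case): skip it.
                continue;
            }
            statsByHandle.put(handle, row.numDistinct());
        }
        return statsByHandle;
    }
}
```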

### Comment 3
<location> `presto-oracle/src/main/java/com/facebook/presto/plugin/oracle/OracleClient.java:239` </location>
<code_context>
+            // varchar.
+            LOG.debug(e, "error while decoding : %s", number);
+        }
+        return NaN;
+    }
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Use Double.NaN for clarity and consistency.

'NaN' is undefined here; replace it with 'Double.NaN' to prevent compilation errors.
</issue_to_address>
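For reference, a bare `NaN` compiles only when it is statically imported (`import static java.lang.Double.NaN;`), a common Java idiom; otherwise it must be written `Double.NaN`. Either way, callers should test for it with `Double.isNaN`, because `NaN == NaN` is false. A tiny sketch with hypothetical method names:

```java
import static java.lang.Double.NaN;

class NaNCheck
{
    // With the static import above, the bare name NaN resolves to Double.NaN.
    static double unknown()
    {
        return NaN;
    }

    // NaN is never equal to itself, so == cannot detect it; Double.isNaN can.
    static boolean isUnknown(double value)
    {
        return Double.isNaN(value);
    }
}
```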


@ScrapCodes ScrapCodes changed the title Implemented getTableStatistics for oracle connector. feat: Implemented getTableStatistics for oracle connector. Oct 30, 2025
@ScrapCodes ScrapCodes changed the title feat: Implemented getTableStatistics for oracle connector. Feat: Implemented getTableStatistics for oracle connector Oct 30, 2025
@ScrapCodes ScrapCodes changed the title Feat: Implemented getTableStatistics for oracle connector feat: Implemented getTableStatistics for oracle connector Oct 30, 2025
@tdcmeehan tdcmeehan changed the title feat: Implemented getTableStatistics for oracle connector feat(plugin-oracle): Implemented getTableStatistics for oracle connector Oct 30, 2025
@tdcmeehan
Contributor

@ScrapCodes please fix the release note.

@aaneja aaneja merged commit 4a60975 into prestodb:master Nov 3, 2025
120 of 128 checks passed
@ScrapCodes ScrapCodes deleted the oracle_get_stats branch November 3, 2025 05:40

Labels

from:IBM PR from IBM

5 participants