-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve speed when listing columns in BigQuery #21920
base: master
Are you sure you want to change the base?
Conversation
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/DecimalTypeInfo.java
Outdated
Show resolved
Hide resolved
protected TypeInfo() {} | ||
|
||
@Override | ||
public abstract String toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these methods are not needed, remove toString, equals and hashCode. toString is only used in test but that's redundant - we only need a single roundtrip: String -> TypeInfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toString
is used in a few exceptions. I'll remove others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where's the exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the IllegalArgumentException
will be suppressed in convertToTrinoType
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I can either remove it, or we could add some logging, to make it easier to debug why some columns are not available in Trino. LMK what's better.
d3fba84
to
f43ccd6
Compare
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/type/TestTypeInfoUtils.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Show resolved
Hide resolved
f43ccd6
to
6b34692
Compare
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/TestBigQueryType.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
a2bbea8
to
3654c2e
Compare
@ebyhr PTAL |
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/TestBigQueryType.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
Outdated
Show resolved
Hide resolved
FROM %s.INFORMATION_SCHEMA.COLUMNS | ||
GROUP BY table_catalog, table_schema, table_name | ||
""".formatted(quote(remoteSchemaName))); | ||
String schemaName = client.toSchemaName(DatasetId.of(projectId, remoteSchemaName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to build DatasetId and call getDataset method (when is it different from remoteSchemaName)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it for consistency. This connector has a local to remote names mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #19860
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand it's for extension. Such change should go to the internal repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we'd have to revert #19860. It's out of the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to revert the PR. Just removing this line is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, this is required for consistency. We either allow this local-to-remote name mapping, or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at listRelationCommentMetadata()
method. Also, can you write a test that doesn't pass without this line if you still want to keep it? If not, please remove it.
this local-to-remote name mapping
Please explain when schemaName
gets a different value from remoteSchemaName
in this repository:
String schemaName = client.toSchemaName(DatasetId.of(projectId, remoteSchemaName));
...
protected String toSchemaName(DatasetId datasetId)
{
return datasetId.getDataset();
}
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/PrimitiveTypeInfo.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
cc6a799
to
bf0cf78
Compare
/test-with-secrets sha=bf0cf785188ed0cca97f6b18f4f750df035ebe75 |
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/9218809344 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost good to me.
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryClient.java
Outdated
Show resolved
Hide resolved
FROM %s.INFORMATION_SCHEMA.COLUMNS | ||
GROUP BY table_catalog, table_schema, table_name | ||
""".formatted(quote(remoteSchemaName))); | ||
String schemaName = client.toSchemaName(DatasetId.of(projectId, remoteSchemaName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand it's for extension. Such change should go to the internal repository.
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
protected TypeInfo() {} | ||
|
||
@Override | ||
public abstract String toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the IllegalArgumentException
will be suppressed in convertToTrinoType
, right?
* Unsupported type exception occurs only for data types | ||
* known to be not supported, like STRUCT, not all unknown tokens. | ||
*/ | ||
public class UnsupportedTypeException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name looks little confusing because STRUCT type is supported except for parsing. How about renaming to ParsingException or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not supported in the type
package. ParsingException
seems to be too broad, or I'd have to refactor the exception to use it instead of IllegalArgumentException
and make the typeName
parameter optional. I don't think it's worth doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not supported in the type package
I know. However, the class is used from outside of the package (BigQueryTypeManager) either. The package is unclear when reading the code in the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can I make it more clear, using a fully qualified name, or with a comment?
if (!e.getTypeName().equals(STRUCT) || table.isEmpty() || table.get().getDefinition().getSchema() == null) { | ||
// ignore unsupported types | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to continue iteration even when table is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need the table to map STRUCT types. We should continue iteration to convert other types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose table.isEmpty()
means the table doesn't exist. No need to return columns in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't mean it doesn't exit, just it wasn't provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the tableSupplier
is called with getTable
method. It returns an empty when the table doesn't exist:
trino/plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryClient.java
Lines 208 to 217 in 569b045
public Optional<TableInfo> getTable(TableId remoteTableId) | |
{ | |
try { | |
return Optional.ofNullable(bigQuery.getTable(remoteTableId)); | |
} | |
catch (BigQueryException e) { | |
// getTable method throws an exception in some situations, e.g. wild card tables | |
return Optional.empty(); | |
} | |
} |
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/type/TestTypeInfoUtils.java
Outdated
Show resolved
Hide resolved
bf0cf78
to
8c16327
Compare
return RowType.from(fields); | ||
} | ||
|
||
public List<ColumnMetadata> convertToTrinoType(List<String> names, List<String> types) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is used only from tests. Remove.
// ignore unsupported types | ||
continue; | ||
} | ||
typeSignature = createRowType(table.get().getDefinition().getSchema().getFields().get(name)).getTypeSignature(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we try to get typeSingature
in case of unsupported type?
If such STRUCT
type is valid maybe parseTypeString
should be adapted to accept it?
public final class BigDecimalTypeInfo | ||
extends PrimitiveTypeInfo | ||
{ | ||
private static final int MAX_PRECISION_MINUS_SCALE = 38; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that max precision is 76?
Why not use just MAX_PRECISION
constant?
public final class DecimalTypeInfo | ||
extends PrimitiveTypeInfo | ||
{ | ||
private static final int MAX_PRECISION_MINUS_SCALE = 29; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not MAX_PRECISION constant?
|
||
private static class TypeInfoParser | ||
{ | ||
public record Token(int position, String text, boolean type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public record Token(int position, String text, boolean type) | |
public record Token(int position, String text, boolean charType) |
int end = 1; | ||
while (end <= typeInfoString.length()) { | ||
if (end == typeInfoString.length() || | ||
!isTypeChar(typeInfoString.charAt(end - 1)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we check every char what is the condition that end - 1
will not be cached in previous loop iteration?
Description
A continuation of #21830. Test coverage of the type parser is at 99% of lines and 95% of branches. Original description below.
Using mini parser because BigQuery Java SDK doesn't support translating string to BigQuery type as far as I asked Google engineers.
This PR improves listing columns. Test with 169 tables is improved from 22s to 1.5s.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: