-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[FLINK-37862][table] Support inline structured types in SQL #26638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall excellent PR. I have some questions and small comments
Or via explicit declaration: | ||
```java | ||
DataTypes.STRUCTURED(Class, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...); | ||
DataTypes.STRUCTURED(String, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be?
DataTypes.STRUCTURED(String, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...); | |
DataTypes.STRUCTURED(String.class, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...); |
And what is the difference between this line and the one before it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to highlight that there are two methods taking either Class or String. I reworked this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah now I get it! I just got confused with the String
class in Java, and not an actual "String"
/** | ||
* Creates a STRUCTURED type such as {@code STRUCTURED('org.my.Class', name STRING, age INT)}. | ||
*/ | ||
RelDataType createStructuredType( | ||
String className, List<RelDataType> typeList, List<String> fieldNameList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why is the comment/description field not present here? I think this because we don't really need them, and they are just for clarity, written by the user?
/** | |
* Creates a STRUCTURED type such as {@code STRUCTURED('org.my.Class', name STRING, age INT)}. | |
*/ | |
RelDataType createStructuredType( | |
String className, List<RelDataType> typeList, List<String> fieldNameList); | |
/** | |
* Creates a STRUCTURED type such as {@code STRUCTURED('org.my.Class', name STRING 'full name of the person', age INT)}. | |
*/ | |
RelDataType createStructuredType( | |
String className, List<RelDataType> typeList, List<String> fieldNameList, List<String> fieldCommentList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very thorough review! Yes, the comments don't end up in the planner. We do the same for ROW. This is historical legacy that bites us until today. These comments make comparisons and other operations unnecessary complicated. For now, we leave them out. However, a catalog can use them already for DESCRIBE t
. From a type definition point of view the type is defined. Everything else we need to solve in a future PR.
* @see StructuredType | ||
*/ | ||
public static <T> DataType STRUCTURED(String className, Field... fields) { | ||
return buildStructuredType(StructuredType.newBuilder(className), fields); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it would be a good idea to fail fast here with a simple validation like className.isBlank() || className.isEmpty()
?
Or is it a valid statement to have
STRUCTURED<``, f0 t0>
STRUCTURED<` `, f0 t0>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The builder is currently only used internally, via DataTypes.STRUCTURED
and other locations. When instantiating the StructuredType we run checkClassName
which will do the full validation. Additional checks just spam the code.
...lanner/src/test/java/org/apache/flink/table/planner/functions/StructuredFunctionsITCase.java
Outdated
Show resolved
Hide resolved
public static final String TYPE = | ||
"STRUCTURED<'" + Type1.class.getName() + "', a INT, b STRING>"; | ||
|
||
public Integer a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Changing the type to int
on both classes will cause the tests to fail. Why is that?
public Integer a; | |
public int a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes because there is a mismatch in extracted and explicit type. The function will return the extracted one. But in general it is up to the user to ensure that data type matches class. Usually, users should not declare the structured types manually but use reflective extraction by just using the POJO in a function signature.
...lanner/src/test/java/org/apache/flink/table/planner/functions/StructuredFunctionsITCase.java
Show resolved
Hide resolved
flink-table/flink-table-common/src/test/java/org/apache/flink/table/types/LogicalTypesTest.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
What is the purpose of the change
This adds support for inline structured types in Flink SQL. Inline structured types are not a completely new type. They have been added as part of FLIP-37 when reworking the Flink table type system. However, mostly Table API users were able to use them properly. For SQL, there was no type DDL to cast or declare the type.
This PR improves handling inline structured types by supporting the type declaration:
where
c
is the class name,n
is the unique name of a field,t
is the logical type of a field,d
is the description of a field.Brief change log
STRUCTURED<>
in SQL parser and logical type parserVerifying this change
This change is already covered by existing tests such as
LogicalTypeParserTest
,LogicalTypeTest
and others.It adds casting and equality tests as well as a new
StructuredFunctionITCase
for testing functions dealing with structured types in the future.Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: yesDocumentation