Skip to content

[FLINK-37862][table] Support inline structured types in SQL #26638

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 10, 2025

Conversation

twalthr
Copy link
Contributor

@twalthr twalthr commented Jun 5, 2025

What is the purpose of the change

This adds support for inline structured types in Flink SQL. Inline structured types are not a completely new type. They have been added as part of FLIP-37 when reworking the Flink table type system. However, mostly Table API users were able to use them properly. For SQL, there was no type DDL to cast or declare the type.

This PR improves handling inline structured types by supporting the type declaration:

STRUCTURED<'c', n0 t0 'd0', n1 t1 'd1', ...>

where c is the class name, n is the unique name of a field, t is the logical type of a field, d is the description of a field.

Brief change log

  • Support STRUCTURED<> in SQL parser and logical type parser
  • Decouple structured types from the hard requirement of a class being present in the classpath
  • Support equality between types for testing structured types with different class name
  • Improve various JavaDocs and docs to highlight the benefits and shortcomings of structured types

Verifying this change

This change is already covered by existing tests such as LogicalTypeParserTest, LogicalTypeTest and others.

It adds casting and equality tests as well as a new StructuredFunctionITCase for testing functions dealing with structured types in the future.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: yes
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

@flinkbot
Copy link
Collaborator

flinkbot commented Jun 5, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@twalthr twalthr marked this pull request as ready for review June 6, 2025 11:04
Copy link

@raminqaf raminqaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall excellent PR. I have some questions and small comments

Or via explicit declaration:
```java
DataTypes.STRUCTURED(Class, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...);
DataTypes.STRUCTURED(String, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be?

Suggested change
DataTypes.STRUCTURED(String, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...);
DataTypes.STRUCTURED(String.class, DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...);

And what is the difference between this line and the one before it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to highlight that there are two methods taking either Class or String. I reworked this part.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah now I get it! I just got confused with the String class in Java, and not an actual "String"

Comment on lines +39 to +43
/**
* Creates a STRUCTURED type such as {@code STRUCTURED('org.my.Class', name STRING, age INT)}.
*/
RelDataType createStructuredType(
String className, List<RelDataType> typeList, List<String> fieldNameList);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why is the comment/description field not present here? I think this because we don't really need them, and they are just for clarity, written by the user?

Suggested change
/**
* Creates a STRUCTURED type such as {@code STRUCTURED('org.my.Class', name STRING, age INT)}.
*/
RelDataType createStructuredType(
String className, List<RelDataType> typeList, List<String> fieldNameList);
/**
* Creates a STRUCTURED type such as {@code STRUCTURED('org.my.Class', name STRING 'full name of the person', age INT)}.
*/
RelDataType createStructuredType(
String className, List<RelDataType> typeList, List<String> fieldNameList, List<String> fieldCommentList);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very thorough review! Yes, the comments don't end up in the planner. We do the same for ROW. This is historical legacy that bites us until today. These comments make comparisons and other operations unnecessary complicated. For now, we leave them out. However, a catalog can use them already for DESCRIBE t. From a type definition point of view the type is defined. Everything else we need to solve in a future PR.

* @see StructuredType
*/
public static <T> DataType STRUCTURED(String className, Field... fields) {
return buildStructuredType(StructuredType.newBuilder(className), fields);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be a good idea to fail fast here with a simple validation like className.isBlank() || className.isEmpty()?

Or is it a valid statement to have

STRUCTURED<``, f0 t0>
STRUCTURED<` `, f0 t0>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builder is currently only used internally, via DataTypes.STRUCTURED and other locations. When instantiating the StructuredType we run checkClassName which will do the full validation. Additional checks just spam the code.

public static final String TYPE =
"STRUCTURED<'" + Type1.class.getName() + "', a INT, b STRING>";

public Integer a;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Changing the type to int on both classes will cause the tests to fail. Why is that?

Suggested change
public Integer a;
public int a;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes because there is a mismatch in extracted and explicit type. The function will return the extracted one. But in general it is up to the user to ensure that data type matches class. Usually, users should not declare the structured types manually but use reflective extraction by just using the POJO in a function signature.

Copy link

@raminqaf raminqaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@twalthr twalthr merged commit 1586a55 into apache:master Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants