-
Notifications
You must be signed in to change notification settings - Fork 28.8k
[SPARK-53470][SQL] ExtractValue expressions should always do type checking #52216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @viirya |
case ArrayType(_: StructType, _) => TypeCheckResult.TypeCheckSuccess | ||
// This should never happen, unless we hit a bug. | ||
case other => | ||
TypeCheckResult.TypeCheckFailure("GetArrayStructFields.child must be array of struct type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: append with , but got ${child.dataType}
?
} | ||
// This should never happen, unless we hit a bug. | ||
case other => | ||
TypeCheckResult.TypeCheckFailure("GetMapValue.child must be map type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
this(left, right, GetArrayItem(left, right, failOnError = false)) | ||
override def inputTypes: Seq[AbstractDataType] = left.dataType match { | ||
case _: ArrayType => Seq(ArrayType, IntegerType) | ||
// Do not apply implicit cast if the first arguement is not array type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to keep the null type error behavior. #50590 improved the error message for it, but also made GetArrayItem
fail for null type input. This usually doesn't matter as GetArrayItem
should never get null type input except from Get
, but this PR restores it for consistency with other ExtractValue
expressions.
@@ -285,8 +293,7 @@ case class GetArrayItem( | |||
with ExtractValue | |||
with SupportQueryContext { | |||
|
|||
// We have done type checking for child in `ExtractValue`, so only need to check the `ordinal`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment also didn't look correct. Actually we did checking the datatype of the child in checkInputDataTypes
.
} | ||
|
||
// We have done type checking for child in `ExtractValue`, so only need to check the `key`. | ||
override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, keyType) | ||
override def inputTypes: Seq[AbstractDataType] = Seq(MapType, keyType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if it is possible that the child data type could be other than MapType, the keyType
cannot directly call asInstanceOf
to cast to MapType, because when we call inputTypes
, the child data type is not done checking type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is why I override def checkInputDataTypes
, so that we only access inputTypes
when the first child is map type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay.
thanks for the review, merging to master! |
What changes were proposed in this pull request?
We skip type checking for these
ExtractValue
expressions because we assume they are always created safely byExtractValue.apply
. However, plan transformations may change the attribute data type and break these expressions. This PR adds type checking for these expressions, so that the plan change vailidation can capture the problemaitc rule earlier, instead of triggering Java cast exception at a late point.Why are the changes needed?
better error reporting
Does this PR introduce any user-facing change?
no
How was this patch tested?
new test
Was this patch authored or co-authored using generative AI tooling?
no