[SPARK-47681][SQL][FOLLOWUP] Fix variant decimal handling. #46338
```diff
@@ -392,21 +392,32 @@ public static double getDouble(byte[] value, int pos) {
     return Double.longBitsToDouble(readLong(value, pos + 1, 8));
   }

+  // Check whether the precision and scale of the decimal are within the limit.
+  private static void checkDecimal(BigDecimal d, int maxPrecision) {
+    if (d.precision() > maxPrecision || d.scale() > maxPrecision) {
+      throw malformedVariant();
+    }
+  }
+
   // Get a decimal value from variant value `value[pos...]`.
   // Throw `MALFORMED_VARIANT` if the variant is malformed.
   public static BigDecimal getDecimal(byte[] value, int pos) {
     checkIndex(pos, value.length);
     int basicType = value[pos] & BASIC_TYPE_MASK;
     int typeInfo = (value[pos] >> BASIC_TYPE_BITS) & TYPE_INFO_MASK;
     if (basicType != PRIMITIVE) throw unexpectedType(Type.DECIMAL);
```
```diff
-    int scale = value[pos + 1];
+    // Interpret the scale byte as unsigned. If it is a negative byte, the unsigned value must be
+    // greater than `MAX_DECIMAL16_PRECISION` and will trigger an error in `checkDecimal`.
+    int scale = value[pos + 1] & 0xFF;
```

Review comment: We should distinguish between the variant decimal spec and Spark decimal limitations. Some databases (and Java libraries) support negative decimal scale, which means the value range of a decimal can exceed what its precision alone would suggest.

Reply: We don't allow negative scale in the variant spec either: spark/common/variant/README.md, line 348 in bf2e254.
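To see what the one-byte scale fix above changes in practice, here is a minimal standalone sketch (not part of the patch; the class name and the constant value 38 for `MAX_DECIMAL16_PRECISION` are assumptions for illustration). It shows how sign extension of the scale byte used to produce a silently-accepted negative scale, while the unsigned interpretation yields a scale large enough for `checkDecimal` to reject:

```java
import java.math.BigDecimal;

public class UnsignedScaleDemo {
  // Assumed to mirror the variant limit for 16-byte decimals (38 digits).
  private static final int MAX_DECIMAL16_PRECISION = 38;

  public static void main(String[] args) {
    byte rawScale = (byte) 0xFF; // scale byte of a malformed variant

    // Before the fix: the byte is sign-extended to -1. BigDecimal accepts
    // negative scales, so the value 123 silently becomes 1.23E+3 (i.e. 1230).
    System.out.println(BigDecimal.valueOf(123, rawScale)); // 1.23E+3

    // After the fix: the byte is read as unsigned, giving scale 255. This is
    // the comparison checkDecimal performs; 255 > 38 means MALFORMED_VARIANT.
    int scale = rawScale & 0xFF;
    BigDecimal result = BigDecimal.valueOf(123, scale);
    System.out.println(result.scale() > MAX_DECIMAL16_PRECISION); // true
  }
}
```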
```diff
     BigDecimal result;
     switch (typeInfo) {
       case DECIMAL4:
         result = BigDecimal.valueOf(readLong(value, pos + 2, 4), scale);
+        checkDecimal(result, MAX_DECIMAL4_PRECISION);
         break;
```

Review comment: Does the pyspark side also need to be updated?

Reply: I think so.

Reply: The Python side is also fixed. While fixing it, I found that the Python error reporting was not implemented correctly and fixed that as well.
```diff
       case DECIMAL8:
         result = BigDecimal.valueOf(readLong(value, pos + 2, 8), scale);
+        checkDecimal(result, MAX_DECIMAL8_PRECISION);
         break;
       case DECIMAL16:
         checkIndex(pos + 17, value.length);
@@ -417,6 +428,7 @@ public static BigDecimal getDecimal(byte[] value, int pos) {
           bytes[i] = value[pos + 17 - i];
         }
         result = new BigDecimal(new BigInteger(bytes), scale);
+        checkDecimal(result, MAX_DECIMAL16_PRECISION);
         break;
       default:
         throw unexpectedType(Type.DECIMAL);
```
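For context on the `DECIMAL16` branch above: `new BigInteger(byte[])` interprets its argument as big-endian two's complement, while the variant encoding stores the 16-byte unscaled value little-endian, hence the byte reversal `bytes[i] = value[pos + 17 - i]`. A standalone sketch of that conversion (class name and sample buffer are illustrative, not from the patch):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class Decimal16Demo {
  public static void main(String[] args) {
    // A 16-byte little-endian two's-complement unscaled value, as a
    // DECIMAL16 body would store it (hypothetical buffer; value = 1234).
    byte[] littleEndian = new byte[16];
    littleEndian[0] = (byte) 0xD2; // 1234 = 0x04D2
    littleEndian[1] = (byte) 0x04;

    // BigInteger(byte[]) wants big-endian two's complement, so reverse.
    byte[] bytes = new byte[16];
    for (int i = 0; i < 16; i++) {
      bytes[i] = littleEndian[15 - i];
    }
    BigDecimal result = new BigDecimal(new BigInteger(bytes), 2);
    System.out.println(result); // 12.34
  }
}
```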
Review comment: Shall we also check `precision > scale`?

Reply: This is where we do need to distinguish between the variant decimal spec and Spark decimal limitations. `precision >= scale` is a requirement in Spark, not in the variant spec. In the new test case I add, `0.01` has a precision of 1 and a scale of 2. It is a valid decimal in the variant spec but requires some special handling in `schema_of_variant`.
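The `0.01` case mentioned above is easy to reproduce with plain `java.math.BigDecimal` (a standalone illustration, not code from the patch):

```java
import java.math.BigDecimal;

public class PrecisionVsScaleDemo {
  public static void main(String[] args) {
    // 0.01 has one significant digit but two digits after the decimal point.
    BigDecimal d = BigDecimal.valueOf(1, 2); // unscaled value 1, scale 2
    System.out.println(d);             // 0.01
    System.out.println(d.precision()); // 1
    System.out.println(d.scale());     // 2
    // Valid per the variant spec, but Spark's DecimalType requires
    // precision >= scale, so schema_of_variant needs special handling here.
  }
}
```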