Fix: Support for e notation using existing parse_decimal in string to decimal conversion #6905
base: main
error message changed.
@andygrove @viirya @tustvold please take a first look. The one failing test will be fixed once I add the rounding logic in parse-e-notation function.
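To make the e-notation discussion concrete, here is a minimal sketch of turning a string such as `1.23e2` into an unscaled integer at a target scale. The function name `parse_e_notation_sketch` and its signature are hypothetical and are not taken from this PR or from arrow-rs; the sketch truncates excess digits, which is exactly the point still under discussion here.

```rust
/// Hypothetical helper (NOT the arrow-rs implementation): parse a decimal
/// string with an optional exponent into an unscaled i128 at `scale`.
/// Returns None on malformed input or overflow.
fn parse_e_notation_sketch(s: &str, scale: u32) -> Option<i128> {
    // Split the mantissa from the exponent at the first 'e' or 'E'.
    let (mantissa, exp) = match s.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&s[..i], s[i + 1..].parse::<i32>().ok()?),
        None => (s, 0),
    };
    // Peel off an optional sign.
    let (negative, digits) = match mantissa.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, mantissa.strip_prefix('+').unwrap_or(mantissa)),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    // Accumulate all digits as one integer, ignoring the decimal point.
    let mut value: i128 = 0;
    for b in int_part.bytes().chain(frac_part.bytes()) {
        if !b.is_ascii_digit() {
            return None;
        }
        value = value.checked_mul(10)?.checked_add(i128::from(b - b'0'))?;
    }
    // Number of fractional digits once the exponent has been applied.
    let effective_scale = frac_part.len() as i64 - i64::from(exp);
    let shift = i64::from(scale) - effective_scale;
    let value = if shift >= 0 {
        // Target scale is wider: pad with zeros.
        value.checked_mul(10_i128.checked_pow(u32::try_from(shift).ok()?)?)?
    } else {
        // Target scale is narrower: drop digits (truncation, no rounding).
        match 10_i128.checked_pow(u32::try_from(-shift).ok()?) {
            Some(div) => value / div,
            None => 0,
        }
    };
    Some(if negative { -value } else { value })
}
```

Under these assumptions, `parse_e_notation_sketch("1.23e2", 2)` yields the unscaled value for 123.00, and a negative exponent simply widens the effective scale before the final adjustment.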
.and_then(|v| T::validate_decimal_precision(v, precision).map(|_| v))
parse_decimal::<T>(v, precision, scale).map_err(|_| {
ArrowError::CastError(format!(
"Cannot cast string '{}' to decimal type of precision {} and scale {}",
T::DATA_TYPE shows the default Decimal(38,10) or Decimal256(76,..) in the error message, hiding the precision and scale actually provided for the cast.
arrow-cast/src/cast/decimal.rs
Outdated
@@ -230,6 +231,7 @@ where
)?))
}

#[allow(dead_code)]
This fails clippy, hence the added #[allow(dead_code)]. The function is no longer used; if required, we can remove it and cover the existing tests with parse_decimal.
We should remove this and port the tests, to ensure we aren't losing test coverage / accidentally changing behaviour
Done
I think this might be a breaking API change, as it changes the rounding behaviour of parse_decimal?
Clippy did not complain and tests are passing, except one which I'm working on: rounding for e-notation. Would any other build task catch it?
This PR seems to remove a number of tests, and orphan some others. If we're changing what cast does, can we please remove the old implementation and port the old tests, so that we aren't losing test coverage.
Also as written this PR is a breaking change, as it alters the rounding behaviour of the parser.
@@ -1284,7 +1284,7 @@ mod tests {
 assert_eq!("53.002666", lat.value_as_string(1));
 assert_eq!("52.412811", lat.value_as_string(2));
 assert_eq!("51.481583", lat.value_as_string(3));
-assert_eq!("12.123456", lat.value_as_string(4));
+assert_eq!("12.123457", lat.value_as_string(4));
Here we can see this is a breaking change to the rounding behaviour
Also to note, previous behavior was not correct.
12.12345678 cast to `Decimal128(38, 6)` = 12.123457
It truncated rather than rounded. Both are valid behaviours, so changing this is a breaking change.
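For readers following the arithmetic, the two behaviours can be compared on the value from this thread. This is a sketch on plain `i128` unscaled values, assuming "rounding" means round half away from zero; the helper names are illustrative and do not come from arrow-rs.

```rust
/// Reduce an unscaled decimal from `from_scale` to `to_scale` fractional
/// digits by dropping the extra digits (truncation toward zero).
/// Illustrative helper, not an arrow-rs function.
fn rescale_truncate(unscaled: i128, from_scale: u32, to_scale: u32) -> i128 {
    assert!(from_scale >= to_scale);
    let div = 10_i128.pow(from_scale - to_scale);
    unscaled / div
}

/// Same reduction, but rounding half away from zero (an assumption about
/// which rounding mode is meant in the discussion).
fn rescale_round(unscaled: i128, from_scale: u32, to_scale: u32) -> i128 {
    assert!(from_scale >= to_scale);
    let div = 10_i128.pow(from_scale - to_scale);
    let quotient = unscaled / div;
    let remainder = unscaled % div;
    // Bump the magnitude by one when the dropped part is at least one half.
    if remainder.abs() * 2 >= div {
        quotient + unscaled.signum()
    } else {
        quotient
    }
}
```

With 12.12345678 stored as 1212345678 at scale 8, reducing to scale 6 gives 12123456 (12.123456) under truncation and 12123457 (12.123457) under rounding, which matches the changed test assertion.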
There is an argument for accepting the breaking change to use rounding since it would be consistent with how we cast floating point to decimal. However, do we want to consider adding a parameter to choose between truncation and rounding?
I personally wouldn't characterize this as a breaking change, though I can see how others might.
In my opinion, adding a parameter to choose between the behaviors would be the safest thing (i.e. a field on CastOptions that defaults to the old, truncating behavior) for https://docs.rs/arrow/latest/arrow/compute/kernels/cast/fn.cast_with_options.html
Maybe @liukun4515 who added much of the initial decimal support in arrow-rs has time to offer historical perspective on rounding vs truncation during casting?
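One possible shape for such an opt-in parameter is sketched below. Everything here is hypothetical: `DecimalRounding`, `StringToDecimalOptions`, and `drop_digits` are invented for illustration and are not existing arrow-rs types or fields on `CastOptions`. The point is only that a `Default` of truncation would leave existing callers unaffected.

```rust
/// Hypothetical rounding choice; not an existing arrow-rs type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum DecimalRounding {
    /// Drop excess fractional digits (the behaviour described as "old" above).
    #[default]
    Truncate,
    /// Round half away from zero.
    HalfUp,
}

/// Hypothetical stand-in for a new field on the cast options.
#[derive(Debug, Clone, Copy, Default)]
struct StringToDecimalOptions {
    rounding: DecimalRounding,
}

/// Drop `digits` trailing fractional digits from an unscaled value using the
/// configured mode. Sketch only: `pow` panics if `digits` is too large.
fn drop_digits(unscaled: i128, digits: u32, opts: StringToDecimalOptions) -> i128 {
    let div = 10_i128.pow(digits);
    let quotient = unscaled / div;
    match opts.rounding {
        DecimalRounding::Truncate => quotient,
        DecimalRounding::HalfUp if (unscaled % div).abs() * 2 >= div => {
            quotient + unscaled.signum()
        }
        DecimalRounding::HalfUp => quotient,
    }
}
```

Callers that pass `StringToDecimalOptions::default()` keep truncation, while a caller that wants consistency with float-to-decimal casts would set `rounding: DecimalRounding::HalfUp` explicitly.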
Thanks @tustvold for the quick review. I've moved over most of the tests for
Which issue does this PR close?
Closes apache/datafusion#10315.
Rationale for this change
What changes are included in this PR?
Completed :
Are there any user-facing changes?