Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split thousands Returns several 0 #11

Open
hexofyore opened this issue Jun 4, 2023 · 8 comments
Open

Split thousands Returns several 0 #11

hexofyore opened this issue Jun 4, 2023 · 8 comments

Comments

@hexofyore
Copy link

The !num.is_zero() doesn't work properly for float. The division isn't accurate so the exponent reduces on each division and eventually reaches 0. I think there is no need to use float. The input string can be split by . and both be represented as u128 which gives precision of 38 digits which I think is equal to the Bigfloat.

@hexofyore
Copy link
Author

@Ballasi You have added numbers upto vigintillion. We can use num crate for arbitrary precision integer. If it can be done without breaking changes.

@hexofyore
Copy link
Author

Or we can simply use it as string. We aren't using it for calculation. The commas can be removed and string will be split by decimal point.

@Ballasi
Copy link
Owner

Ballasi commented Jun 5, 2023

The !num.is_zero() doesn't work properly for float.

Can you give me a code snippet as example? That sounds like a bug for the crate, might be interested to investigate.

which gives precision of 38 digits which I think is equal to the Bigfloat

BigFloat has relative precision, which means that it's always 40 digits, but relatively to the known spectrum. For instance, 10000000000000000000000000000000000000000 is a number that BigFloat can store, but u128 cannot.

However, this has limitations since it cannot store 10000000000000000000000000000000000000001. I was hoping that the bigfloat crate had the ability to input a custom decimal positions in mantissa, but it doesn't seem like so.

The input string can be split by .

I had though of that before, but sadly you cannot store numbers like 1.01 as it will be stored as 1 for the integer part and 01 for the decimal part, which will result in 1.1.

You can "inverse" the input of the decimal part (e.g., for the example above, 10 for float), but that makes it very much costy and not very understandable in code reviews either so I wanted to avoid that.

we can simply use it as string

Same here, it's not really the best idea to store number as strings, I don't reject that possibility but it really is not that great, and makes the crate quite expensive to use, which I tend to avoid.

I know that the author of num-bigfloat has made a new crate to go beyond all of these limitations, astro-float. It might be interesting to look at that option too.

What do you think?

@hexofyore
Copy link
Author

Can you give me a code snippet as example? That sounds like a bug for the crate, might be interested to investigate.


    #[test]
    fn test_split() {
        let en = English {
            prefer_nil: true,
            prefer_oh: false,
        };
        println!(
            "{:?}",
            en.split_thousands(num_bigfloat::BigFloat::from(1234.5678))
        )
    }
running 1 test
[234, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
test lang::en::tests::test_split ... ok

Here, the is_zero only works when you get exponents around 100+. So, it gives unnecessry zeroes.

I had though of that before, but sadly you cannot store numbers like 1.01 as it will be stored as 1 for the integer part and 01 for the decimal part, which will result in 1.1

why not? it will just be two strings with 1 and 01 string. We The only things we need to do is take it from last which is relatively easy compared to number type which we have to divide and do modulus operation.

Same here, it's not really the best idea to store number as strings, I don't reject that possibility but it really is not that great, and makes the crate quite expensive to use, which I tend to avoid.

I don't know if it can really be viewed as number when the only thing we take i individual digits. It's similar to Phone Numbers which has digits but no meaning to operation like addition and such. About expensive, will it really be so expensive?

But yes, if there is type with unlimited precision, then that would be good. But I still think string might be better. Processing it would be easier as it's easy to get individual characters from them without much operation.

@Ballasi
Copy link
Owner

Ballasi commented Jun 6, 2023

Here, the is_zero only works when you get exponents around 100+. So, it gives unnecessry zeroes.

Curious, thanks for the code snippet, I'll investigate that! Thanks!

why not? it will just be two strings with 1 and 01 string

I was talking about using u128 in that case, for sure a String would work fine in here!

I don't know if it can really be viewed as number when the only thing we take i individual digits. It's similar to Phone Numbers which has digits but no meaning to operation like addition and such.

True that, the only main "operation" that is used is / 10 or % 10. For sure we don't need a full number type for that.

About expensive, will it really be so expensive?

I am mostly thinking of memory expensive and parsing intensive. Like you pass as arg a i8 which will be turned into a String which will then be processed, create new string on the heap, etc. etc.

I still think string might be better. Processing it would be easier as it's easy to get individual characters from them without much operation.

As talked on Reddit, I am more keen on using something as Vec<u8>, it's as memory expensive, though it's true the length of number won't be that long either, but it makes it easier to use first, e.g., instead of

case number {
    "1" => "one",
    "2" => "two",
    ...
}

you can have UNITS[number]

And on top of that it ensures the data type passed is numbers and not junk.

What do you think of this @hexofyore?

@hexofyore
Copy link
Author

I was mainly thinking in terms of the library being used for when user inputs value which will always be string and later be parsed to whatever. But, it can likely be result of some calculation.

And, I didn't think about that mapping. This makes turning to string not so better.

Vec is the best of world but we still need to turn the number to individual digits and store them as vector. I think we can simply use arbitrary precision integers and maybe we can think of handling zeroes after decimal point somehow.

While any method is doable, I am not so sure which is better. method. I am just beginner and while I may get things done, I don't think I can do it right way.

You choose one and maybe I will try it. Anyway, I think I will first complete nepali addition. And, what do you think of adding to_numeral? Like, '12345' is '१२३४५' in devanagari.

@Ballasi
Copy link
Owner

Ballasi commented Jun 6, 2023

for when user inputs value which will always be string

This is true for the context of the binary, which is mainly just a demo of the library. The main "product" of this num2words git is the library itself which supports both numbers and strings. And experience shows that the main usage of the lib is via numbers rather than string.

I am not so sure which is better

I'll preferably go with the Vec<number> approach as this seems more sensible for me.

I am just beginner and while I may get things done, I don't think I can do it right way.

And that's totally okay! Your help is already greatly appreciated and I'll be here to help in case there is anything! That's firstly very kind of you to take your time to take part in this conversation.

I'll be happy to receive a PR with work for Vec<number>!

what do you think of adding to_numeral? Like, '12345' is '१२३४५' in devanagari

I wonder rather than using to_numeral if it would make more sense to have it be to_native or something similar. I am wondering if there are other languages doing that.

I am also wondering if that is supposed to be the other way around, I mean having १२३४५ as input and not as output.

For now focus on Vec<number> and nepali support, I'll take a look at that more closely!

@Laifsyn
Copy link

Laifsyn commented Apr 11, 2024

a quick fix about extra zeros is to truncate the decimals on each check.

How much does this hit the performance? Idk, because BigFloat is a Copy-Type so it would be best to benchmark first before anything

    fn en_miles(&self, mut num: BigFloat) -> Vec<u64> {
        // Doesn't check if BigFloat is Integer only
        let mut thousands = Vec::new();
        let mil = 1000.into();
        num = num.abs();
        while !num.int().is_zero() {
            // Insert in Low Endian
            thousands.push((num % mil).to_u64().expect("triplet not under 1000"));
            num /= mil; // DivAssign
        }
        thousands
    }

Laifsyn added a commit to Laifsyn/num2words_spanish that referenced this issue May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants