Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unicode Roman Numerals #1672

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add unicode Roman Numerals #1672

wants to merge 1 commit into from

Conversation

edent
Copy link

@edent edent commented Mar 4, 2023

I think I've done this correctly. Please let me know if it needs any more changes.
Fixes #1671

@rhdunn
Copy link
Member

rhdunn commented Mar 15, 2023

Ideally, this should update the TranslateRoman function (https://github.com/espeak-ng/espeak-ng/blob/master/src/libespeak-ng/numbers.c#L756) so that it can correctly map the Unicode characters to the correct number (allowing mixed characters, such as in your year examples), and support other languages than just English.

It would also be a good idea to add tests for the Roman numeral handling (e.g. a tests/language-numbers-roman.test script similar to the cardinal and ordinal number test files in that directory.

@DuaelFr
Copy link

DuaelFr commented Mar 20, 2023

Would it be enough to only extend these two lines in TranslateRoman? I'm bad at C++ so I'm not sure char can hold unicode characters.

	static const char roman_numbers[] = "ixcmvld";
	static const int roman_values[] = { 1, 10, 100, 1000, 5, 50, 500 };

@rhdunn
Copy link
Member

rhdunn commented Mar 27, 2023

Would it be enough to only extend these two lines in TranslateRoman? I'm bad at C++ so I'm not sure char can hold unicode characters.

	static const char roman_numbers[] = "ixcmvld";
	static const int roman_values[] = { 1, 10, 100, 1000, 5, 50, 500 };

The current logic in TranslateRoman assumes the roman numerals are in ASCII. The input (word) should be a lower-case string -- hence that array only having lower-case letters.

You would need to modify the loop on line 793 to use word += utf8_in(&c, word) instead of c = *word++ to allow it to read in the Unicode character.

You can then convert roman_numbers to an array of Unicode (UTF-32) codepoints corresponding to the Roman Numeral characters. It may make sense to colocate the unicode codepoint and the numeral decimal value in a single struct to make it easier to understand and maintain. You can also add a helper to look up the decimal value for a roman numeral that loops through the array, returns 0 if its reached the end (indicated via a { 0, 0 } sentinel value, or returns the corresponding decimal value for the matched numeral codepoint.

That should then work, as the function is called in translateword.c and return a 0 if it is not a roan numeral or 1 if it is.

I think I've done this correctly. Please let me know if it needs any more changes.
Fixes espeak-ng#1671
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unicode Roman Numerals read out incorrectly
3 participants