Fix handling of non-matching surrogates in collation data. #147

sven-oly · 2023-12-18T19:40:27Z

The current test generator doesn't create collation tests when either of the test strings contains an unpaired surrogate. The skipped cases are recorded in the log files, but they are not stored in any test data or surfaced in any dashboard.

sffc · 2024-06-03T22:40:28Z

@markusicu How important is it to test unpaired surrogate collation behavior?

markusicu · 2024-06-05T20:29:39Z

https://www.unicode.org/Public/UCA/latest/CollationTest.html

“These files contain test cases that include ill-formed strings, with surrogate code points. Implementations that do not weight surrogate code points the same way as reserved code points may filter out such lines in the test cases, before testing for conformance.”
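For illustration, filtering such lines could look roughly like the sketch below. It assumes each data line starts with space-separated hexadecimal code points, optionally followed by `;` and a `#` comment; the helper names are hypothetical and not taken from the conformance test generator.

```python
def parse_code_points(line):
    """Return the code points on a test line, or [] for comments/blank/unparseable lines."""
    # Assumed layout: "XXXX XXXX; # comment". Drop the comment, then anything after ';'.
    data = line.split("#", 1)[0].split(";", 1)[0].strip()
    if not data:
        return []
    try:
        return [int(token, 16) for token in data.split()]
    except ValueError:
        # Not a hex code point sequence (e.g. a header line); treat as non-data.
        return []


def has_surrogate(code_points):
    """True if any code point lies in the surrogate range U+D800..U+DFFF."""
    return any(0xD800 <= cp <= 0xDFFF for cp in code_points)


def filter_well_formed(lines):
    """Keep only data lines whose code points contain no surrogates."""
    kept = []
    for line in lines:
        code_points = parse_code_points(line)
        if code_points and not has_surrogate(code_points):
            kept.append(line)
    return kept
```

Lines dropped this way could still be counted and reported separately, so that the skipped cases are visible somewhere other than the logs.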

sffc · 2024-08-26T23:01:17Z

A key problem here is that unpaired surrogates cannot be represented in UTF-8 (they can be in WTF-8). I feel like I'm not super interested in testing this corner of the conformance data for collation and we should just limit our testing to things that are valid in UTF-8.
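A small Python sketch of the representation problem, for reference. A `str` can hold a lone surrogate, but the strict UTF-8 codec refuses to encode it; the `surrogatepass` error handler emits the generalized three-byte form instead, which is not valid UTF-8. The function name is illustrative only.

```python
def is_utf8_encodable(text):
    """True if text can be encoded as strict (well-formed) UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


lone = "a\ud800"        # 'a' followed by an unpaired high surrogate
paired = "\U0001F600"   # a supplementary code point; no surrogates in a Python str

# Strict UTF-8 rejects the lone surrogate but accepts the supplementary character.
lone_ok = is_utf8_encodable(lone)
paired_ok = is_utf8_encodable(paired)

# 'surrogatepass' writes the surrogate as a 3-byte sequence that strict decoders reject.
lone_bytes = "\ud800".encode("utf-8", "surrogatepass")
```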

markusicu · 2024-08-26T23:04:22Z

That's fine. Did you see my reply from jun05?

sffc · 2024-08-27T00:01:44Z

> That's fine. Did you see my reply from jun05?

Yes I did, and it seems like this is the current behavior.

But, the conformance data contains unpaired surrogates presumably because in environments that support them, they need to have a certain behavior, right? So it seems like unicode-org/conformance should pass them down to executors that represent implementations that handle them.

So, I propose keeping this issue open, but demoting the priority.
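One possible way to pass such strings down, sketched under the assumption that test data is exchanged as JSON: with ASCII-only output, Python's `json` module writes a lone surrogate as a `\uXXXX` escape, so the file itself stays valid UTF-8 and only executors that can represent unpaired surrogates need to decode those cases. The field names below are hypothetical.

```python
import json

# Hypothetical test case containing an unpaired high surrogate.
test_case = {"label": "0001", "s1": "a\ud800", "s2": "b"}

# ensure_ascii=True (the default) escapes every non-ASCII code point,
# including lone surrogates, so the serialized text is plain ASCII.
serialized = json.dumps(test_case, ensure_ascii=True)

# The serialized form can be written as UTF-8 without error.
payload = serialized.encode("utf-8")

# A consumer whose string type allows unpaired surrogates gets the original back.
round_tripped = json.loads(payload.decode("utf-8"))
```

Executors built on strict UTF-8 string types would still need to skip or reject these cases, so the test data would likely need a flag marking them.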

sffc added this to the Backlog ⟨P3⟩ milestone Aug 26, 2024
