Clarify None case in bstr::decode_utf8 #139

glts · 2022-11-09T20:19:22Z

Thank you for this useful library.

In bstr 1.0.1, the documentation for bstr::decode_utf8 states:

When unsuccessful, None is returned along with the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence. In this case, the number of bytes consumed is always between 0 and 3, inclusive, where 0 is only returned when slice is empty.

bstr::decode_utf8(b"\xFFabc") returns (None, 1). The byte \xFF cannot be decoded so the result is None; but the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence would be 0, as \xFF is not a valid UTF-8 prefix.

Can you confirm, or can you paraphrase the wording for me?

BurntSushi · 2022-11-09T21:31:27Z

Ah. 1 is indeed correct. The docs need to be updated. Returning 0 wouldn't make sense, because 0 is meant to be the terminal condition of a loop. Returning 0 in any other case leads to more complex loop logic that would be easy to get wrong, which would lead to an infinite loop in practice.
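
To illustrate the loop argument above, here is a minimal sketch using only the standard library. It assumes a hypothetical `decode_first` function that mimics the contract discussed in this thread (0 only for empty input, otherwise 1 to 3 bytes consumed on failure); it is a stand-in built on `std::str::from_utf8`, not bstr's actual implementation. The `lossy` helper is likewise hypothetical.

```rust
use std::str;

/// Hypothetical stand-in for the contract discussed above (NOT bstr's code).
/// Returns the first decoded char and its encoded length, or `None` and the
/// number of bytes the caller should skip. Returns 0 only for empty input.
fn decode_first(slice: &[u8]) -> (Option<char>, usize) {
    if slice.is_empty() {
        return (None, 0);
    }
    // A UTF-8 encoded scalar value is at most 4 bytes long.
    let head = &slice[..slice.len().min(4)];
    match str::from_utf8(head) {
        Ok(s) => {
            let ch = s.chars().next().unwrap();
            (Some(ch), ch.len_utf8())
        }
        // The error lies after the first character, so that character is valid.
        Err(e) if e.valid_up_to() > 0 => {
            let s = str::from_utf8(&head[..e.valid_up_to()]).unwrap();
            let ch = s.chars().next().unwrap();
            (Some(ch), ch.len_utf8())
        }
        // The very first sequence is invalid. `error_len()` is `None` when the
        // input ends mid-sequence; in that case skip everything that is left.
        Err(e) => (None, e.error_len().unwrap_or(head.len())),
    }
}

/// Decode loop in which a size of 0 is the only terminal condition.
fn lossy(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    loop {
        let (ch, size) = decode_first(bytes);
        if size == 0 {
            break; // input exhausted
        }
        // Substitute U+FFFD for each undecodable run of bytes.
        out.push(ch.unwrap_or('\u{FFFD}'));
        bytes = &bytes[size..];
    }
    out
}
```

With this stand-in, `decode_first(b"\xFFabc")` yields `(None, 1)`, so the loop steps past `\xFF` and continues with `abc`. If it yielded `(None, 0)` instead, the loop could not tell "invalid byte" from "end of input" and would need a separate check to avoid either stopping early or never advancing.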

BurntSushi added the "doc" label (Documentation should be improved.) on Nov 9, 2022
