Unable to encode high-byte order Unicode characters #41

Open
deed02392 opened this issue Aug 3, 2022 · 3 comments
Comments

@deed02392

There seems to be an issue with the Unicode character handling.

I am able to encode ↘, for example, which is U+2198.
Raw bytes encoded are:
40 7e fb bb fe 28 69 82 00 ec 11 ec 11 ec 11 ec

However, consider a character at a higher code point: 🔝, which is U+1F51D.

This gets encoded with the following raw bytes:
40 9e fb bb fe da 0b de db 49 d0 ec 11 ec 11 ec

This is not correct, because the resulting QR code fails to reproduce the Unicode character. I'm still unsure exactly what the failure is, but I suspect it is in QR8BitByte().
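For reference, the two dumps can be unpacked with a small sketch. It assumes both are byte-mode (Mode 4) segments in a version 1 to 9 symbol, where the character count is 8 bits; `parseByteSegment` and `toHex` are hypothetical helpers, not part of qrbtf.

```javascript
// Hypothetical helper: extract the data bytes of a byte-mode (Mode 4) segment
// from a hex dump. Assumes an 8-bit character count (QR versions 1 to 9).
function parseByteSegment(hex) {
  const nibbles = hex.replace(/\s+/g, "");
  const mode = parseInt(nibbles.slice(0, 1), 16);
  if (mode !== 4) throw new Error("not a byte-mode segment");
  const count = parseInt(nibbles.slice(1, 3), 16);
  const data = [];
  for (let i = 0; i < count; i++) {
    // Data bytes start at nibble 3, so they straddle the dump's byte boundaries.
    data.push(parseInt(nibbles.slice(3 + 2 * i, 5 + 2 * i), 16));
  }
  return data;
}

const toHex = (bytes) =>
  bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

const arrowDump = "40 7e fb bb fe 28 69 82 00 ec 11 ec 11 ec 11 ec";
const topDump = "40 9e fb bb fe da 0b de db 49 d0 ec 11 ec 11 ec";

console.log(toHex(parseByteSegment(arrowDump))); // ef bb bf e2 86 98 20
console.log(toHex(parseByteSegment(topDump)));   // ef bb bf ed a0 bd ed b4 9d
```

Both payloads start with `ef bb bf` (a UTF-8 byte order mark). The U+2198 payload then contains the expected `e2 86 98`. The U+1F51D payload contains `ed a0 bd ed b4 9d`, which is the three-byte encoding of U+D83D followed by the three-byte encoding of U+DD1D, i.e. the two UTF-16 surrogates encoded separately rather than the single four-byte sequence `f0 9f 94 9d`.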

@deed02392
Author

deed02392 commented Aug 14, 2022

This is a correct raw byte sequence for encoding "🔝":

71 a4 04 f0 9f 94 9d 00 ec 11 ec 11 ec 11 ec 11
ec 11 ec

It looks like qrbtf does not support Mode 7 (ECI), which that sequence uses.

Here's a valid byte sequence using only Mode 4 (byte mode):

40 4f 09 f9 49 d0 ec 11 ec 11 ec 11 ec 11 ec 11
ec 11 ec
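Both sequences can be reproduced with a short sketch of the bit packing. It assumes a version 1, level L symbol (19 data codewords, 8-bit character count) and ECI designator 26 for UTF-8; `packByteSegment` is a hypothetical helper, not a qrbtf function.

```javascript
// Hypothetical helper: pack one byte-mode segment into data codewords.
// Assumes an 8-bit character count (versions 1 to 9) and, when `eci` is set,
// a single-byte ECI designator (values below 128).
function packByteSegment(data, totalCodewords, eci) {
  let bits = "";
  if (eci !== undefined) {
    bits += "0111" + eci.toString(2).padStart(8, "0"); // Mode 7 + designator
  }
  bits += "0100" + data.length.toString(2).padStart(8, "0"); // Mode 4 + count
  for (const b of data) bits += b.toString(2).padStart(8, "0");

  const capacity = totalCodewords * 8;
  if (bits.length > capacity) throw new Error("data does not fit");
  bits += "0".repeat(Math.min(4, capacity - bits.length)); // terminator
  bits += "0".repeat((8 - (bits.length % 8)) % 8);         // byte alignment

  const out = [];
  for (let i = 0; i < bits.length; i += 8) {
    out.push(parseInt(bits.slice(i, i + 8), 2));
  }
  const pads = [0xec, 0x11]; // alternating pad codewords
  for (let i = 0; out.length < totalCodewords; i++) out.push(pads[i % 2]);
  return out;
}

const hex = (bytes) =>
  bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

const utf8Top = [0xf0, 0x9f, 0x94, 0x9d]; // UTF-8 bytes of U+1F51D
console.log(hex(packByteSegment(utf8Top, 19)));     // starts 40 4f 09 f9 49 d0 ec 11
console.log(hex(packByteSegment(utf8Top, 19, 26))); // starts 71 a4 04 f0 9f 94 9d 00
```

The only difference between the two outputs is the 12-bit ECI header, so a decoder that assumes UTF-8 for byte mode can read the plain Mode 4 form.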

@deed02392
Author

deed02392 commented Aug 14, 2022

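The issue seems to be that the encoding algorithm expects Unicode code points as input, but charCodeAt returns UTF-16 code units, so a character outside the Basic Multilingual Plane arrives as two surrogates that are each encoded separately. The encoding algorithm used in QR8BitByte() is only a partial implementation of https://github.com/akheron/jansson/blob/master/src/utf.c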
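A minimal sketch of the difference, and of an encoder that works on code points instead. `utf8Bytes` is a hypothetical name for illustration, not the fix that was committed; `TextEncoder` is assumed to be available as a global (modern browsers and Node.js).

```javascript
const s = "\u{1F51D}"; // the character from this issue

// charCodeAt returns UTF-16 code units: two surrogates for this character.
const units = [s.charCodeAt(0), s.charCodeAt(1)]; // 0xd83d, 0xdd1d
// codePointAt combines the surrogate pair into one code point.
const codePoint = s.codePointAt(0); // 0x1f51d

// Hypothetical encoder: iterate by code point, then emit UTF-8 bytes.
// Note: a lone surrogate would still be emitted as three bytes here,
// whereas TextEncoder replaces it with U+FFFD.
function utf8Bytes(str) {
  const out = [];
  for (const ch of str) { // for...of walks code points, not code units
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return out;
}

const manual = utf8Bytes(s);                          // [0xf0, 0x9f, 0x94, 0x9d]
const builtin = Array.from(new TextEncoder().encode(s)); // same four bytes
```

Using the built-in `TextEncoder` avoids maintaining the bit manipulation by hand, at the cost of requiring an environment that provides it.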

I suggest we use an external library to encode input strings more reliably.

deed02392 added a commit to deed02392/qrbtf that referenced this issue Aug 14, 2022
deed02392 added a commit to deed02392/qrbtf that referenced this issue Aug 14, 2022
input strings with unicode characters at high code points were not encoded correctly, such as "🔝"
deed02392 added a commit to deed02392/qrbtf that referenced this issue Aug 15, 2022
input strings with unicode characters at high code points were not encoded correctly, such as "🔝"
@pemontto

pemontto commented Jun 7, 2023

Just ran into this issue trying to add my WiFi
WIFI:S:πŸ™ πŸ’;T:WPA;P:<redacted password>;;
