MacOS has broken Unicode input #652

Open

trwbox opened this issue Nov 29, 2022 · 4 comments

Comments

@trwbox

trwbox commented Nov 29, 2022

Starting with MacOS 12 (Monterey), Apple has broken raw hex unicode input.

While I'm highly unsure whether this can be fixed, I wanted to open an issue in case someone else is facing the same problem.

Tested on an M1 MacBook Pro running MacOS 12.6.1.

In MacOS unicode mode the 🍺 emoji gives 1f37a, which is the correct unicode hex value, but the official hex input keyboard produces an incorrect character for that value.

This appears on the Apple discussion boards a few times with no clear fix:
https://discussions.apple.com/thread/253435504

@pathnirvana

Happening on an M1 Air for me too. Would be nice if this could be fixed.

@patricksurry
Contributor

patricksurry commented Dec 4, 2023

I explored this a bit. I'm on an older MacBook Air running Sonoma 14.1.1. It seems to want the RALT modifier attached to each key, not just pressed ahead of the sequence. After enabling the OS X Unicode Hex Input keyboard (see https://en.wikipedia.org/wiki/Unicode_input#In_MacOS) it was just sending the literal hex characters that I typed.

I hacked my version of kmk.handlers.sequences.py::generate_codepoint_keysym_seq() to wrap everything (including the dummy KC.N0s) like KC.RALT(KC.N0) etc. and finally got it working. I'm not sure of the best way to implement that in the current codebase, given the deep macro nesting, which doesn't currently take the keyboard encoding as input.
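Roughly, the hack looked like this (a sketch, not a verbatim copy of the current kmk/handlers/sequences.py; the padding logic and KC.get() call are approximate):

```python
from kmk.keys import KC

def generate_codepoint_keysym_seq(codepoint, expected_length=4):
    # Pad out to the expected length, but wrap every key (including the
    # dummy zeros) in RALT so Option is held for each individual keypress,
    # which is what the macOS Unicode Hex Input keyboard seems to expect.
    seq = [KC.RALT(KC.N0) for _ in range(max(len(codepoint), expected_length))]

    for idx, codepoint_fragment in enumerate(reversed(codepoint)):
        seq[-(idx + 1)] = KC.RALT(KC.get(codepoint_fragment))

    return seq
```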

Also the sequences documentation is a little misleading since it mentions UnicodeMode.MACOS or UnicodeMode.OSX or UnicodeMode.RALT for Mac but the code only seems to check for keyboard.unicode_mode == UnicodeMode.RALT. That confused me for a while.

There's also a question in the code about codepoints beyond 16-bit, like the smiley face with codepoint 0001f601. The hex input keyboard wants you to send these as big-endian utf-16 sequences. For example, the smiley becomes b'\x00\x01\xf6\x01'.decode('utf-32-be').encode('utf-16-be'), which is b'\xd8\x3d\xde\x01'. This is typed by holding either option key (aka RALT) while typing the characters d83dde01. Like this: 😁
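As a quick sanity check in desktop Python (not CircuitPython), the conversion looks like this:

```python
# U+1F601 is outside the Basic Multilingual Plane, so the hex input keyboard
# needs its UTF-16 surrogate pair rather than the raw 32-bit codepoint.
print('😁'.encode('utf-16-be').hex())  # d83dde01 -> eight digits typed with Option held

# Same result starting from the big-endian UTF-32 bytes of the codepoint:
print(b'\x00\x01\xf6\x01'.decode('utf-32-be').encode('utf-16-be').hex())  # d83dde01
```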

@trwbox
Author

trwbox commented Dec 4, 2023

Only working with UnicodeMode.RALT is really weird, since all 3 are assigned to the same value here. But that is good to know for when I get my hands on a keyboard.

I don't have immediate access to a KMK board, but I used the 🍺 emoji for testing and my built-in Python to estimate the CircuitPython values. '🍺'.encode('utf-16-be').hex() gets me the UTF-16 value of d83cdf7a. Using utf-32-be I get 0001f37a, which matches the hex(ord('🍺'))[2:] that KMK uses to create the unicode codepoints it types, with the addition of the leading 0s.

Typing the UTF-32 value (without the leading 0s, like KMK does) on the unicode hex input keyboard still does not work for me on MacOS 14.1, resulting in the same error character; however, the UTF-16 hex does work properly, giving '🍺'. Also of note: if I add the leading 0s (since a quick Google search says UTF-32 is a fixed-length encoding and the leading zeros might matter), I get a different character, which is just the unicode character U+f37a.
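For completeness, the comparison I ran in desktop Python:

```python
beer = '🍺'
print(hex(ord(beer))[2:])              # 1f37a    -- what KMK currently types
print(beer.encode('utf-32-be').hex())  # 0001f37a -- UTF-32, with leading zeros
print(beer.encode('utf-16-be').hex())  # d83cdf7a -- what the hex input keyboard accepts
```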

Where did you see that the MacOS hex input only supports UTF-16? I couldn't find it, and am curious. But if that is true, then the method used to find the unicode code points might just need to be changed to force UTF-16 to make it work?

@patricksurry
Contributor

> Typing the UTF-32 value (without leading 0s like KMK does) on the unicode hex input keyboard still does not work

Yes, exactly: you have to type the utf-16-be encoding, with leading zeros if necessary to form one or more groups of four digits. In some contexts (like GH comments) it seems to briefly show a diamond/question-mark character after the first four digits, and then the desired emoji once you've finished typing the second four digits, all with alt/option held.

See the second answer, by Tom Gewecke, to this Stack Overflow question.

So I think it'd be a matter of taking the unicode input string, doing unistring.encode('utf-16-be'), and then extracting the result as either four or eight hex digits.
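Something like this sketch, assuming the RALT-wrapped sequence generation above (the helper name here is made up, not existing KMK API):

```python
def macos_hex_digits(unistring):
    # Encode as big-endian UTF-16 so characters beyond U+FFFF become surrogate
    # pairs, then emit one group of four hex digits per UTF-16 code unit for
    # the Unicode Hex Input keyboard.
    raw = unistring.encode('utf-16-be')
    return [raw[i:i + 2].hex() for i in range(0, len(raw), 2)]

print(macos_hex_digits('😁'))  # ['d83d', 'de01']
print(macos_hex_digits('a'))   # ['0061']
```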
