MacOS has broken Unicode input #652

Open

trwbox opened this issue Nov 29, 2022 · 4 comments

Comments

@trwbox

trwbox commented Nov 29, 2022

Starting with MacOS 12 (Monterey), Apple has broken raw hex unicode input.

While I'm highly unsure whether this can be fixed, I wanted to open an issue in case someone else is facing the same problem.

Tested on an M1 MacBook Pro running MacOS 12.6.1.

In MacOS unicode mode the 🍺 emoji gives 1f37a, which is the correct unicode hex value, but the official hex input keyboard produces an incorrect character for that value.

This appears on the Apple discussion boards a few times with no clear fix:
https://discussions.apple.com/thread/253435504

@pathnirvana

Happening on an M1 Air for me too. Would be nice if this could be fixed.

@patricksurry
Contributor

patricksurry commented Dec 4, 2023

I explored this a bit. I'm on an older MacBook Air running Sonoma 14.1.1. It seems to want the RALT modifier attached to each key, not just pressed ahead of the sequence. After enabling the OS X Unicode Hex Input keyboard (see https://en.wikipedia.org/wiki/Unicode_input#In_MacOS) it was just sending the literal hex characters that I typed.

I hacked my version of kmk.handlers.sequences.py::generate_codepoint_keysym_seq() to wrap everything (including the dummy KC.N0s) like KC.RALT(KC.N0) etc. and finally got it working. I'm not sure of the best way to implement that in the current codebase, given the deep macro nesting, which doesn't currently take the keyboard encoding as input.
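Roughly, the hack looked like this (a sketch, not a verbatim copy of the current kmk/handlers/sequences.py; the padding logic and KC.get() call are approximate):

```python
from kmk.keys import KC

def generate_codepoint_keysym_seq(codepoint, expected_length=4):
    # Pad out to the expected length, but wrap every key (including the
    # dummy zeros) in RALT so Option is held for each individual keypress,
    # which is what the macOS Unicode Hex Input keyboard seems to expect.
    seq = [KC.RALT(KC.N0) for _ in range(max(len(codepoint), expected_length))]

    for idx, codepoint_fragment in enumerate(reversed(codepoint)):
        seq[-(idx + 1)] = KC.RALT(KC.get(codepoint_fragment))

    return seq
```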

Also the sequences documentation is a little misleading since it mentions UnicodeMode.MACOS or UnicodeMode.OSX or UnicodeMode.RALT for Mac but the code only seems to check for keyboard.unicode_mode == UnicodeMode.RALT. That confused me for a while.

There's also a question in the code about codepoints beyond 16-bit, like the smiley face with codepoint 0001f601. The hex input keyboard wants you to send these as big-endian utf-16 sequences. For example, the smiley becomes b'\x00\x01\xf6\x01'.decode('utf-32-be').encode('utf-16-be'), which is b'\xd8\x3d\xde\x01'. This is typed by holding either option key (aka RALT) while typing the characters d83dde01. Like this: 😁
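As a quick sanity check in desktop Python (not CircuitPython), the conversion looks like this:

```python
# U+1F601 is outside the Basic Multilingual Plane, so the hex input keyboard
# needs its UTF-16 surrogate pair rather than the raw 32-bit codepoint.
print('😁'.encode('utf-16-be').hex())  # d83dde01 -> eight digits typed with Option held

# Same result starting from the big-endian UTF-32 bytes of the codepoint:
print(b'\x00\x01\xf6\x01'.decode('utf-32-be').encode('utf-16-be').hex())  # d83dde01
```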

@trwbox
Author

trwbox commented Dec 4, 2023

Only working with UnicodeMode.RALT is really weird, since all 3 are assigned to the same value here. But that is good to know for when I get my hands on a keyboard.

I don't have immediate access to a KMK board, but I used the 🍺 emoji for testing and my built-in Python to estimate the CircuitPython values. '🍺'.encode('utf-16-be').hex() gets me the UTF-16 value of d83cdf7a. Using utf-32-be I get 0001f37a, which matches the hex(ord('🍺'))[2:] that KMK uses to create the unicode codepoints it types, with the addition of the leading 0s.

Typing the UTF-32 value (without the leading 0s, like KMK does) on the unicode hex input keyboard still does not work for me on MacOS 14.1, resulting in the same error character; however, the UTF-16 hex does work properly, giving '🍺'. Also of note: if I add the leading 0s (since a quick Google search says UTF-32 is a fixed-length encoding and the leading zeros might matter), I get a different character, which is just the unicode character U+f37a.
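For completeness, the comparison I ran in desktop Python:

```python
beer = '🍺'
print(hex(ord(beer))[2:])              # 1f37a    -- what KMK currently types
print(beer.encode('utf-32-be').hex())  # 0001f37a -- UTF-32, with leading zeros
print(beer.encode('utf-16-be').hex())  # d83cdf7a -- what the hex input keyboard accepts
```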

Where did you see that the MacOS hex input only supports UTF-16? I couldn't find it, and am curious. But if that is true, then the method used to find the unicode code points might just need to be changed to force UTF-16 to make it work?

@patricksurry
Contributor

> Typing the UTF-32 value (without leading 0s like KMK does) on the unicode hex input keyboard still does not work

Yes, exactly: you have to type the utf-16-be encoding, with leading zeros if necessary to form one or more groups of four digits. In some contexts (like GH comments) it seems to briefly show a diamond/question-mark character after the first four digits, and then the desired emoji once you've finished typing the second four digits, all with alt/option held.

See the second answer, by Tom Gewecke, to this Stack Overflow question.

So I think it'd be a matter of taking the unicode input string, doing unistring.encode('utf-16-be'), and then extracting the result as either four or eight hex digits.
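Something like this sketch, assuming the RALT-wrapped sequence generation above (the helper name here is made up, not existing KMK API):

```python
def macos_hex_digits(unistring):
    # Encode as big-endian UTF-16 so characters beyond U+FFFF become surrogate
    # pairs, then emit one group of four hex digits per UTF-16 code unit for
    # the Unicode Hex Input keyboard.
    raw = unistring.encode('utf-16-be')
    return [raw[i:i + 2].hex() for i in range(0, len(raw), 2)]

print(macos_hex_digits('😁'))  # ['d83d', 'de01']
print(macos_hex_digits('a'))   # ['0061']
```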
