Functions providing basic support for UTF-8 encodings
Prior to upgrading Hammerspoon's Lua interpreter to 5.3, UTF8 support was provided by including the then-beta version of Lua 5.3's utf8 library as a Hammerspoon module. This is no longer necessary, but to maintain compatibility, the Lua utf8 library can still be accessed through hs.utf8. The documentation for the utf8 library can be found at http://www.lua.org/manual/5.3/ or from the Hammerspoon console via the help command: help.lua.utf8. This affects the following functions and variables:
- hs.utf8.char - help available via help.lua.utf8.char
- hs.utf8.charPattern - help available via help.lua.utf8.charpattern
- hs.utf8.codepoint - help available via help.lua.utf8.codepoint
- hs.utf8.codes - help available via help.lua.utf8.codes
- hs.utf8.len - help available via help.lua.utf8.len
- hs.utf8.offset - help available via help.lua.utf8.offset
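As a brief illustration of the standard library functions listed above, the following sketch uses only the stock Lua 5.3 utf8 library, so it runs in a plain Lua interpreter as well as in the Hammerspoon console. The sample codepoints are arbitrary choices for the example.

```lua
-- Build a string from three codepoints: "H" (1 byte), "ä" (2 bytes), "€" (3 bytes).
local s = utf8.char(72, 228, 8364)

print(#s)                 -- byte length of the string
print(utf8.len(s))        -- number of UTF-8 characters
print(utf8.offset(s, 3))  -- byte position where the third character starts

-- Iterate over every character, printing its byte position and codepoint.
for pos, cp in utf8.codes(s) do
  print(pos, string.format("U+%04X", cp))
end

-- utf8.charpattern matches exactly one UTF-8 byte sequence.
local count = 0
for _ in s:gmatch(utf8.charpattern) do count = count + 1 end
print(count)
```

Under Hammerspoon the same calls should also be reachable through the hs.utf8 names in the list above.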
Additional functions that are specific to Hammerspoon which provide expanded support for UTF8 are documented here.
Signature |
hs.utf8.registeredKeys[] |
Type |
Variable |
Description |
A collection of UTF-8 characters already converted from their codepoints and available as convenient key-value pairs. UTF-8 printable versions of common Apple and OS X special keys are predefined, and others can be added with hs.utf8.registerCodepoint(label, codepoint) for your own use. |
Notes |
- This table has a __tostring() metamethod which allows listing its contents in the Hammerspoon console by typing hs.utf8.registeredKeys.
- For parity with hs.utf8.registeredLabels, this can also be invoked as a function, i.e. hs.utf8.registeredKeys["cmd"] is equivalent to hs.utf8.registeredKeys("cmd")
|
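The two notes above describe a table that can be printed and also called like a function. A minimal sketch of how such a table might be built in plain Lua follows. The name registry is hypothetical, the single entry (U+2318 under "cmd") is an assumption for illustration, and this is not Hammerspoon's actual implementation.

```lua
-- Hypothetical stand-in for hs.utf8.registeredKeys.
local registry = setmetatable({
  cmd = utf8.char(0x2318),  -- assumed value: U+2318, the Command key glyph
}, {
  -- Calling the table looks the label up, so registry("cmd") == registry["cmd"].
  __call = function(self, label)
    return self[label]
  end,
  -- tostring()/print() list the contents as "label<TAB>character", sorted by label.
  __tostring = function(self)
    local labels = {}
    for label in pairs(self) do labels[#labels + 1] = label end
    table.sort(labels)
    local lines = {}
    for i, label in ipairs(labels) do
      lines[i] = label .. "\t" .. self[label]
    end
    return table.concat(lines, "\n")
  end,
})

print(registry)
print(registry("cmd") == registry["cmd"])
```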
Signature |
hs.utf8.asciiOnly(string[, all]) -> string |
Type |
Function |
Description |
Returns the provided string with all non-printable ascii characters escaped, except Return, Linefeed, and Tab. |
Parameters |
- string - The input string which is to have all non-printable ascii characters escaped as \x## (a single byte hexadecimal number).
- all - an optional boolean parameter (default false) indicating whether or not Return, Linefeed, and Tab should also be considered "non-printable"
|
Returns |
- The cleaned up string, with non-printable characters escaped.
|
Notes |
- Because Unicode characters outside of the basic ASCII alphabet are multi-byte characters, any UTF8 or other Unicode encoded character will be broken up into its individual bytes and likely escaped by this function.
- This function is useful for displaying binary data in a human readable way that might otherwise be inexpressible in the Hammerspoon console or other destination. For example: utf8.charpattern, which contains the Lua pattern for matching valid UTF8 encoded sequences, results in (null) in the Hammerspoon console, but hs.utf8.asciiOnly(utf8.charpattern) will display [\x00-\x7F\xC2-\xF4][\x80-\xBF]*.
|
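To make the escaping rule concrete, here is a rough pure-Lua approximation of the behavior described above. The helper name asciiOnlySketch is hypothetical, and the real function may differ in details such as the case of the hexadecimal digits.

```lua
-- Hypothetical approximation of hs.utf8.asciiOnly, not the Hammerspoon source.
local function asciiOnlySketch(str, all)
  -- Match every byte outside the printable ASCII range 0x20-0x7E.
  local escaped = str:gsub("[^\x20-\x7E]", function(c)
    -- Unless `all` is set, leave Return, Linefeed, and Tab untouched.
    if not all and (c == "\r" or c == "\n" or c == "\t") then
      return c
    end
    return string.format("\\x%02X", c:byte())
  end)
  return escaped
end

-- The Lua 5.3 value of utf8.charpattern, written out literally.
print(asciiOnlySketch("[\0-\x7F\xC2-\xF4][\x80-\xBF]*"))
print(asciiOnlySketch("a\tb", true))
```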
Signature |
hs.utf8.codepointToUTF8(...) -> string |
Type |
Function |
Description |
Wrapper to utf8.char(...) which ensures that all codepoints return valid UTF8 characters. |
Parameters |
- codepoints - A series of numeric Unicode code points to be converted to UTF-8 byte sequences. If a codepoint is a string (and does not start with U+), it is used as a key for lookup in hs.utf8.registeredKeys[]
|
Returns |
- A string containing the UTF-8 byte sequences corresponding to provided codepoints as a combined string.
|
Notes |
- Valid codepoint values are from 0x0000 - 0x10FFFF (0 - 1114111)
- If the codepoint provided is a string that starts with U+, then the 'U+' is converted to a '0x' so that Lua can properly treat the value as numeric.
- Invalid codepoints are returned as the Unicode Replacement Character (U+FFFD)
- This includes out of range codepoints as well as the Unicode Surrogate codepoints (U+D800 - U+DFFF)
|
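The validation rules in the notes above can be sketched in plain Lua as follows. The helper names are hypothetical, the lookup of non-U+ strings in hs.utf8.registeredKeys is deliberately left out (such strings are treated as invalid here), and the code is an illustration rather than the Hammerspoon source.

```lua
local REPLACEMENT = 0xFFFD  -- Unicode Replacement Character

-- Normalize one argument to a valid codepoint, or to U+FFFD if it is invalid.
local function toCodepoint(v)
  if type(v) == "string" then
    -- Accept the "U+XXXX" form by parsing the hexadecimal digits.
    local hex = v:match("^U%+(%x+)$")
    v = hex and tonumber(hex, 16) or nil
  end
  if math.type(v) ~= "integer"
     or v < 0 or v > 0x10FFFF               -- out of range
     or (v >= 0xD800 and v <= 0xDFFF) then  -- surrogate codepoints
    return REPLACEMENT
  end
  return v
end

-- Hypothetical approximation of hs.utf8.codepointToUTF8.
local function codepointToUTF8Sketch(...)
  local args = table.pack(...)
  local out = {}
  for i = 1, args.n do
    out[i] = utf8.char(toCodepoint(args[i]))
  end
  return table.concat(out)
end

print(codepointToUTF8Sketch(0x41, "U+20AC"))
print(codepointToUTF8Sketch(0xD800) == utf8.char(0xFFFD))
```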
Signature |
hs.utf8.fixUTF8(inString[, replacementChar]) -> outString, posTable |
Type |
Function |
Description |
Replace invalid UTF8 character sequences in inString with replacementChar so it can be safely displayed in the console or other destination which requires valid UTF8 encoding. |
Parameters |
- inString - String of characters which may contain invalid UTF8 byte sequences
- replacementChar - optional string used to replace invalid byte sequences in inString. If this parameter is not provided, the default UTF8 replacement character, U+FFFD, is used.
|
Returns |
- outString - The contents of inString with all invalid UTF8 byte sequences replaced by the replacementChar.
- posTable - a table of indexes in outString indicating where replacementChar has been used.
|
Notes |
- This function is a slight modification to code found at http://notebook.kulchenko.com/programming/fixing-malformed-utf8-in-lua.
- If replacementChar is a multi-byte character (like U+FFFD) or multi character string, then the string length of outString will be longer than the string length of inString. The character positions in posTable will reflect these new positions in outString.
- To calculate the character position of the invalid characters in inString, use something like the following:
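A minimal sketch of that calculation, assuming each entry in posTable marks a single byte of inString that was replaced. The helper name originalPositions is hypothetical and is not part of hs.utf8.

```lua
-- Map positions in outString back to positions in inString.
-- Assumption: every replacement stands in for exactly one invalid byte.
local function originalPositions(posTable, replacementChar)
  replacementChar = replacementChar or "\xEF\xBF\xBD"  -- U+FFFD, the default
  local extra = #replacementChar - 1  -- bytes added by each replacement
  local result = {}
  for i, pos in ipairs(posTable) do
    -- Each earlier replacement shifted this position right by `extra` bytes.
    result[i] = pos - (i - 1) * extra
  end
  return result
end

-- Under Hammerspoon (not run here):
--   local outString, posTable = hs.utf8.fixUTF8(inString, replacementChar)
--   local positions = originalPositions(posTable, replacementChar)

-- Worked example: "a\xFFb\xFEc" has invalid bytes at 2 and 4. With the 3-byte
-- default replacement they land at 2 and 6 in the repaired string.
local positions = originalPositions({ 2, 6 })
print(positions[1], positions[2])
```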
|
Signature |
hs.utf8.hexDump(inputString [, count]) -> string |
Type |
Function |
Description |
Returns a hex dump of the provided string. This is primarily useful for examining the exact makeup of binary data contained in a Lua String as individual bytes for debugging purposes. |
Parameters |
- inputString - the data to be rendered as individual hexadecimal bytes for examination.
- count - an optional parameter specifying the number of bytes to display per line (default 16)
|
Returns |
- a string containing the hex dump of the input string.
|
Notes |
- Like hs.utf8.asciiOnly, this function will break up Unicode characters into their individual bytes.
- As an example:
hs.utf8.hexDump(utf8.charpattern) will return 00 : 5B 00 2D 7F C2 2D F4 5D 5B 80 2D BF 5D 2A : [.-..-.][.-.]*
|
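For readers who want to see how such a dump can be produced, here is a rough pure-Lua sketch that reproduces the single-line example above. The helper name hexDumpSketch is hypothetical, and layout details such as the offset width and the handling of short final lines are assumptions, not taken from the Hammerspoon source.

```lua
-- Hypothetical approximation of hs.utf8.hexDump.
local function hexDumpSketch(data, count)
  count = count or 16  -- bytes per line
  local lines = {}
  for offset = 1, #data, count do
    local chunk = data:sub(offset, offset + count - 1)
    -- Hex column: every byte as two uppercase hex digits plus a space.
    local hex = chunk:gsub(".", function(c)
      return string.format("%02X ", c:byte())
    end)
    -- Text column: bytes outside printable ASCII are shown as ".".
    local text = chunk:gsub("[^\x20-\x7E]", ".")
    lines[#lines + 1] = string.format("%02X : %s: %s", offset - 1, hex, text)
  end
  return table.concat(lines, "\n")
end

-- The Lua 5.3 value of utf8.charpattern, written out literally.
print(hexDumpSketch("[\0-\x7F\xC2-\xF4][\x80-\xBF]*"))
```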
Signature |
hs.utf8.registerCodepoint(label, codepoint) -> string |
Type |
Function |
Description |
Registers a Unicode codepoint under the given label as a UTF-8 string of bytes which can be referenced by the label later in your code as hs.utf8.registeredKeys[label] for convenience and readability. |
Parameters |
- label - a string label to use as a human-readable reference when getting the UTF-8 byte sequence for use in other strings and output functions.
- codepoint - a Unicode codepoint in numeric or U+xxxx format to register with the given label.
|
Returns |
- Returns the UTF-8 byte sequence for the Unicode codepoint registered.
|
Notes |
- If a codepoint label was previously registered, this will overwrite the previous value with a new one. Because many of the special keys you may want to register have different variants, this allows you to easily modify the existing predefined defaults to suit your preferences.
- The return value is merely syntactic sugar and you do not need to save it locally; it can be safely ignored. Future access to the pre-converted codepoint should be retrieved as hs.utf8.registeredKeys[label] in your code. It looks good when invoked from the console, though ☺.
|
Signature |
hs.utf8.registeredLabels(utf8char) -> string |
Type |
Function |
Description |
Returns the label name for a UTF8 character, as it is registered in hs.utf8.registeredKeys[]. |
Parameters |
- utf8char - the character to look up in hs.utf8.registeredKeys[]
|
Returns |
- The string label for the UTF8 character, or a string in the format of "U+XXXX" if it is not defined in hs.utf8.registeredKeys[], or nil if utf8char is not a valid UTF8 character.
|
Notes |
- For parity with hs.utf8.registeredKeys, this can also be invoked as if it were an array, i.e. hs.utf8.registeredLabels(char) is equivalent to hs.utf8.registeredLabels[char]
|
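The three possible results described under Returns can be illustrated with a small pure-Lua reverse lookup. The table keys and the helper labelFor are hypothetical stand-ins for hs.utf8.registeredKeys and hs.utf8.registeredLabels, and the single registered entry is an assumption for the example.

```lua
-- Hypothetical stand-in for the registered key table.
local keys = { cmd = utf8.char(0x2318) }

-- Hypothetical approximation of hs.utf8.registeredLabels.
local function labelFor(char)
  -- Anything that is not exactly one valid UTF-8 character yields nil.
  if type(char) ~= "string" or utf8.len(char) ~= 1 then
    return nil
  end
  -- Return the registered label if the character is known.
  for label, value in pairs(keys) do
    if value == char then return label end
  end
  -- Otherwise fall back to the "U+XXXX" form of its codepoint.
  return string.format("U+%04X", utf8.codepoint(char))
end

print(labelFor(utf8.char(0x2318)))
print(labelFor("A"))
print(labelFor("\xFF"))
```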