source emoji from unicode.org html file; add emoji category #10
For diacritics handling, you may want to use sanette/ubase. It may not fix everything (I see you had to replace
(* leading ints are illegal in
 * OCaml identifiers so we prepend
 * them with a '_' *)
let wrap_leading_ints s = match s.[0] with '0' .. '9' -> "_" ^ s | _ -> s
I would rather check whether the string is a valid identifier and, if it is not, prepend a `_`.
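A minimal sketch of that suggestion, assuming the generated names are let-bound values (so they must start with a lowercase letter or `_`). The names `is_valid_identifier` and `ensure_identifier` are hypothetical and not part of this PR:

```ocaml
(* Hypothetical helpers illustrating the review suggestion; not from the PR. *)

(* First character of a lowercase (value) identifier. *)
let is_ident_start = function
  | 'a' .. 'z' | '_' -> true
  | _ -> false

(* Any later character of an identifier. *)
let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '\'' -> true
  | _ -> false

(* True when [s] is non-empty, starts with a valid first character,
   and contains only identifier characters after that. *)
let is_valid_identifier s =
  let n = String.length s in
  let rec rest i = i >= n || (is_ident_char s.[i] && rest (i + 1)) in
  n > 0 && is_ident_start s.[0] && rest 1

(* Prepend '_' only when [s] is not already a valid identifier. *)
let ensure_identifier s = if is_valid_identifier s then s else "_" ^ s
```

Unlike `wrap_leading_ints`, this does not raise on an empty string. It still assumes that any other invalid characters were replaced earlier, since prepending `_` cannot repair a name such as `a-b`, and it does not check for reserved keywords or for the lone wildcard `_`.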
This is now updated to Unicode v15.1 🐦‍🔥
This PR makes the code that generates emoji.ml parse the HTML files from unicode.org: http://www.unicode.org/emoji/charts/full-emoji-list.html and https://www.unicode.org/emoji/charts/full-emoji-modifiers.html
It has all emojis according to https://www.unicode.org/emoji/charts/emoji-counts.html
It adds diacritics fixes.
It adds (sub)categories!
It adds tests and various other changes, and updates readme.md.
It conflicts with the last two commits, since these changes are based on the code from before @favonia's change to gencode.ml, but the only real conflict is how diacritics are handled. In this PR we only use '_' to replace diacritics, to stay consistent and as close as possible to the official names.
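To illustrate the '_' replacement, here is a rough sketch rather than the code in gencode.ml. It assumes the names are UTF-8 strings with precomposed accented characters, and the function name `replace_non_ascii` is hypothetical:

```ocaml
(* Hypothetical helper, not from the PR: replace every non-ASCII
   character of a UTF-8 string with a single '_'.
   Requires OCaml >= 4.14 for String.get_utf_8_uchar. *)
let replace_non_ascii s =
  let buf = Buffer.create (String.length s) in
  let rec go i =
    if i < String.length s then begin
      (* Decode one UTF-8 character starting at byte offset [i]. *)
      let d = String.get_utf_8_uchar s i in
      let code = Uchar.to_int (Uchar.utf_decode_uchar d) in
      (* Keep ASCII as-is; collapse anything else to one '_'. *)
      Buffer.add_char buf (if code < 0x80 then Char.chr code else '_');
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0;
  Buffer.contents buf
```

With this sketch, `replace_non_ascii "C\xc3\xb4te"` (the UTF-8 bytes of "Côte") gives `"C_te"`. A name written with combining marks would instead keep the base letter and gain an extra `_`, so the input form matters.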