source emoji from unicode.org html file; add emoji category #10

Open
wants to merge 35 commits into
base: master

Swrup
Collaborator

@Swrup Swrup commented Dec 25, 2022

This PR makes the code that generates emoji.ml parse the HTML files from unicode.org: http://www.unicode.org/emoji/charts/full-emoji-list.html and https://www.unicode.org/emoji/charts/full-emoji-modifiers.html
It covers all emojis listed in https://www.unicode.org/emoji/charts/emoji-counts.html
It adds diacritics fixes.
It adds (sub)categories!
It adds tests and various other changes, and updates readme.md.

It conflicts with the last two commits, since the changes here are based on the code from before @favonia's change to gencode.ml,
but the only real conflict is how diacritics are handled. In this PR we only use '_' to replace diacritics, to stay consistent and as close as possible to the official names.
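
To illustrate the '_' rule for diacritics described above, here is a minimal sketch. It is not the code from this PR: the name `replace_non_ascii` is hypothetical, and it assumes the input is valid UTF-8 and that each accented (non-ASCII) character should become exactly one '_'.

```ocaml
(* Hypothetical helper, not taken from gencode.ml.
   Assumes [s] is valid UTF-8. Every non-ASCII character is replaced
   by a single '_' and ASCII bytes are copied unchanged. *)
let replace_non_ascii s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code < 0x80 then Buffer.add_char b c
      else if code >= 0xC0 then Buffer.add_char b '_'
      (* bytes 0x80..0xBF continue a multi-byte character: dropped *))
    s;
  Buffer.contents b

(* "pi\xc3\xb1ata" is the UTF-8 encoding of "piñata". *)
let () = assert (replace_non_ascii "pi\xc3\xb1ata" = "pi_ata")
```

Emitting the '_' only on the lead byte of a multi-byte sequence keeps one underscore per character instead of one per byte.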

@zapashcanon
Collaborator

For diacritics handling, you may want to use sanette/ubase. It may not fix everything (I see you had to replace 1st to get a valid OCaml identifier), but it should help in some cases.

(* leading ints are illegal in OCaml identifiers
 * so we prepend them with a '_' *)
let wrap_leading_ints s =
  match s.[0] with
  | '0' .. '9' -> "_" ^ s
  | _ -> s
Collaborator

I would rather check whether the string is a valid identifier and, if it is not, prepend a '_'.
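
A minimal sketch of what that suggestion could look like, assuming ASCII-only lowercase value identifiers. The names `is_valid_ident` and `ensure_ident` are hypothetical and not part of this PR. Reserved keywords are not checked, and prepending '_' only repairs a bad first character, not invalid characters later in the string.

```ocaml
(* Hypothetical helpers, not taken from the PR. *)

(* Characters allowed at the start of a lowercase OCaml identifier. *)
let is_ident_start = function 'a' .. 'z' | '_' -> true | _ -> false

(* Characters allowed after the first position. *)
let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '\'' -> true
  | _ -> false

(* Lexical check only: keywords such as "end" are not rejected here.
   The empty string and the bare "_" are not usable as names. *)
let is_valid_ident s =
  let n = String.length s in
  let rec tail_ok i = i >= n || (is_ident_char s.[i] && tail_ok (i + 1)) in
  n > 0 && s <> "_" && is_ident_start s.[0] && tail_ok 1

(* Prepend '_' only when the string is not already a valid identifier. *)
let ensure_ident s = if is_valid_ident s then s else "_" ^ s

let () =
  assert (ensure_ident "1st_place_medal" = "_1st_place_medal");
  assert (ensure_ident "grinning_face" = "grinning_face")
```

Unlike matching on `s.[0]` directly, this version does not raise on the empty string, although the result for that input ("_") is still not a usable name.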

@Swrup
Collaborator Author

Swrup commented May 23, 2024

This is now updated to Unicode v15.1 🐦‍🔥
