EACC/Unicode Ideograph Mappings

The kEACC field in Unihan 6.2 is woefully out of date. Compared with the mappings in the latest MARC-8 Code Table from the Library of Congress (LoC), it maps 8 EACC codes to different code points and lacks 235 mappings altogether.

This directory contains an updated table for Unihan derived from the LoC data.

The Source Data

Unihan_OtherMappings.txt 6.2 from the Unicode Consortium
“MARC-8 to Unicode XML mapping file” from the Library of Congress
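
For readers who want to inspect the Unihan side without Lisp, here is a minimal Python sketch that pulls the kEACC entries out of Unihan_OtherMappings.txt. It relies on the standard Unihan layout of three tab-separated fields per line (code point, field name, value) with `#` comment lines. The function name `parse_unihan_keacc` is made up for this illustration and is not part of the repository.

```python
def parse_unihan_keacc(lines):
    """Collect kEACC entries as {EACC code: Unicode code point}.

    Expects Unihan-style lines: 'U+XXXX<TAB>field<TAB>value'.
    Hypothetical helper, not the repository's Lisp reader.
    """
    table = {}
    for line in lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3 and fields[1] == "kEACC":
            code_point = int(fields[0][2:], 16)  # drop the leading "U+"
            table[fields[2].strip()] = code_point
    return table


# The kEACC line below matches a mapping reported for Unihan 6.2 in this README;
# the kOtherField line is invented to show that other fields are ignored.
sample = [
    "# comment line\n",
    "U+F9B2\tkEACC\t4B5F58\n",
    "U+F9B2\tkOtherField\tignored\n",
]
keacc = parse_unihan_keacc(sample)
```

To read the real file, pass an open file object, for example `parse_unihan_keacc(open("Unihan_OtherMappings.txt", encoding="utf-8"))`.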

The Mapping Table

loc-eacc-ucs.txt was generated from the LoC MARC-8 table with the loc.xslt XSLT script.

The Programs

loc.xslt

XSLT script to extract the Han Ideograph mappings from the LoC XML file. Handles the cases where the EACC code maps to both the PUA and to U+3013. The output of this script is a file containing two tab-separated columns:

The 3-byte EACC code as six hexadecimal digits
The Unicode scalar value (USV) of the corresponding Unicode character
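
The two-column layout described above is simple to read from other languages. The following Python sketch parses one line of such a file. It assumes the second column is bare hexadecimal (as in the comparison output shown in this README) and also tolerates an optional `U+` prefix; the function name `parse_mapping_line` is hypothetical and not part of the repository.

```python
import re

# An EACC code is three bytes, written as exactly six hex digits.
EACC_RE = re.compile(r"[0-9A-F]{6}")


def parse_mapping_line(line):
    """Parse 'EACC<TAB>USV' into (eacc_string, code_point_int).

    Returns None for a blank line. Hypothetical helper for illustration.
    """
    fields = line.strip().split("\t")
    if fields == [""]:
        return None
    if len(fields) != 2:
        raise ValueError(f"expected two tab-separated columns: {line!r}")
    eacc, usv = (field.strip().upper() for field in fields)
    if not EACC_RE.fullmatch(eacc):
        raise ValueError(f"not a six-digit EACC code: {eacc!r}")
    if usv.startswith("U+"):  # assumption: the prefix may or may not be present
        usv = usv[2:]
    return eacc, int(usv, 16)


eacc, code_point = parse_mapping_line("4B5F58\t096F6\n")
print(eacc, f"U+{code_point:04X}", bytes.fromhex(eacc))
```

Keeping the EACC code as a string preserves the six-digit form used in the tables, while `bytes.fromhex` recovers the three raw bytes when they are needed.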

eacc-loc-unihan.lisp

Functions for reading the mapping tables and comparing their entries. This uses the CL-PPCRE library, which is easily installable via Quicklisp. It was tested with Clozure Common Lisp but should work with any implementation.

Comparing the Tables

Load eacc-loc-unihan.lisp into your Lisp image and switch to the EACC package.

EACC> (defvar *unihan* (read-unihan-eacc-mappings "Unihan_OtherMappings.txt"))
*UNIHAN*
EACC> (defvar *loc* (read-loc-eacc-mappings "loc-eacc-ucs.txt"))
*LOC*
EACC> (compare-entries *UNIHAN* *LOC*)
4B5F58	0F9B2	096F6
215C32	0FA25	09038
215061	0FA1D	07CBE
4B7421	0F9A9	056F9
4B4B3E	0F9AD	073B2
215F71	0FA1C	09756
4B333E	0F92E	051B7
214339	0FA12	06674
NIL

The output of the call to compare-entries shows the 8 EACC codes whose mapping in Unihan (e.g., U+F9B2) differs from the mapping in the LoC table (e.g., U+96F6).

Comparing in the other direction shows the 235 characters that have mappings in the LoC table without a kEACC mapping in Unihan:

EACC> (compare-entries *LOC* *UNIHAN*)
4B3474		0537F
213F53		061F2
4B5361		089D2
214456		06813
;;; lots deleted
216053		0985E
216044		09818
3A284C		053A9
45564B		0865E
NIL
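
The idea behind the two comparisons can be sketched in a few lines of Python. This is not a port of compare-entries, whose exact behavior is defined in eacc-loc-unihan.lisp; it is an illustration that assumes each table is a dictionary from EACC code to code point. The name `compare_tables` is hypothetical, and the sample entries are taken from the output shown above.

```python
def compare_tables(first, second):
    """Compare two {EACC code: code point} tables.

    Returns (differing, missing):
      differing: (eacc, first_value, second_value) where both tables map
                 the code but to different code points
      missing:   (eacc, first_value) where only `first` maps the code
    """
    differing, missing = [], []
    for eacc, code_point in sorted(first.items()):
        if eacc not in second:
            missing.append((eacc, code_point))
        elif second[eacc] != code_point:
            differing.append((eacc, code_point, second[eacc]))
    return differing, missing


# A few entries copied from the comparison output in this README.
unihan = {"4B5F58": 0xF9B2, "215C32": 0xFA25}
loc = {"4B5F58": 0x96F6, "215C32": 0x9038, "4B3474": 0x537F}

differing, _ = compare_tables(unihan, loc)
_, missing = compare_tables(loc, unihan)

for eacc, unihan_cp, loc_cp in differing:
    print(f"{eacc}\t{unihan_cp:05X}\t{loc_cp:05X}")
for eacc, loc_cp in missing:
    print(f"{eacc}\t\t{loc_cp:05X}")
```

Running the comparison in both directions matters because a one-way pass only sees the keys of its first argument: differences show up either way, but mappings that exist only in the LoC table appear only when the LoC table comes first.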

License

The source code is in the public domain: do with it what you will.
