EricNichols edited this page Jun 15, 2008 · 27 revisions

Japanese-English Machine Translation

This page describes Jaen, a Japanese-English machine translation (MT) system built on the [wiki:LogonTop LOGON architecture]. It uses [wiki:JacyTop Jacy] for source-language analysis and the [wiki:ErgTop ERG] for target-language generation. To degrade gracefully on input we cannot yet translate, we are also working on a backup [wiki:MtJaenSmt Statistical MT system] based on [http://www.statmt.org/moses/ Moses].

Jaen is the elder sibling of [wiki:NoJa Noja], the Norwegian-Japanese MT system.

The first version of Jaen is described in:

Tanaka Corpus Development Data

||Change||Date||Parse Coverage||Transfer Coverage||Generation Coverage||End-to-End Coverage||BLEU||Oracle||
||Initial run||06/05||2779 / 4500 (61.76%)||549 / 2779 (19.76%)||383 / 549 (69.76%)||383 / 4500 (8.51%)||0.1433||0.2234||
||Fixed _eba_c_rel etc.||06/07||2779 / 4500 (61.76%)||662 / 2779 (23.82%)||478 / 662 (72.21%)||478 / 4500 (10.62%)||0.1436||0.2302||
||Rel name debugging + handcrafted rules||06/09||2779 / 4500 (61.76%)||679 / 2779 (24.43%)||487 / 679 (71.72%)||487 / 4500 (10.82%)||0.1470||0.2335||
||Generic entries w/o にる,だける||06/14||3014 / 4500 (66.98%)||691 / 3014 (22.93%)||491 / 691 (71.06%)||491 / 4500 (10.91%)||0.1404||0.2300||
||New vn handling||06/14||3014 / 4500 (66.98%)||703 / 3014 (23.32%)||500 / 703 (71.12%)||500 / 4500 (11.11%)||0.1369||0.2264||
||Generic Edict rules||06/15||3014 / 4500 (66.98%)||722 / 3014 (23.95%)||520 / 722 (72.02%)||520 / 4500 (11.56%)||0.1337||0.2231||
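
The coverage columns above appear to follow a simple pattern: each stage is measured against the output of the previous stage, and end-to-end coverage is measured against the full 4500-sentence development set. The following is a minimal sketch of that arithmetic, not the script used to produce the table; the function name `coverage` and its parameter names are illustrative assumptions.

```python
def coverage(total, parsed, transferred, generated):
    """Return stage-wise and end-to-end coverage as percentages.

    Assumed reading of the table: each stage's denominator is the
    number of items that survived the previous stage.
    """
    def pct(part, whole):
        # Guard against an empty stage to avoid division by zero.
        return round(100.0 * part / whole, 2) if whole else 0.0

    return {
        "parse": pct(parsed, total),              # parsed / all inputs
        "transfer": pct(transferred, parsed),     # transferred / parsed
        "generation": pct(generated, transferred),  # generated / transferred
        "end_to_end": pct(generated, total),      # generated / all inputs
    }


# Counts taken from the "Initial run" row of the table.
initial = coverage(total=4500, parsed=2779, transferred=549, generated=383)
print(initial)
```

With the counts from the "Initial run" row, this reproduces the percentages listed there (61.76, 19.76, 69.76 and 8.51). Because each stage is conditioned on the previous one, end-to-end coverage is the product of the three stage-wise ratios, so a gain in parse coverage only raises end-to-end coverage if transfer and generation keep up.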