Using text corpora from Amharic wikipedia and Ethiopian News Headlines, we build AMBERT a language embedding bootstrapped off BERT and Google Translate. See the emedding for details.
The resulting embedding can be used via the AmBert class to build various NLP applications.
am2en is a basic example of a model using this embedding for an amharic to english translation application.
BERT_BASE
environment variable, which is used in
bert_embedding, should point to the BERT installation directory.
AM_LLM
environment variable, which is used in ambert_embedding,
should point to the am-llm repository's directory i.e. this directory.
Add $AM_LLM
to your PYTHONPATH
environment variable.
E.g.
export BERT_BASE=~/bert_base
export AM_LLM=~/src/am-llm
export PYTHONPATH=$AM_LLM:$PYTHONPATH