nemozen/am-llm

Large Language Model for Amharic

Using text corpora from Amharic Wikipedia and Ethiopian News Headlines, we build AMBERT, a language embedding bootstrapped off BERT and Google Translate. See the embedding for details.

The resulting embedding can be used via the AmBert class to build various NLP applications.

am2en is a basic example of a model that uses this embedding for Amharic-to-English translation.

Setup

The BERT_BASE environment variable, used in bert_embedding, should point to the BERT installation directory.

The AM_LLM environment variable, used in ambert_embedding, should point to the am-llm repository's directory (i.e., this directory).

Add $AM_LLM to your PYTHONPATH environment variable.

For example:

export BERT_BASE=~/bert_base
export AM_LLM=~/src/am-llm
export PYTHONPATH=$AM_LLM:$PYTHONPATH
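To confirm the setup before importing anything, a small check like the one below may help. It is a hypothetical helper (`check_am_llm_env` is not part of am-llm), written for bash and assuming only what this section states: BERT_BASE and AM_LLM must name existing directories, and $AM_LLM must appear on PYTHONPATH. It does not verify what files the BERT directory contains.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check, not shipped with am-llm.
# Returns 0 if the environment described in Setup looks correct, 1 otherwise.
check_am_llm_env() {
  local status=0 var dir
  for var in BERT_BASE AM_LLM; do
    dir="${!var}"  # bash indirect expansion: value of the variable named by $var
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var points to a missing directory: $dir" >&2
      status=1
    fi
  done
  # Wrap both sides in colons so only a whole PYTHONPATH entry matches.
  if [ -n "$AM_LLM" ]; then
    case ":${PYTHONPATH}:" in
      *":${AM_LLM}:"*) ;;
      *) echo "AM_LLM ($AM_LLM) is not on PYTHONPATH" >&2; status=1 ;;
    esac
  fi
  return "$status"
}
```

After running the exports above, `check_am_llm_env && echo "environment looks OK"` prints a message for each problem it finds, or the confirmation if none.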

About

Amharic Large Language Model
