nguyen1207/machine_translation

Vietnamese-English Machine Translation with VinaLLaMA-7B

This repository contains the code and data processing for finetuning VinaLLaMA-7B and VinaLLaMA-7B-chat, the models from the paper "VinaLLaMA-7B: A Large-Scale Vietnamese-English Machine Translation Model" by Hieu Pham, Dat Quoc Nguyen, Thi Ngoc Diep Do, Minh Nguyen, and Son N. Tran, on the machine translation task.

The model is finetuned on three data sources: teencode and slang from the UIT-VSMEC social media corpus (translated to English using GPT-4), synthetic data (generated using GPT-4), and the mt_eng_vietnamese parallel dataset (HuggingFace).

The instruction prompts used for finetuning are MTInstruct, AlignInstruct, HintInstruct, and ReviseInstruct, from the paper "Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages" by Zhuoyuan Mao and Yen Yu.
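
As a rough illustration of how a parallel sentence pair might be turned into an MTInstruct-style training example, here is a minimal sketch. The template wording, the `build_mt_example` helper, and the `prompt`/`completion` field names are assumptions made for illustration, not the exact format used in this repository or in the paper. The input record shape (`{"translation": {"vi": ..., "en": ...}}`) is assumed to match the mt_eng_vietnamese dataset.

```python
from typing import Dict

# Hypothetical prompt template: the exact wording used for finetuning
# in this repository may differ.
MT_TEMPLATE = (
    "Translate the following text from {src_name} to {tgt_name}.\n"
    "{src_name}: {src_text}\n"
    "{tgt_name}:"
)

LANG_NAMES = {"vi": "Vietnamese", "en": "English"}


def build_mt_example(record: Dict[str, Dict[str, str]],
                     src: str = "vi", tgt: str = "en") -> Dict[str, str]:
    """Turn one parallel record into a prompt/completion pair.

    `record` is assumed to look like {"translation": {"vi": ..., "en": ...}}.
    """
    pair = record["translation"]
    prompt = MT_TEMPLATE.format(
        src_name=LANG_NAMES[src],
        tgt_name=LANG_NAMES[tgt],
        src_text=pair[src].strip(),
    )
    # A leading space separates the completion from the trailing colon.
    return {"prompt": prompt, "completion": " " + pair[tgt].strip()}


# Made-up sample pair, for illustration only.
sample = {"translation": {"vi": "Xin chào thế giới.", "en": "Hello, world."}}
example = build_mt_example(sample)
print(example["prompt"])
print(example["completion"])
```

Swapping `src` and `tgt` gives the English-to-Vietnamese direction from the same record, so one parallel corpus can supply both translation directions.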
