Vietnamese-English Machine Translation with VinaLLaMA-7B

This repository contains the code and data-processing scripts for finetuning VinaLLaMA-7B and VinaLLaMA-7B-chat on the Vietnamese-English machine translation task, as described in the paper "VinaLLaMA-7B: A Large-Scale Vietnamese-English Machine Translation Model" by Hieu Pham, Dat Quoc Nguyen, Thi Ngoc Diep Do, Minh Nguyen, and Son N. Tran.

The model is finetuned on three data sources: teencode and slang from the social-media dataset UIT-VSMEC (translated to English with GPT-4), synthetic data generated with GPT-4, and the parallel dataset mt_eng_vietnamese from Hugging Face.
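The three sources have different shapes, so they need to be normalized into one parallel-pair format before training. The sketch below shows one way to do that. It assumes mt_eng_vietnamese rows look like `{"translation": {"en": ..., "vi": ...}}` and that the GPT-4-translated and synthetic data are already available as (Vietnamese, English) string pairs. The names `Pair`, `from_hf_row`, `from_gpt4_pair`, and `mix` are hypothetical and do not come from this repository.

```python
import random
from dataclasses import dataclass


@dataclass
class Pair:
    vi: str      # Vietnamese side
    en: str      # English side
    source: str  # corpus tag, kept so results can be broken down per source


def from_hf_row(row: dict) -> Pair:
    # Assumed row layout for mt_eng_vietnamese: {"translation": {"en": ..., "vi": ...}}
    t = row["translation"]
    return Pair(vi=t["vi"], en=t["en"], source="mt_eng_vietnamese")


def from_gpt4_pair(vi: str, en: str, source: str) -> Pair:
    # Used for both the GPT-4-translated UIT-VSMEC text and the GPT-4 synthetic data
    return Pair(vi=vi, en=en, source=source)


def mix(corpora: list[list[Pair]], seed: int = 0) -> list[Pair]:
    # Pool every source, drop pairs with an empty side, then shuffle reproducibly
    pooled = [p for corpus in corpora for p in corpus if p.vi.strip() and p.en.strip()]
    random.Random(seed).shuffle(pooled)
    return pooled
```

Keeping a `source` tag on each pair costs little, and it makes it easy to rebalance the mix later, for example by upsampling the smaller slang/teencode portion.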

The instruction prompts used for finetuning (MTInstruct, AlignInstruct, HintInstruct, and ReviseInstruct) follow the paper "Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages" by Zhuoyuan Mao and Yen Yu.
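The sketch below shows two of these instruction types. The first is a plain MTInstruct translation record. The second is an AlignInstruct-style contrastive record: given a sentence pair and a word alignment (for example, one produced by a statistical word aligner), it asks whether one linked token pair and one unlinked token pair are aligned. The prompt wording, the Yes/No answer format, and the whitespace tokenization are all hypothetical simplifications, and the exact templates in the paper may differ. HintInstruct and ReviseInstruct are not sketched here.

```python
import random

LANG_NAMES = {"vi": "Vietnamese", "en": "English"}


def mt_instruct(src: str, tgt: str, src_lang: str, tgt_lang: str) -> dict:
    # Plain translation instruction; the wording is a hypothetical template
    instruction = (
        f"Translate the following sentence from {LANG_NAMES[src_lang]} "
        f"to {LANG_NAMES[tgt_lang]}.\n{src}"
    )
    return {"instruction": instruction, "output": tgt}


def align_instruct(vi: str, en: str, alignment: list[tuple[int, int]],
                   rng: random.Random) -> list[dict]:
    """Build one positive and, if possible, one negative word-alignment query.

    `alignment` holds (vi_token_index, en_token_index) pairs. Whitespace
    tokenization is a simplification used only for this sketch.
    """
    vi_toks, en_toks = vi.split(), en.split()
    aligned = set(alignment)
    if not aligned:
        return []
    header = f"Given the parallel sentences:\nVietnamese: {vi}\nEnglish: {en}\n"

    def query(i: int, j: int, answer: str) -> dict:
        q = f'Is "{vi_toks[i]}" aligned with "{en_toks[j]}"? Answer Yes or No.'
        return {"instruction": header + q, "output": answer}

    # Positive example: a token pair the aligner linked
    records = [query(*rng.choice(sorted(aligned)), "Yes")]
    # Contrastive negative: a token pair the aligner did not link
    negatives = [(i, j) for i in range(len(vi_toks)) for j in range(len(en_toks))
                 if (i, j) not in aligned]
    if negatives:
        records.append(query(*rng.choice(negatives), "No"))
    return records
```

Pairing each positive query with a negative one is what makes the signal contrastive: the model has to tell linked word pairs from unlinked ones, rather than learning to answer "Yes" every time.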
