
Scripts for filtering parallel corpora for machine translation


alphadl/corpus_filter


This script is mainly for filtering parallel corpora.

Filtering features:

1. Length ratio of the source and target sentences

2. Repeated sentences
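The two features above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's own script: the function names `length_ratio_ok` and `filter_pairs`, the default threshold of 3.0, measuring length as a whitespace token count, and treating a "repeated sentence" as an exact duplicate (source, target) pair are all assumptions.

```python
from typing import Iterable, Iterator, Tuple


def length_ratio_ok(src: str, tgt: str, max_ratio: float = 3.0) -> bool:
    """Return True if the longer side is at most max_ratio times the shorter side.

    Assumption: length is a whitespace token count, and the 3.0 default is a
    hypothetical threshold, not a value from the repository.
    """
    src_len = len(src.split())
    tgt_len = len(tgt.split())
    if src_len == 0 or tgt_len == 0:
        return False  # drop pairs where one side is empty
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio


def filter_pairs(
    pairs: Iterable[Tuple[str, str]], max_ratio: float = 3.0
) -> Iterator[Tuple[str, str]]:
    """Yield pairs that pass the length-ratio check and have not been seen before.

    Assumption: a "repeated sentence" means an exact duplicate (src, tgt) pair
    after stripping surrounding whitespace.
    """
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not length_ratio_ok(src, tgt, max_ratio):
            continue
        key = (src, tgt)
        if key in seen:
            continue  # exact repeat of an earlier pair
        seen.add(key)
        yield src, tgt


if __name__ == "__main__":
    pairs = [
        ("hello world", "hi there"),
        ("hello world", "hi there"),  # duplicate pair, dropped
        ("a", "a b c d e"),           # ratio 5.0 > 3.0, dropped
    ]
    print(list(filter_pairs(pairs)))  # [('hello world', 'hi there')]
```

Because `filter_pairs` is a generator, it can stream a large corpus line by line; only the set of already-seen pairs is held in memory.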

Currently supported language pairs:

  • English EN <--> Chinese ZH
  • Japanese JP <--> Chinese ZH
  • Korean KR <--> Chinese ZH
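Chinese and Japanese are normally written without spaces between words, so a plain whitespace token count is not a meaningful sentence length for the ZH or JP side of these pairs. One possible workaround, sketched below under assumptions, counts each CJK character (here also including Hangul syllables) as one unit and every other run of non-space characters as one unit. The Unicode ranges and the helper names `mixed_length` and `ratio` are hypothetical choices, not taken from the repository.

```python
import re

# Assumed, non-exhaustive CJK ranges: CJK Unified Ideographs,
# Hiragana/Katakana, and Hangul syllables.
_CJK = r"\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af"

# Either a single CJK character, or a run of non-space, non-CJK characters.
_TOKEN = re.compile(rf"[{_CJK}]|[^\s{_CJK}]+")


def mixed_length(text: str) -> int:
    """Hypothetical length measure: one unit per CJK character, one per other word."""
    return len(_TOKEN.findall(text))


def ratio(src: str, tgt: str) -> float:
    """Longer-to-shorter length ratio; infinite if either side is empty."""
    a, b = mixed_length(src), mixed_length(tgt)
    if min(a, b) == 0:
        return float("inf")
    return max(a, b) / min(a, b)


if __name__ == "__main__":
    print(mixed_length("我爱你"))          # 3
    print(ratio("I love you", "我爱你"))   # 1.0
```

Counting characters on the CJK side and words on the English side tends to give comparable magnitudes, but any threshold applied to this ratio would still need tuning per language pair.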

9th/Aug/2018
