CHANGELOG

v0.2:

  • different, more consistent handling of end-of-word token (commit a749a7)
  • allow passing a vocabulary and a frequency threshold to apply_bpe.py, which prevents the production of OOV (or rare) subword units (commit a00db)

v0.1:

  • consistent cross-version unicode handling
  • all scripts are now deterministic