Fix bin/utils/README.md
ZhitingHu committed Jun 23, 2019
1 parent 3488d64 commit 75876a4
Showing 1 changed file with 5 additions and 5 deletions: bin/utils/README.md
@@ -2,15 +2,15 @@
This directory contains several utilities for, e.g., data pre-processing.

Instructions of using BPE and WPM encoding are as follows.
-See [examples/transformer](https://github.com/asyml/texar/tree/master/examples/transformer)
+See [examples/transformer](https://github.com/asyml/texar-pytorch/tree/master/examples/transformer)
for a real example of using these encodings.

**Note that** there are a few different (sub-)word encoding approaches and implementations which are used by several popular models. For example:

-* **BPE by Rico Sennrich**: Used in [Transformer](https://github.com/asyml/texar/tree/master/examples/transformer) for machine translation. This is the version in this folder, including both BPE training and encoding/decoding.
-* **BPE by OpenAI**: Used in [GPT-2]() language model. Includes BPE encoding/decoding and provided BPE vocab (no training).
-* **BPE by WordPiece**: Used in [BERT](https://github.com/asyml/texar/tree/master/examples/bert) for text embedding. Includes BPE encoding/decoding and provided BPE vocab (no training).
-* **SPM by sentencepiece**: Used in [Transformer](https://github.com/asyml/texar/tree/master/examples/transformer) for machine translation. This is the version in this folder, including both SPM training and encoding/decoding.
+* **BPE by Rico Sennrich**: Used in [Transformer](https://github.com/asyml/texar-pytorch/tree/master/examples/transformer) for machine translation. This is the version in this folder, including both BPE training and encoding/decoding.
+* **BPE by OpenAI**: Used in [GPT-2](https://github.com/ZhitingHu/texar-pytorch/tree/master/examples/gpt-2) language model. Includes BPE encoding/decoding and provided BPE vocab (no training).
+* **BPE by WordPiece**: Used in [BERT](https://github.com/asyml/texar-pytorch/tree/master/examples/bert) for text embedding. Includes BPE encoding/decoding and provided BPE vocab (no training).
+* **SPM by sentencepiece**: Used in [Transformer](https://github.com/asyml/texar-pytorch/tree/master/examples/transformer) for machine translation. This is the version in this folder, including both SPM training and encoding/decoding.

### *[Byte Pair Encoding (BPE)](https://arxiv.org/abs/1508.07909)* pipeline

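For readers unfamiliar with the "BPE training" step listed for the Rico Sennrich version, the merge-learning loop from Sennrich et al. (arXiv:1508.07909) can be sketched as below. This is a minimal toy re-implementation under stated assumptions, not the script shipped in `bin/utils`: the helper names `count_pairs`, `merge_pair`, and `learn_bpe` are hypothetical, the `</w>` end-of-word marker follows the paper's worked example, and ties between equally frequent pairs are broken by whichever pair was counted first.

```python
from collections import Counter
from typing import Dict, List, Tuple

Symbols = Tuple[str, ...]
Pair = Tuple[str, str]


def count_pairs(vocab: Dict[Symbols, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: Pair, vocab: Dict[Symbols, int]) -> Dict[Symbols, int]:
    """Replace every occurrence of `pair` in each word with one merged symbol."""
    merged: Dict[Symbols, int] = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge operations (hypothetical helper)."""
    # Start from single characters, with "</w>" marking the end of each word.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; max() keeps the first pair counted on ties.
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


if __name__ == "__main__":
    # Toy vocabulary from the worked example in Sennrich et al.
    freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(freqs, 3))
    # -> [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merge list is what an encoder would later replay, in order, on new words; the real tool additionally handles file I/O, vocabulary thresholds, and decoding, which this sketch leaves out.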