
Commit 433dd58

Author: futuran
Commit message: fix translate code
Parent: 854547b

File tree

43 files changed: +225498, -986 lines


README.md

Lines changed: 12 additions & 179 deletions
@@ -1,186 +1,19 @@
-# OpenNMT-py: Open-Source Neural Machine Translation

-[![Build Status](https://travis-ci.org/OpenNMT/OpenNMT-py.svg?branch=master)](https://travis-ci.org/OpenNMT/OpenNMT-py)
-[![Run on FH](https://img.shields.io/badge/Run%20on-FloydHub-blue.svg)](https://floydhub.com/run?template=https://github.com/OpenNMT/OpenNMT-py)

-This is a [PyTorch](https://github.com/pytorch/pytorch)
-port of [OpenNMT](https://github.com/OpenNMT/OpenNMT),
-an open-source (MIT) neural machine translation system. It is designed to be research friendly to try out new ideas in translation, summary, image-to-text, morphology, and many other domains. Some companies have proven the code to be production ready.
+# OpenNMT-py-MultiEncoder

-We love contributions. Please consult the Issues page for any [Contributions Welcome](https://github.com/OpenNMT/OpenNMT-py/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) tagged post.
+### Original Reference
+- Original README: https://github.com/futuran/OpenNMT-py/blob/multiencoder/README.md
+- This repository is forked from https://github.com/OpenNMT/OpenNMT-py

-<center style="padding: 40px"><img width="70%" src="http://opennmt.github.io/simple-attn.png" /></center>
+## Abstract
+- This is a multi-encoder version of OpenNMT-py.

-Before raising an issue, make sure you read the requirements and the documentation examples.
+## Preprocess
+- Add the options "train_sim" and "valid_sim" to the original conf file.

-Unless there is a bug, please use the [Forum](http://forum.opennmt.net) or [Gitter](https://gitter.im/OpenNMT/OpenNMT-py) to ask questions.
+## Train
+- You can start training as in the original OpenNMT-py.

-
-Table of Contents
-=================
-* [Full Documentation](http://opennmt.net/OpenNMT-py/)
-* [Requirements](#requirements)
-* [Features](#features)
-* [Quickstart](#quickstart)
-* [Run on FloydHub](#run-on-floydhub)
-* [Acknowledgements](#acknowledgements)
-* [Citation](#citation)
-
-## Requirements
-
-Install `OpenNMT-py` from `pip`:
-```bash
-pip install OpenNMT-py
-```
-
-or from the sources:
-```bash
-git clone https://github.com/OpenNMT/OpenNMT-py.git
-cd OpenNMT-py
-python setup.py install
-```
-
-Note: If you have MemoryError in the install try to use `pip` with `--no-cache-dir`.
-
-*(Optional)* some advanced features (e.g. working audio, image or pretrained models) requires extra packages, you can install it with:
-```bash
-pip install -r requirements.opt.txt
-```
-
-Note:
-
-- some features require Python 3.5 and after (eg: Distributed multigpu, entmax)
-- we currently only support PyTorch 1.4
-
-## Features
-
-- [Seq2Seq models (encoder-decoder) with multiple RNN cells (lstm/gru) and attention (dotprod/mlp) types](http://opennmt.net/OpenNMT-py/options/train.html#model-encoder-decoder)
-- [Transformer models](http://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model)
-- [Copy and Coverage Attention](http://opennmt.net/OpenNMT-py/options/train.html#model-attention)
-- [Pretrained Embeddings](http://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-pretrained-embeddings-e-g-glove)
-- [Source word features](http://opennmt.net/OpenNMT-py/options/train.html#model-embeddings)
-- [Image-to-text processing](http://opennmt.net/OpenNMT-py/im2text.html)
-- [Speech-to-text processing](http://opennmt.net/OpenNMT-py/speech2text.html)
-- [TensorBoard logging](http://opennmt.net/OpenNMT-py/options/train.html#logging)
-- [Multi-GPU training](http://opennmt.net/OpenNMT-py/FAQ.html##do-you-support-multi-gpu)
-- [Data preprocessing](http://opennmt.net/OpenNMT-py/options/preprocess.html)
-- [Inference (translation) with batching and beam search](http://opennmt.net/OpenNMT-py/options/translate.html)
-- Inference time loss functions.
-- [Conv2Conv convolution model]
-- SRU "RNNs faster than CNN" paper
-- Mixed-precision training with [APEX](https://github.com/NVIDIA/apex), optimized on [Tensor Cores](https://developer.nvidia.com/tensor-cores)
-
-## Quickstart
-
-[Full Documentation](http://opennmt.net/OpenNMT-py/)
-
-
-### Step 1: Preprocess the data
-
-```bash
-onmt_preprocess -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
-```
-
-We will be working with some example data in `data/` folder.
-
-The data consists of parallel source (`src`) and target (`tgt`) data containing one sentence per line with tokens separated by a space:
-
-* `src-train.txt`
-* `tgt-train.txt`
-* `src-val.txt`
-* `tgt-val.txt`
-
-Validation files are required and used to evaluate the convergence of the training. It usually contains no more than 5000 sentences.
-
-
-After running the preprocessing, the following files are generated:
-
-* `demo.train.pt`: serialized PyTorch file containing training data
-* `demo.valid.pt`: serialized PyTorch file containing validation data
-* `demo.vocab.pt`: serialized PyTorch file containing vocabulary data
-
-
-Internally the system never touches the words themselves, but uses these indices.
-
-### Step 2: Train the model
-
-```bash
-onmt_train -data data/demo -save_model demo-model
-```
-
-The main train command is quite simple. Minimally it takes a data file
-and a save file. This will run the default model, which consists of a
-2-layer LSTM with 500 hidden units on both the encoder/decoder.
-If you want to train on GPU, you need to set, as an example:
-`CUDA_VISIBLE_DEVICES=1,3`
-`-world_size 2 -gpu_ranks 0 1` to use (say) GPU 1 and 3 on this node only.
-To know more about distributed training on single or multi nodes, read the FAQ section.
-
-### Step 3: Translate
-
-```bash
-onmt_translate -model demo-model_acc_XX.XX_ppl_XXX.XX_eX.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose
-```
-
-Now you have a model which you can use to predict on new data. We do this by running beam search. This will output predictions into `pred.txt`.
-
-!!! note "Note"
-The predictions are going to be quite terrible, as the demo dataset is small. Try running on some larger datasets! For example you can download millions of parallel sentences for [translation](http://www.statmt.org/wmt16/translation-task.html) or [summarization](https://github.com/harvardnlp/sent-summary).
-
-## Alternative: Run on FloydHub
-
-[![Run on FloydHub](https://static.floydhub.com/button/button.svg)](https://floydhub.com/run?template=https://github.com/OpenNMT/OpenNMT-py)
-
-Click this button to open a Workspace on [FloydHub](https://www.floydhub.com/?utm_medium=readme&utm_source=opennmt-py&utm_campaign=jul_2018) for training/testing your code.
-
-
-## Pretrained embeddings (e.g. GloVe)
-
-Please see the FAQ: [How to use GloVe pre-trained embeddings in OpenNMT-py](http://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-pretrained-embeddings-e-g-glove)
-
-## Pretrained Models
-
-The following pretrained models can be downloaded and used with translate.py.
-
-http://opennmt.net/Models-py/
-
-## Acknowledgements
-
-OpenNMT-py is run as a collaborative open-source project.
-The original code was written by [Adam Lerer](http://github.com/adamlerer) (NYC) to reproduce OpenNMT-Lua using Pytorch.
-
-Major contributors are:
-[Sasha Rush](https://github.com/srush) (Cambridge, MA)
-[Vincent Nguyen](https://github.com/vince62s) (Ubiqus)
-[Ben Peters](http://github.com/bpopeters) (Lisbon)
-[Sebastian Gehrmann](https://github.com/sebastianGehrmann) (Harvard NLP)
-[Yuntian Deng](https://github.com/da03) (Harvard NLP)
-[Guillaume Klein](https://github.com/guillaumekln) (Systran)
-[Paul Tardy](https://github.com/pltrdy) (Ubiqus / Lium)
-[François Hernandez](https://github.com/francoishernandez) (Ubiqus)
-[Jianyu Zhan](http://github.com/jianyuzhan) (Shanghai)
-[Dylan Flaute](http://github.com/flauted (University of Dayton)
-and more !
-
-OpenNMT-py belongs to the OpenNMT project along with OpenNMT-Lua and OpenNMT-tf.
-
-## Citation
-
-[OpenNMT: Neural Machine Translation Toolkit](https://arxiv.org/pdf/1805.11462)
-
-[OpenNMT technical report](https://doi.org/10.18653/v1/P17-4012)
-
-```
-@inproceedings{opennmt,
-author = {Guillaume Klein and
-Yoon Kim and
-Yuntian Deng and
-Jean Senellart and
-Alexander M. Rush},
-title = {Open{NMT}: Open-Source Toolkit for Neural Machine Translation},
-booktitle = {Proc. ACL},
-year = {2017},
-url = {https://doi.org/10.18653/v1/P17-4012},
-doi = {10.18653/v1/P17-4012}
-}
-```
+## Translate
+- Add the option "sim" to the original conf file.
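The new README names the extra options but does not show an invocation. As a rough sketch only, assuming `train_sim`, `valid_sim`, and `sim` take text-file paths the same way the existing `-train_src`/`-valid_src`/`-src` options do (all data paths below are placeholders, not files from this commit):

```bash
# Hypothetical usage sketch: only the option names train_sim, valid_sim and sim
# come from the README diff above; flag placement and file paths are assumptions.
onmt_preprocess -train_src data/src-train.txt -train_tgt data/tgt-train.txt \
                -valid_src data/src-val.txt   -valid_tgt data/tgt-val.txt \
                -train_sim data/sim-train.txt -valid_sim data/sim-val.txt \
                -save_data data/demo

# Training is started the same way as in the original OpenNMT-py.
onmt_train -data data/demo -save_model demo-model

# Translation with the extra sim input.
onmt_translate -model demo-model_acc_XX.XX_ppl_XXX.XX_eX.pt \
               -src data/src-test.txt -sim data/sim-test.txt \
               -output pred.txt -verbose
```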
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+for tvt in train val test; do
+subword-nmt apply-bpe -c ../tamura_preprocess/subword_3k/codes.ja < ../merge.0.00.restore/flickr_${tvt}.ja1.tokenized.with_match.bpe.jaonly.r > flickr_${tvt}.ja1.tokenized.with_match.bpe.jaonly.r.blbpe
+done
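For context, `subword-nmt apply-bpe -c <codes>` segments each input line with a previously learned BPE model. A minimal sketch of the surrounding workflow, with placeholder file names and a merge count assumed from the `subword_3k` directory name (neither the learn step nor the restore step is part of this commit):

```bash
# Hypothetical companion steps; only the apply-bpe call above is from this commit.
subword-nmt learn-bpe -s 3000 < corpus.ja > codes.ja           # learn BPE merges (3k assumed from "subword_3k")
subword-nmt apply-bpe -c codes.ja < corpus.ja > corpus.ja.bpe  # segment text with the learned codes
sed -r 's/(@@ )|(@@ ?$)//g' corpus.ja.bpe > corpus.ja.txt      # undo the "@@ " BPE markers afterwards
```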
