
Commit 0536816

Author: FrancisYao
Commit message: update
1 parent 5ed4e92 commit 0536816

File tree

7 files changed: +5, -325 lines changed


.DS_Store

6 KB
Binary file not shown.

Guideline.md

Lines changed: 0 additions & 36 deletions
This file was deleted.

README.md

Lines changed: 5 additions & 174 deletions
@@ -1,179 +1,10 @@
-## A Hierarchical Latent Concept to Paraphrase Variational Autoencoder
+# The Latent Bag of Words Model

-The journals about this project have been moved to [this link](https://github.com/Francix/Deep-Generative-Models-for-Natural-Language-Processing/blob/master/README.md), as a reading list
+Implementation of Yao Fu, Yansong Feng and John Cunningham, _Paraphrase Generation with Latent Bag of Words_. NeurIPS 2019. [paper](https://github.com/FranxYao/dgm_latent_bow/doc/latent_bow_camera_ready.pdf)

-## Results - Quora
+<img src="etc/sample_sentences.png" alt="example"
+  title="Example" width="800" />
-Models | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | Rouge-1 | Rouge-2 | Rouge-L
------- | ------ | ------ | ------ | ------ | ------ | ------- | ------- | -------
-seq2seq | 51.34 | 36.88 | 28.08 | 22.27 | 52.66 | 29.17 | 50.29
-seq2seq-attn | 53.24 | 38.79 | 29.56 | 23.34 | 54.71 | 30.68 | 52.29
-beta-vae, beta = 1e-3 | 43.02 | 28.60 | 20.98 | 16.29 | 41.81 | 21.17 | 40.09
-beta-vae, beta = 1e-4 | 47.86 | 33.21 | 24.96 | 19.73 | 47.62 | 25.49 | 45.46
-bow-hard | 33.40 | 21.18 | 14.43 | 10.36 | 36.08 | 16.23 | 33.77
-latent-bow-topk | 54.93 | 41.19 | 31.98 | 25.57 | 58.05 | 33.95 | 55.74
-latent-bow-gumbel | 54.82 | 40.96 | 31.74 | 25.33 | 57.75 | 33.67 | 55.46
-cheating-bow | 72.96 | 61.78 | 54.40 | 49.47 | 72.15 | 52.61 | 68.53
+For more background about deep generative models for natural language processing, see the [DGM4NLP](https://github.com/FranxYao/Deep-Generative-Models-for-Natural-Language-Processing) journal list.

-note: strictly, we should call this a cross-aligned VAE
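
Editor's note: the latent-bow-topk and latent-bow-gumbel rows appear to differ only in how words are drawn from the model's predicted bag-of-words distribution. The sketch below is a hypothetical, plain-NumPy illustration of deterministic top-k selection versus Gumbel-perturbed top-k sampling, not code from this repository; the function names and toy data are invented for illustration.

```python
import numpy as np

def select_topk(logits, k):
    """Deterministic: take the k highest-scoring words."""
    return np.argsort(-logits)[:k]

def select_gumbel_topk(logits, k, rng=np.random.default_rng()):
    """Stochastic: perturb the logits with Gumbel noise, then take the top k.
    Repeated calls yield different word sets, injecting randomness into the bag."""
    gumbel = rng.gumbel(size=logits.shape)
    return np.argsort(-(logits + gumbel))[:k]

vocab = ["a", "man", "rides", "horse", "bike", "street", "person", "road"]
logits = np.array([2.0, 1.5, 0.3, 1.2, 0.1, 0.8, 1.4, 0.7])
print([vocab[i] for i in select_topk(logits, k=4)])
print([vocab[i] for i in select_gumbel_topk(logits, k=4)])
```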

-## Results - MSCOCO

-Models | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | Rouge-1 | Rouge-2 | Rouge-L
------- | ------ | ------ | ------ | ------ | ------ | ------- | ------- | -------
-seq2seq | 69.61 | 47.14 | 31.64 | 21.65 | 40.11 | 14.31 | 36.28
-seq2seq-attn | 71.24 | 49.65 | 34.04 | 23.66 | 41.07 | 15.26 | 37.35
-beta-vae, beta = 1e-3 | 68.81 | 45.82 | 30.56 | 20.99 | 39.63 | 13.86 | 35.81
-beta-vae, beta = 1e-4 | 70.04 | 47.59 | 32.29 | 22.54 | 40.72 | 14.75 | 36.75
-bow-hard | 48.14 | 28.35 | 16.25 | 9.28 | 31.66 | 8.30 | 27.37
-latent-bow-topk | 72.60 | 51.14 | 35.66 | 25.27 | 42.08 | 16.13 | 38.16
-latent-bow-gumbel | 72.37 | 50.81 | 35.32 | 24.98 | 42.12 | 16.05 | 38.13
-cheating-bow | 80.87 | 65.38 | 51.72 | 41.48 | 45.54 | 20.57 | 40.97
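
Editor's note: the BLEU-n columns are standard cumulative n-gram precision scores. Below is a minimal sketch of how such scores can be computed with NLTK; it is not this repository's evaluation script, and the paper's exact tokenization and smoothing may differ.

```python
# Hedged sketch: corpus-level BLEU-1..4 with NLTK, assuming whitespace tokenization.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def bleu_1_to_4(references, hypotheses):
    """references / hypotheses: parallel lists of sentences (strings)."""
    refs = [[r.split()] for r in references]   # one reference list per hypothesis
    hyps = [h.split() for h in hypotheses]
    smooth = SmoothingFunction().method1       # avoid zero scores on short sentences
    scores = []
    for n in range(1, 5):
        weights = tuple(1.0 / n for _ in range(n))  # uniform weights over 1..n-grams
        scores.append(corpus_bleu(refs, hyps, weights=weights, smoothing_function=smooth))
    return scores  # [BLEU-1, ..., BLEU-4] in [0, 1]; multiply by 100 to match the tables

print(bleu_1_to_4(["a man rides a horse"], ["a man is riding a horse"]))
```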

-## Results - MSCOCO - Detailed

-Models | PPL | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4
------- | --- | ------ | ------ | ------ | ------
-seq2seq | 4.36 | 69.61 | 47.14 | 31.64 | 21.65
-seq2seq-attn | 4.88 | 71.24 | 49.65 | 34.04 | 23.66
-beta-vae, beta = 1e-3 | 3.94 | 68.81 | 45.82 | 30.56 | 20.99
-beta-vae, beta = 1e-4 | 4.12 | 70.04 | 47.59 | 32.29 | 22.54
-bow-hard | 19.13 | 48.14 | 28.35 | 16.25 | 9.28
-latent-bow-topk | 4.75 | 72.60 | 51.14 | 35.66 | 25.27
-latent-bow-gumbel | 4.69 | 72.37 | 50.81 | 35.32 | 24.98
-cheating-bow | 15.65 | 80.87 | 65.38 | 51.72 | 41.48
-latent-bow-memory-only |
-seq2seq-attn top2 sampling |
-bow-seq2seq, enc baseline | - | 63.39 | 40.31 | 24.40 | 14.76
-bow-seq2seq, ref baseline | - | 76.09 | 49.90 | 31.79 | 20.41
-bow, predict all para bow | - | 64.44 | 41.26 | 25.90 | 16.47
-bow, predict all para bow exclude self bow |
-hierarchical vae |

-Models | Rouge-1 | Rouge-2 | Rouge-L
------- | ------- | ------- | -------
-seq2seq | 40.11 | 14.31 | 36.28
-seq2seq-attn | 41.07 | 15.26 | 37.35
-beta-vae, beta = 1e-3 | 39.63 | 13.86 | 35.81
-beta-vae, beta = 1e-4 | 40.72 | 14.75 | 36.75
-bow-hard | 31.66 | 8.30 | 27.37
-latent-bow-topk | 42.08 | 16.13 | 38.16
-latent-bow-gumbel | 42.12 | 16.05 | 38.13
-cheating-bow | 45.54 | 20.57 | 40.97
-seq2seq-attn top2 sampling |
-latent-bow-memory-only |

-Models | Dist-1 | Dist-2 | Dist-3
------- | ------ | ------ | ------
-seq2seq | 689 | 3343 | 7400
-seq2seq-attn | 943 | 4867 | 11494
-beta-vae, beta = 1e-3 | 737 | 3367 | 6923
-beta-vae, beta = 1e-4 | 1090 | 5284 | 11216
-bow-hard | 2100 | 24505 | 71293
-latent-bow-topk | 1407 | 7496 | 17062
-latent-bow-gumbel | 1433 | 7563 | 17289
-cheating-bow | 2399 | 26963 | 70128
-seq2seq-attn top2 sampling |
-latent-bow-memory-only |
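
Editor's note: the Dist-n columns above look like raw counts of distinct n-grams over all generated sentences, a common diversity measure. A minimal sketch under that assumption, using whitespace tokenization; this is illustrative and not the repository's metric code.

```python
# Hedged sketch: count distinct n-grams (Dist-n) over a corpus of generated sentences.
from typing import Iterable

def distinct_ngrams(sentences: Iterable[str], n: int) -> int:
    """Number of unique n-grams across all sentences (whitespace tokenization assumed)."""
    seen = set()
    for s in sentences:
        toks = s.split()
        for i in range(len(toks) - n + 1):
            seen.add(tuple(toks[i:i + n]))
    return len(seen)

outputs = ["a man is riding a horse", "a man rides a horse down the street"]
print([distinct_ngrams(outputs, n) for n in (1, 2, 3)])  # Dist-1, Dist-2, Dist-3
```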

-Models | IN-BLEU-1 | IN-BLEU-2 | IN-BLEU-3 | IN-BLEU-4 | Jaccard Dist
------- | --------- | --------- | --------- | --------- | ------------
-seq2seq | 46.01 | 28.17 | 18.41 | 12.76 | 33.74
-seq2seq-attn | 49.28 | 32.23 | 22.19 | 16.06 | 37.60
-beta-vae, beta = 1e-3 | 44.92 | 26.82 | 17.34 | 12.02 | 32.41
-beta-vae, beta = 1e-4 | 46.97 | 29.07 | 19.33 | 13.68 | 34.42
-bow-hard | 27.62 | 14.31 | 7.59 | 4.06 | 21.08
-latent-bow-topk | 51.22 | 34.36 | 24.31 | 18.04 | 39.25
-latent-bow-gumbel |
-cheating-bow | 34.95 | 18.98 | 10.79 | 6.41 | 24.85
-seq2seq-attn top2 sampling |
-latent-bow-memory-only |
-bow-seq2seq, enc baseline | 41.40 | 25.31 | 15.78 | 10.13 | -
-bow-seq2seq, ref baseline | 29.56 | 13.95 | 7.11 | 3.83 | -
-bow, predict all para bow | 49.07 | 31.17 | 20.55 | 14.18 | -
-bow, predict all para bow exclude self bow |
-hierarchical vae |
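
Editor's note: IN-BLEU presumably scores each generated paraphrase against its own input sentence (lower overlap suggests more rewording), and the last column appears to be a word-level Jaccard measure between input and output; the exact definitions behind these numbers are not stated here. A minimal, hypothetical sketch under those assumptions, not the repository's metric code:

```python
# Hedged sketch: n-gram overlap with the *input* sentence, plus word-set Jaccard overlap.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def input_bleu(input_sentence: str, output_sentence: str, n: int = 2) -> float:
    """BLEU-n of the output measured against its own input as the reference."""
    weights = tuple(1.0 / n for _ in range(n))
    return sentence_bleu([input_sentence.split()], output_sentence.split(),
                         weights=weights,
                         smoothing_function=SmoothingFunction().method1)

def word_jaccard(input_sentence: str, output_sentence: str) -> float:
    """Jaccard overlap between the input and output word sets."""
    a, b = set(input_sentence.split()), set(output_sentence.split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

src = "five slices of bread are placed on a surface"
out = "a bunch of food that is sitting on a plate"
print(input_bleu(src, out, n=2), word_jaccard(src, out))
```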

-Sentence samples - seq2seq-attn
-* I: Five slices of bread are placed on a surface .
-* O: A bunch of food that is sitting on a plate .
-* I: A wooden floor inside of a kitchen next to a stove top oven .
-* O: A kitchen with a stove , oven , and a refrigerator .
-* I: Four horses pull a carriage carrying people in a parade .
-* O: A group of people riding horses down a street .

-Random Walk samples - seq2seq-attn
-* I: A man sitting on a bench reading a piece of paper
-* -> A man is sitting on a bench in front of a building
-* -> A man is standing in the middle of a park bench
-* -> A man is holding a baby in the park
-* -> A man is holding a baby in a park
-* -> A man is holding a baby in a park
-* I: A water buffalo grazes on tall grass while an egret stands by
-* -> A large bison standing in a grassy field
-* -> A large buffalo standing in a field with a large green grass
-* -> A bison with a green grass covered in green grass
-* -> A large bison grazing in a field with a green grass covered field
-* -> A large bison grazing in a field with a large tree in the background

-## Project Vision

-* "Use probabilistic models where we have inductive bias; use flexible function approximators where we do not."
-* This project aims to explore effective generative modeling techniques for natural language generation.

-* Two paths
-  1. Improving text generation diversity by injecting randomness (or by other means)
-     * Existing text generation models tend to produce repeated and dull expressions drawn from a few fixed learned modes.
-     * E.g. "I do not know" as the answer to any question in a question answering system.
-     * With MLE training, models usually converge to a local maximum dominated by the most frequent patterns, thus losing text variety.
-     * We aim to promote text diversity by injecting randomness.
-     * \# NOTE: many existing works do this with adversarial regularization (Xu et al., Zhang et al.), but I want to exploit the randomness of a VAE. This idea is not so mainstream, so I should do some preliminary verification.
-     * \# NOTE: I have had this idea since last year but have not seen any work on it. If the preliminary experiments do not work, I may switch back to the existing line of research.
-  2. Language generation guided by global semantics
-     * Many recent works incorporate global semantic signals (e.g. topics) into sentence generation systems with latent variable models.
-     * These models exhibit many advantages, such as better generation quality (though it can also be worse, honestly), controllable generation (desirable for decades), and improved interpretability (though sometimes at the cost of quality).
-     * This work explores new methods for utilizing global semantic signals with latent variable models to improve downstream generation quality, such as language variety.
-     * \# NOTE: these two topics are the most compelling in my mind, but I cannot decide which one is more practical at this time (Feb 06 2019). Will do a survey this week and decide next week.

-* Methods (tentative)
-  * Every time one wants to say something, one has certain _concepts_ in mind, e.g. "lunch .. burger .. good".
-  * At this stage the _concept_ is not yet a sentence; it is an idea in the speaker's mind that has not been said.
-  * There are many ways to say this sentence; the sentences differ from each other to some extent, but they all convey the same meaning. They are _paraphrases_ of each other.
-  * We can think of the different sentence realizations of this _concept_ as different samples from the same distribution.
-  * Because of stochasticity, each sample differs from the others, which is to say, **stochasticity induces language diversity**.
-  * Our idea is to use stochasticity to model language diversity.
-  * We model one _concept_ as a Gaussian.
-  * We model the different _realizations_ of this concept as a Gaussian mixture, whose components share the _concept_ Gaussian as their prior.
-  * Given a sentence, we recover the mixture, then draw different samples from it to obtain different paraphrases of that sentence. -- This will require reparameterizing through a Gaussian mixture, see (Graves 16); a sketch follows this list.

-* Assumptions
-  * A single Gaussian cannot model this stochasticity because of posterior collapse -- TO BE VERIFIED (I think I have checked this before, but I am not 100% sure).

-* Goal
-  * Effectiveness: we can actually generate paraphrases
-    * surface difference: lower BLEU between different paraphrases
-    * semantic similarity: use a classifier to give a similarity score

-* Vision
-  * upper bound: build new effective models (for one focused application)
-  * upper bound: investigate existing methods and gain a deeper understanding (leading to a position paper)
-  * lower bound: test existing state-of-the-art models and analyse their pros and cons
-  * lower bound: continuous trial and error, learning many ways that do not work

-* Related Works
-  * Text generation models (with a particular sentence quality objective)
-  * Sentence variational autoencoders
-  * Adversarial regularization for text generation
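
Editor's note: for the mixture-of-Gaussians idea in the Methods item above, one common relaxation is to make the component choice differentiable with Gumbel-softmax weights and reparameterize each component in the usual Gaussian way; this is a different route than the mixture-density reparameterization of Graves (2016) cited there. The NumPy sketch below is a hypothetical illustration of that relaxation, not code from this repository.

```python
# Hedged sketch: a relaxed, reparameterized sample from a Gaussian mixture whose
# components sit around a shared "concept" Gaussian. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(mix_logits, mus, log_sigmas, tau=0.5):
    """mix_logits: (K,) unnormalized component weights.
    mus, log_sigmas: (K, D) per-component Gaussian parameters.
    Returns one relaxed sample z of shape (D,)."""
    # Gumbel-softmax relaxation of the discrete component choice.
    g = rng.gumbel(size=mix_logits.shape)
    w = np.exp((mix_logits + g) / tau)
    w = w / w.sum()
    # Reparameterized sample from every component, then a soft mixture of them.
    eps = rng.standard_normal(mus.shape)
    z_k = mus + np.exp(log_sigmas) * eps          # (K, D)
    return (w[:, None] * z_k).sum(axis=0)         # (D,)

# Toy setup: K = 3 realization components around a shared concept mean.
concept_mu = np.zeros(4)
mus = concept_mu + rng.standard_normal((3, 4)) * 0.1
log_sigmas = np.full((3, 4), -1.0)
mix_logits = np.array([0.2, 0.5, 0.3])
# Different draws give different z, which a decoder could turn into different paraphrases.
print(sample_mixture(mix_logits, mus, log_sigmas))
print(sample_mixture(mix_logits, mus, log_sigmas))
```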

-## Code structures
-* AdaBound.py
-* config.py
-* controller.py
-* data_utils.py
-* hierarchical_vae.py
-* lm.py
-* main.py
-* seq2seq.py
-* similarity.py
-* vae.py

doc/.DS_Store

6 KB
Binary file not shown.

doc/latent_bow_camera_ready.pdf

1.44 MB
Binary file not shown.

etc/sample_sentences.png

316 KB
