
This project implements a skip-gram model that specializes word embeddings to capture word similarity rather than just trivial word relatedness, by training on PPDB. It also includes further experiments on PPDB, such as re-ranking using a DAN (deep averaging network, as in Iyyer et al.).


Independent Study


Experiments tried include

  • Deciding on the dataset variant of PPDB: The XXL and XL variants produced many word pairs that were non-informative, redundant and, most importantly, high-variance. For instance, the word "discarded" appears in pairs with:
    • 651 other words in the XXL database
    • 251 in the XL
    • 115 in the L

The final set of pairs is further filtered to remove all pairs that:

  • have a PPDB2.0Score of less than 3.3, or
  • have a small edit distance between the two words of the pair, which indicates surface overlap and redundancy.

It is important to remove redundant pairs; otherwise the GloVe vectors are not meaningfully updated, since every word is closely in context with every other word and no discriminating signal is provided to differentially update the word embeddings.
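A minimal sketch of this filtering step is shown below. The score threshold of 3.3 comes from the description above, while the tuple format of the input and the edit-distance cutoff of 3 are assumptions for illustration, not the repository's exact code.

```python
# Sketch of the PPDB pair-filtering step described above.
# Assumption: pairs are given as (word1, word2, ppdb2_score) tuples.

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def filter_pairs(pairs, min_score=3.3, min_edit_distance=3):
    """Keep only informative, non-redundant PPDB pairs."""
    kept = []
    for w1, w2, score in pairs:
        if score < min_score:
            continue  # drop low-confidence paraphrase pairs
        if edit_distance(w1, w2) < min_edit_distance:
            continue  # drop near-identical (redundant) pairs
        kept.append((w1, w2, score))
    return kept

# Example usage (illustrative scores):
# filter_pairs([("discarded", "discard", 4.1), ("discarded", "rejected", 3.9)])
# keeps only ("discarded", "rejected", 3.9), since "discard" differs by edit distance 2
```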

  • Deciding on the Loss function: There are many variants of the loss function.

    • The loss used in the Wieting et al. paper depends on only a single negative sample and is equivalent to a max-margin loss. Further, that method chooses the negative sample from within the same mini-batch, picking the one that is as similar as possible to the target-context word pair.
    • I have followed the skip-gram negative-sampling loss from the word2vec paper, with approximately 60 negative context examples per positive context word for a given target word.
      • Within this choice there are two variants: one uses a sigmoid output coupled with a binary cross-entropy loss, and the other simply sums the log-sigmoid of the positive scores and the log-sigmoid of the negated negative scores as the total loss (a sketch of this variant appears after this list). I did not find a noticeable difference between these two sub-variants.
  • Deciding on the batch size: A counter-intuitive factor that I noticed significantly impacts the optimization process is the batch size. I initially tried batch sizes of 100,000 and 50,000, essentially packing in as many samples as my system RAM could support. This turned out to be a poor choice, as the loss did not decrease at all. I then tried smaller batch sizes of 100, 500 and 1,000, and found 100 to work best.

  • Embedding weight initialization: The skip-gram model of word2vec uses two matrices: a word embedding matrix, used to look up embeddings for target words, and a context embedding matrix, used to look up context-word embeddings. Our model follows the same convention, but initializes both matrices with the GloVe word embeddings (a sketch of this setup appears after this list). Note: this differs from the model of Wieting et al., which uses a single matrix for both the target-word and context-word lookups. Within this setup we could try several experiments:

    • Random initialization vs. pre-trained GloVe: With random initialization we get gibberish results, because our dataset of about 200,000 sentence pairs is not enough for training from scratch; it is only suited to fine-tuning, which is what we do.
    • Dimension of the embedding: As expected, increasing the number of dimensions improves the nearest-neighbor quality for a given query word, but for our experiments we set it to 50, since computation with 300-dimensional embeddings is expensive.
  • Different optimizers: I experimented with three optimizers (Adam, Adagrad, SGD) and found the best to be SGD with a constant learning rate of 0.1 (see the sketch after this list), although I believe that adding a learning-rate decay schedule may further improve performance. We observe convergence of the loss value after approximately 10 epochs.
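Below is a minimal sketch of the two-matrix skip-gram setup described above, with both embedding matrices initialized from pre-trained GloVe vectors and trained with SGD at a constant learning rate of 0.1. PyTorch is assumed here (the text does not name a framework), and `glove_matrix`, a NumPy array of shape (vocab_size, 50) aligned with the vocabulary, is a hypothetical input.

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    """Skip-gram model with separate target and context embedding matrices,
    both initialized from pre-trained GloVe vectors (50-dimensional here)."""

    def __init__(self, glove_matrix):
        super().__init__()
        vocab_size, dim = glove_matrix.shape
        glove = torch.tensor(glove_matrix, dtype=torch.float)
        # Target-word embedding matrix (looked up for the target word).
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.word_emb.weight.data.copy_(glove)
        # Context-word embedding matrix (looked up for context and negative words).
        self.ctx_emb = nn.Embedding(vocab_size, dim)
        self.ctx_emb.weight.data.copy_(glove)

    def forward(self, target_ids, context_ids, negative_ids):
        # Return the three sets of vectors consumed by the loss sketched below.
        return (self.word_emb(target_ids),     # (batch, dim)
                self.ctx_emb(context_ids),     # (batch, dim)
                self.ctx_emb(negative_ids))    # (batch, k, dim)

# SGD with a constant learning rate of 0.1, which worked best in these experiments:
# model = SkipGram(glove_matrix)
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```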
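And a sketch of the log-sigmoid sub-variant of the skip-gram negative-sampling loss mentioned above, again assuming PyTorch. The tensor layout is illustrative, and k is the number of sampled negatives per positive context word (around 60 in the experiments described).

```python
import torch
import torch.nn.functional as F

def sgns_loss(target_vecs, context_vecs, negative_vecs):
    """Skip-gram negative-sampling loss, log-sigmoid variant.

    target_vecs:   (batch, dim)    embeddings of the target words
    context_vecs:  (batch, dim)    embeddings of the true context words
    negative_vecs: (batch, k, dim) embeddings of k sampled negative words
    """
    # Dot product between each target and its true context word.
    pos_score = (target_vecs * context_vecs).sum(dim=1)                        # (batch,)
    pos_loss = F.logsigmoid(pos_score)                                         # push up

    # Dot products between each target and its k negative samples.
    neg_score = torch.bmm(negative_vecs, target_vecs.unsqueeze(2)).squeeze(2)  # (batch, k)
    neg_loss = F.logsigmoid(-neg_score).sum(dim=1)                             # push down

    # Total negative log-likelihood, averaged over the batch.
    return -(pos_loss + neg_loss).mean()
```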

Sample results

