GitHub - faridsara/lexical-semantic-change-detection: train an unsupervised Doc2Vec model to do binary classification and ranking of words from two corpora



Required packages to run:

gensim==3.8.1
scikit-learn==0.22.1

The results for task 1 and task 2 are placed in trial_data_public\results.
The training stage may take around 2-3 minutes to complete.
Because the Doc2Vec algorithm is randomized, running the program again may produce different embeddings and therefore different distances, which could change the results for task 1 and task 2.
To reduce this effect, I ran the program 10 times, took the mean of the distances for each word, and used those mean values for my results.
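
The averaging step described above can be sketched roughly as follows. This is only an illustration, not the code in Doc2Vec.py: the names `mean_distances` and `fake_run` and the example words are hypothetical, and `fake_run` stands in for a full training run that returns one distance per target word.

```python
import random
from statistics import mean


def mean_distances(run_once, n_runs=10):
    """Call run_once(run_index) n_runs times and average each word's distance."""
    per_word = {}
    for run_index in range(n_runs):
        for word, distance in run_once(run_index).items():
            per_word.setdefault(word, []).append(distance)
    return {word: mean(values) for word, values in per_word.items()}


def fake_run(run_index):
    """Hypothetical stand-in for one training run: returns noisy distances."""
    rng = random.Random(run_index)
    return {
        "plane": 0.6 + rng.uniform(-0.05, 0.05),
        "tree": 0.2 + rng.uniform(-0.05, 0.05),
    }


averaged = mean_distances(fake_run, n_runs=10)
for word, distance in sorted(averaged.items()):
    print(f"{word}\t{distance:.3f}")
```

In a real run, `fake_run` would be replaced by a function that trains the model and measures the distance for each target word.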

I chose a threshold distance of 0.4 for task 2.
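
As a rough sketch of how a 0.4 threshold and a ranking could be applied to the mean distances: this assumes that a word whose distance is strictly greater than the threshold is labelled 1 (changed) and all others 0, which this README does not state. The function names and example values are hypothetical.

```python
THRESHOLD = 0.4  # threshold distance stated in this README


def classify(distances, threshold=THRESHOLD):
    """Label a word 1 if its distance exceeds the threshold (assumed rule), else 0."""
    return {word: int(distance > threshold) for word, distance in distances.items()}


def rank(distances):
    """Order words from largest to smallest distance."""
    return sorted(distances, key=distances.get, reverse=True)


# Made-up mean distances, for illustration only.
example = {"plane": 0.61, "tree": 0.18, "graft": 0.43}
labels = classify(example)
ranking = rank(example)
print(labels)
print(ranking)
```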