Word2Vec Demo

A simple Scala project showcasing Word2Vec capabilities, with a web interface.

The training parameters were chosen based on:

Training time on a local machine, so some parameter values can be increased for better accuracy
Two papers: Word2Vec Explained, and the original Google paper, Distributed Representations of Words and Phrases and their Compositionality
Inferences from reading the deeplearning4j source code

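  // deeplearning4j Word2Vec builder (org.deeplearning4j.models.word2vec.Word2Vec)
  val word2Vec = new Word2Vec.Builder()
    .minWordFrequency(25)     // ignore words that occur fewer than 25 times
    .iterations(2)            // iterations per mini-batch
    .epochs(1)                // passes over the whole corpus
    .layerSize(200)           // dimensionality of the word vectors
    .batchSize(25)            // mini-batch size
    .seed(42)                 // random seed, for reproducible runs
    .useVariableWindow(5, 7)  // variable context window: sizes 5 and 7
    .negativeSample(0.1)      // negative sampling parameter
    .sampling(0.01)           // subsampling threshold for frequent words
    // .iterate(...) and .tokenizerFactory(...) are also required; not shown here
    .build()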
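The `sampling(0.01)` value is a threshold for subsampling frequent words. The Distributed Representations paper discards each occurrence of a word w with probability 1 - sqrt(t / f(w)), where t is the threshold and f(w) is the word's relative frequency in the corpus. The Scala sketch below computes that formula from the paper; it is an illustration, not necessarily the exact variant deeplearning4j implements, and the word counts are made up.

```scala
object SubsamplingSketch {
  // Probability of discarding one occurrence of a word (Mikolov et al., 2013):
  // 1 - sqrt(t / f), clamped at 0 in this sketch for words rarer than the threshold.
  def discardProbability(threshold: Double, relativeFrequency: Double): Double =
    math.max(0.0, 1.0 - math.sqrt(threshold / relativeFrequency))

  def main(args: Array[String]): Unit = {
    val threshold = 0.01 // same value as .sampling(0.01)
    // Hypothetical corpus counts, for illustration only
    val counts = Map("the" -> 50000L, "model" -> 900L, "vector" -> 50L)
    val total = 100000L
    for ((word, count) <- counts.toSeq.sortBy(-_._2)) {
      val freq = count.toDouble / total
      println(f"$word%-8s f=$freq%.5f discard=${discardProbability(threshold, freq)}%.3f")
    }
  }
}
```

With t = 0.01, only words whose relative frequency exceeds 1% are ever discarded; the paper suggests thresholds around 1e-5, so this setting subsamples only the most frequent tokens.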

Train the model

To train the model, run the wordembedding.scripts.Word2VecEmbedding script. Make sure you are satisfied with the chosen parameters first. The script reads the file given by the --source argument and saves the model to the file given by the --output argument. For example, you can run it with sbt, specifying the -mem option:

sbt -mem 12000 "project word2vec" \
"run-main wordembedding.scripts.Word2VecEmbedding \
 --source TEXT_SOURCE_LOCAL_PATH --output MODEL_LOCAL_PATH"

If you need to pick a column from a CSV file, specify the --csv-column option. For example, to use only the text in column 2:

sbt -mem 12000 "project word2vec" \
"run-main wordembedding.scripts.Word2VecEmbedding \
 --source TEXT_SOURCE_LOCAL_PATH --output MODEL_LOCAL_PATH \
 --preprocess --csv-column 2"
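The README does not say how --csv-column is interpreted inside the script. As a rough illustration only, this hypothetical Scala helper (`pickColumn` is not from the project) keeps the text of one column per CSV line, assuming 1-based column numbers and plain comma-separated fields; quoted fields containing commas would need a real CSV parser.

```scala
object CsvColumnSketch {
  // Hypothetical helper: text in a 1-based column of a CSV line,
  // or None if the line has fewer columns. Quoted commas are NOT handled.
  def pickColumn(line: String, column: Int): Option[String] = {
    val fields = line.split(",", -1) // limit -1 keeps trailing empty fields
    if (column >= 1 && column <= fields.length) Some(fields(column - 1).trim) else None
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("1,the quick brown fox", "2,jumps over the lazy dog", "malformed")
    // Keep only the text of column 2, skipping rows that lack it
    rows.flatMap(pickColumn(_, 2)).foreach(println)
  }
}
```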

For testing purposes, you can use only the first N lines of the original file. For example, to consider only the first 10000 lines:

sbt -mem 12000 "project word2vec" \
"run-main wordembedding.scripts.Word2VecEmbedding \
 --source TEXT_SOURCE_LOCAL_PATH --output MODEL_LOCAL_PATH \
 --preprocess --csv-column 1 \
 --lines 10000"
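To show how the flags used above could map onto settings, here is a hypothetical, hand-rolled Scala parser. `Settings`, `parse` and `limitLines` are illustrative names, not the script's actual code; it assumes --csv-column and --lines take positive integers and that --lines simply keeps the first N lines.

```scala
object ArgsSketch {
  // Hypothetical settings mirroring the flags shown in this README
  final case class Settings(
      source: Option[String] = None,
      output: Option[String] = None,
      preprocess: Boolean = false,
      csvColumn: Option[Int] = None,
      lines: Option[Int] = None)

  // Extractor matching strictly positive integers (no exception on bad input)
  object PosInt {
    def unapply(s: String): Option[Int] = scala.util.Try(s.toInt).toOption.filter(_ > 0)
  }

  // Returns Left with a message on an unknown or incomplete option
  @annotation.tailrec
  def parse(args: List[String], acc: Settings = Settings()): Either[String, Settings] =
    args match {
      case Nil                                 => Right(acc)
      case "--source" :: path :: rest          => parse(rest, acc.copy(source = Some(path)))
      case "--output" :: path :: rest          => parse(rest, acc.copy(output = Some(path)))
      case "--preprocess" :: rest              => parse(rest, acc.copy(preprocess = true))
      case "--csv-column" :: PosInt(n) :: rest => parse(rest, acc.copy(csvColumn = Some(n)))
      case "--lines" :: PosInt(n) :: rest      => parse(rest, acc.copy(lines = Some(n)))
      case other :: _                          => Left(s"Unrecognised or incomplete option: $other")
    }

  // Applies the optional --lines limit to any line iterator
  def limitLines(lines: Iterator[String], settings: Settings): Iterator[String] =
    settings.lines.fold(lines)(n => lines.take(n))
}
```

Returning an `Either` keeps bad input out of the training code; a library such as scopt would be the more usual choice in a real project.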

Run web interface

Make sure the model file is specified in the configuration:

word2vec {
  model_file = MODEL_LOCAL_PATH
}

Then run the Play project:

sbt "project word2vec_web" run

Open the web page at the regular Play home page URL (http://localhost:9000 by default in dev mode).
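The README does not describe what the page displays. Word2Vec demos commonly rank words by cosine similarity between their vectors, so the plain-Scala sketch below shows that computation on tiny made-up 3-dimensional vectors (the model configured above uses 200 dimensions). It is an illustration, not the project's web code.

```scala
object NearestWordsSketch {
  // Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is all zeros
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  // The n words most similar to `word`, excluding the word itself
  def nearest(vectors: Map[String, Array[Double]], word: String, n: Int): Seq[(String, Double)] =
    vectors.get(word) match {
      case None => Seq.empty
      case Some(query) =>
        vectors.toSeq
          .collect { case (other, vec) if other != word => other -> cosine(query, vec) }
          .sortBy(-_._2)
          .take(n)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy vectors, for illustration only
    val vectors = Map(
      "king"  -> Array(0.9, 0.8, 0.1),
      "queen" -> Array(0.85, 0.75, 0.2),
      "apple" -> Array(0.1, 0.2, 0.9))
    nearest(vectors, "king", 2).foreach { case (w, s) => println(f"$w%-6s $s%.3f") }
  }
}
```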

Have fun...

vantoniuk/ml_experiments

Folders and files

Latest commit

History

Repository files navigation

Word2Vec Demo

Train the model

Run web interface

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages