semantic-analyzer

Spark Semantic Analyzer Example

A few lines of code demonstrating how to check the semantic similarity of sentences with Spark, using the [Word Mover's Distance](http://jmlr.org/proceedings/papers/v37/kusnerb15.pdf) algorithm by Kusner et al.
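To give a feel for the distance being computed, here is a minimal sketch in plain Python. It is not this repository's Scala code. It implements the relaxed Word Mover's Distance, a cheaper lower bound on the full distance described in the same paper, in which each word sends all of its weight to the nearest word of the other sentence. The function names and the two-dimensional toy vectors are made up for illustration; real word2vec vectors have 300 dimensions.

```python
import math
from collections import Counter


def nbow(tokens):
    """Normalized bag-of-words: each word's share of the sentence."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_cost(src, dst, vectors):
    """Move every source word's weight to its nearest destination word."""
    return sum(
        weight * min(euclidean(vectors[word], vectors[other]) for other in dst)
        for word, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b, vectors):
    """Relaxed WMD: the larger of the two one-sided costs.

    This is a lower bound on the full Word Mover's Distance, which
    would require solving an optimal transport problem instead.
    """
    a = nbow([t for t in tokens_a if t in vectors])
    b = nbow([t for t in tokens_b if t in vectors])
    if not a or not b:
        raise ValueError("each sentence needs at least one known word")
    return max(one_sided_cost(a, b, vectors), one_sided_cost(b, a, vectors))


# Hypothetical toy embeddings, for illustration only.
toy_vectors = {
    "president": (1.0, 0.0),
    "leader": (0.9, 0.1),
    "speaks": (0.0, 1.0),
    "talks": (0.1, 0.9),
    "banana": (-1.0, -1.0),
}

close = relaxed_wmd(["president", "speaks"], ["leader", "talks"], toy_vectors)
far = relaxed_wmd(["president", "speaks"], ["banana"], toy_vectors)
print(close < far)
```

Sentences with no words in common can still score as similar, because the cost depends on how far apart the word vectors are rather than on exact word matches.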

To submit the job to an existing Spark installation, first package it with the following command:

sbt package

and then submit it with the following command:

$SPARK_HOME/bin/spark-submit \
  --master $SPARK_MASTER \
  --jars $DEPENDENCIES \
  --class sentenceAnalyzer.SentenceSimilariyCheck \
  target/scala-2.10/sentence-similarity-analyzer-using-wmd_2.10-1.0.0.jar input.txt output.txt stopwords.txt googlew2v.tsv

Note

I converted the [google-news-word2vec-bin](https://code.google.com/archive/p/word2vec/) file to TSV using the gensim package for Python.
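The conversion script itself is not part of this README, so the following is only a sketch of how such a conversion might look. The TSV layout (one word per line, followed by its tab-separated vector components) is an assumption and may not match what the Spark job expects, and the function names are hypothetical. The gensim call `KeyedVectors.load_word2vec_format` is real; `index_to_key` is the gensim 4.x attribute name (older releases used `index2word`).

```python
def write_vectors_tsv(pairs, tsv_path):
    """Write (word, vector) pairs as lines of: word<TAB>v1<TAB>v2...

    Assumed layout, not confirmed by the repository.
    Returns the number of lines written.
    """
    written = 0
    with open(tsv_path, "w", encoding="utf-8") as handle:
        for word, vector in pairs:
            fields = [word] + [repr(float(x)) for x in vector]
            handle.write("\t".join(fields) + "\n")
            written += 1
    return written


def convert_word2vec_bin(bin_path, tsv_path):
    """Load a binary word2vec file with gensim and dump it as TSV."""
    # Third-party dependency, imported lazily: pip install gensim
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(bin_path, binary=True)
    # index_to_key is the gensim 4.x vocabulary list.
    pairs = ((word, vectors[word]) for word in vectors.index_to_key)
    return write_vectors_tsv(pairs, tsv_path)
```

With gensim installed, a call such as `convert_word2vec_bin("path/to/word2vec.bin", "googlew2v.tsv")` would produce the file passed as the last argument to the job; the input path here is a placeholder.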

After running the sbt package command, you'll find the required JARs in your local Ivy cache ($HOME/.ivy2/cache/).

A detailed walkthrough of the code is available at [learningfrombigdata](http://learningfrombigdata.com).

anshul-cached/semantic-analyzer