Word embedding is one of the most popular representations of document vocabulary. It can capture the context of a word in a document, semantic and syntactic similarity, relations with other words, and so on.
This may not be very important, but it is perhaps the only place where you can find all the word embeddings for Persian, either to train yourself or to download as a pretrained version. One more important thing: I collected the current best models (as of 2019) and made a Lite version of each, so you can use them in your JS, Android, C#, or other application without relying on an online API.
Important note: some models already have a pretrained version for Persian, so for those I only made the lite version.
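
This README does not say how the lite files were produced, so the following is only a sketch of one common approach: keep the first N entries of a `.vec` text file. It assumes the usual word2vec/fastText text layout (a header line with the vocabulary size and dimension, then one word and its values per line) and that the words are sorted from most to least frequent. The function name `truncate_vec` and the sample data are hypothetical.

```python
import io
from typing import TextIO


def truncate_vec(src: TextIO, dst: TextIO, keep: int) -> int:
    """Copy at most `keep` word vectors from src to dst.

    Assumes the .vec text layout: a "<vocab_size> <dim>" header followed by
    one "<word> <v1> ... <vdim>" line per word. Returns the number of
    vectors actually written.
    """
    header = src.readline().split()
    dim = int(header[1])

    rows = []
    for line in src:
        if len(rows) >= keep:
            break
        if line.strip():  # ignore blank lines
            rows.append(line.rstrip("\n"))

    # Write the header last-known count so it always matches the body.
    dst.write(f"{len(rows)} {dim}\n")
    for row in rows:
        dst.write(row + "\n")
    return len(rows)


# Tiny in-memory demo (made-up data, not taken from the real models).
sample = io.StringIO("3 2\na 1 0\nb 0 1\nc 1 1\n")
lite = io.StringIO()
written = truncate_vec(sample, lite, keep=2)
```

With real files, `src` and `dst` would be opened with `open(path, encoding="utf-8")`. Truncating the vocabulary trades coverage of rare words for a much smaller file, which is usually the relevant trade-off for mobile or in-browser use.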
You can see the status of the Wikipedia dumps for each language here, and the dump versions available for Persian Wikipedia here. Choose "latest", because we want the newest version, and download fawiki-latest-pages-articles-multistream.xml.bz2 from the file list.
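
The download step can also be scripted. The sketch below assumes the standard dumps.wikimedia.org URL layout; the helper names `dump_url` and `download` and the User-Agent string are hypothetical, and nothing is downloaded until `download` is called.

```python
import shutil
import urllib.request

DUMPS_BASE = "https://dumps.wikimedia.org"


def dump_url(lang: str, date: str = "latest") -> str:
    """Build the URL of the multistream articles dump for one Wikipedia."""
    wiki = f"{lang}wiki"
    name = f"{wiki}-{date}-pages-articles-multistream.xml.bz2"
    return f"{DUMPS_BASE}/{wiki}/{date}/{name}"


def download(url: str, dest: str) -> None:
    """Stream the file to disk so the dump never has to fit in memory."""
    # The User-Agent value is a placeholder; replace it with your own.
    request = urllib.request.Request(
        url, headers={"User-Agent": "persian-embeddings-example/0.1"}
    )
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


persian_dump = dump_url("fa")
```

Calling `download(persian_dump, "fawiki-latest-pages-articles-multistream.xml.bz2")` then fetches the file. The dump is large, so expect the transfer to take a while.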
- FastText: an open-source, free, lightweight library that lets users learn text representations and text classifiers. It works on standard, generic hardware, and models can later be reduced in size to fit even on mobile devices.
- Google’s Universal Sentence Encoder (this one is not publicly available)
- Train.py
- Train.ipynb
- Model.bin
- Model_Lite.bin
- Model.vec
- Model_Lite.vec
- FastText
  - Train.py
  - Train.ipynb
  - Model.bin
  - Model_Lite.bin
  - Model.vec
  - Model_Lite.vec
- ELMo
  - Train.py
  - Train.ipynb
  - Model.bin
  - Model_Lite.bin
  - Model.vec
  - Model_Lite.vec
- Word2Vec
  - Train.py
  - Train.ipynb
  - Model.bin
  - Model_Lite.bin
  - Model.vec
  - Model_Lite.vec
- GloVe
  - Train.py
  - Train.ipynb
  - Model.bin
  - Model_Lite.bin
  - Model.vec
  - Model_Lite.vec
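
As a rough illustration of how one of the `.vec` files could be used without any library or online API, here is a minimal sketch in plain Python. It assumes the files follow the usual word2vec/fastText text layout (a header line with vocabulary size and dimension, then one word and its values per line, separated by spaces). The function names and the three sample vectors are made up for the example and do not come from the real models.

```python
import io
import math
from typing import Dict, List, TextIO


def load_vec(handle: TextIO) -> Dict[str, List[float]]:
    """Read a .vec text file into a {word: vector} dictionary."""
    header = handle.readline().split()
    dim = int(header[1])

    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up 2-dimensional vectors for three Persian words (cat, dog, book).
sample = io.StringIO("3 2\nگربه 1.0 0.0\nسگ 0.8 0.6\nکتاب 0.0 1.0\n")
vectors = load_vec(sample)
similar = cosine(vectors["گربه"], vectors["سگ"])
unrelated = cosine(vectors["گربه"], vectors["کتاب"])
```

For a real file, pass `open("Model_Lite.vec", encoding="utf-8")` instead of the in-memory sample. Holding every vector in a Python dictionary is only practical for the lite files; the full models are better loaded with a dedicated library.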