
Persian word embedding ( نشاننده واژه ها فارسی | تعبیه سازی کلمات فارسی )


ashalogic/Persian-Word-Embedding


*still under construction

Persian Word Embedding

What is a word embedding model?

Word embedding is one of the most popular representations of document vocabulary. It can capture the context of a word in a document, semantic and syntactic similarity, relations with other words, and so on.
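To make "semantic similarity" concrete, here is a minimal sketch that compares word vectors with cosine similarity. The three-dimensional vectors are made-up toy values, not output from any model in this repository; real embeddings usually have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors, avoids division by zero
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, chosen by hand for illustration only.
toy_vectors = {
    "کتاب": [0.9, 0.1, 0.0],  # "book"
    "دفتر": [0.8, 0.2, 0.1],  # "notebook"
    "دریا": [0.0, 0.2, 0.9],  # "sea"
}

sim_related = cosine_similarity(toy_vectors["کتاب"], toy_vectors["دفتر"])
sim_unrelated = cosine_similarity(toy_vectors["کتاب"], toy_vectors["دریا"])
print(sim_related > sim_unrelated)
```

With a trained model the same function can be applied to the vectors it returns for any two words.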








So what now?

Nothing groundbreaking, but this may be the only place where you can find all the word embedding models for Persian, either to train yourself or to download as pretrained versions. More importantly, I have collected the current best models (as of 2019) and made a lite version of each, so you can use them in your JS, Android, C#, or other application without relying on an online API.

Important note: some models already have a pretrained version for Persian, so for those I only made the lite version.

Corpus? Wikipedia!




Where to Download Wikipedia Corpus?

You can see the dump (backup) status of Wikipedia in each language here, and the dump versions available for Persian Wikipedia here. Choose "latest", because we want the newest version. The file to download is fawiki-latest-pages-articles-multistream.xml.bz2.
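Once the dump is downloaded, the article texts can be streamed out of it without decompressing the whole file to disk. The following is a minimal sketch using only the Python standard library; the function name `iter_article_texts` is a placeholder of mine, and the yielded strings are raw wikitext, so markup cleanup and tokenization are still needed before training.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_article_texts(dump_path):
    """Yield the raw wikitext of each page revision in a .xml.bz2 dump."""
    # bz2.open also reads files made of several concatenated bz2 streams.
    with bz2.open(dump_path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Tags carry an XML namespace like "{http://...}text"; strip it.
            tag = elem.tag.rsplit("}", 1)[-1]
            if tag == "text" and elem.text:
                yield elem.text
            elif tag == "page":
                # Free the finished page so memory stays bounded.
                elem.clear()
```

A typical use, assuming the dump sits in the working directory, would be `for text in iter_article_texts("fawiki-latest-pages-articles-multistream.xml.bz2"): ...`.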

Here are some other corpora to download




Models

  • FastText: an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.

Project status

  • Google’s Universal Sentence Encoder (this one is not publicly available)
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
  • FastText
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
  • ELMo
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
  • Word2Vec
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
  • GloVe
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
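
One simple way to get from a Model.vec to a Model_Lite.vec is to keep only the first N words. The sketch below is an illustration of that idea and not necessarily how the lite files in this repository were produced. It assumes the text format written by word2vec and fastText, where the first line is "vocabulary_size dimension" and each following line is a word followed by its vector, and it assumes the words are ordered by descending frequency so the first N are the most common. GloVe text files usually have no header line and would need a small adaptation. The function name `make_lite_vec` is hypothetical.

```python
from itertools import islice


def make_lite_vec(src_path, dst_path, keep):
    """Copy the first `keep` word vectors of a .vec file to a smaller file."""
    with open(src_path, encoding="utf-8") as src:
        # Header line: "<vocabulary size> <vector dimension>"
        _total, dim = src.readline().split()[:2]
        rows = list(islice(src, keep))
    with open(dst_path, "w", encoding="utf-8") as dst:
        # Write a new header that matches the rows actually kept.
        dst.write(f"{len(rows)} {dim}\n")
        dst.writelines(rows)
    return len(rows)
```

For example, `make_lite_vec("Model.vec", "Model_Lite.vec", 50000)` would keep at most 50,000 words; the right cutoff depends on how much coverage the target application needs.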
