Tool for Evaluating Multilingual WS-353 and SimLex-999

Nempickaxe/eval-multilingual-simlex

 
 

Multilingual versions of WordSim-353 and SimLex-999 datasets are a valuable new resource for evaluating word vector spaces. A full description of the datasets can be found on their webpage.

This repository provides a script to evaluate collections of word vectors with respect to the four supported languages (English, German, Italian and Russian). The script reports the SimLex-999 and WS-353 scores (and coverage), as well as the scores for the WS-353 similarity and relatedness subsets.
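The README does not say how the scores and coverage are computed. The sketch below shows one common way to do it, assuming the score is Spearman's rank correlation between gold ratings and cosine similarities, and that coverage is the fraction of word pairs for which both words have a vector. The function names (`cosine`, `spearman`, `evaluate`) and the toy data are hypothetical and are not taken from `evaluate.py`, which may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(pairs, vectors):
    """Return (rho, coverage) for (word1, word2, gold_score) triples."""
    gold, predicted = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:  # skip pairs with a missing word
            gold.append(score)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    coverage = len(gold) / len(pairs) if pairs else 0.0
    rho = spearman(gold, predicted) if len(gold) > 1 else 0.0
    return rho, coverage


# Toy data (made up for illustration, not drawn from SimLex-999 or WS-353).
vectors = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
pairs = [("a", "b", 9.0), ("a", "d", 5.0), ("a", "c", 1.0), ("a", "zzz", 3.0)]
rho, coverage = evaluate(pairs, vectors)
print(rho, coverage)
```

In this toy example the pair containing `zzz` is excluded from the correlation because it has no vector, so it lowers coverage without affecting the score.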

### Usage

    python evaluate.py word_vector_location language

The word vectors file should list one entry per line, with each word followed by its vector. Words may appear either without a language prefix or with a prefix of the following form: en_dog, de_Hund, it_cane, ru_собака.
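As an illustration of this file format, here is a minimal sketch of a parser. It assumes that the fields on each line are separated by whitespace and that a prefix matching the requested language should be stripped; neither detail is confirmed by the README. The name `parse_vector_lines` is hypothetical and does not come from `evaluate.py`.

```python
def parse_vector_lines(lines, language):
    """Parse 'word v1 v2 ...' lines into {word: [floats]}.

    A leading '<language>_' prefix (e.g. 'en_') is removed if present;
    words without that prefix are kept unchanged.
    """
    prefix = language + "_"
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        if word.startswith(prefix):
            word = word[len(prefix):]
        vectors[word] = [float(v) for v in values]
    return vectors


# Made-up sample lines: one prefixed entry, one unprefixed entry, one blank line.
sample = ["en_dog 0.1 0.2 0.3", "cat 0.4 0.5 0.6", ""]
vectors = parse_vector_lines(sample, "en")
print(sorted(vectors))  # ['cat', 'dog']
```

To read a real file, the same function can be given an open file object, for example `parse_vector_lines(open(path, encoding="utf-8"), "ru")`, where UTF-8 encoding is likewise an assumption.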
