Error when loading pretrained word vectors #2

Open
jwc19890114 opened this issue Apr 17, 2019 · 1 comment
Comments

@jwc19890114

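In the language model example, I see that the pretrained model word2vec.6B.100d needs to be loaded. I am using glove.6B.50d instead, but loading it raises an error. Any help would be appreciated.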

Traceback (most recent call last):
  File "D:/DesktopBackup/right/MLHomework/AllenNLP/[NLP]Pytorch17_torchTextDemo.py", line 75, in <module>
    wvmodel = gensim.models.KeyedVectors.load_word2vec_format(r'D:\DesktopBackup\right\MLHomework\AllenNLP\data\glove.6B.50d.txt', binary=False, encoding='utf-8')
  File "C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 1476, in load_word2vec_format
    limit=limit, datatype=datatype)
  File "C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\utils_any2vec.py", line 344, in _load_word2vec_format
    vocab_size, vector_size = (int(x) for x in header.split()) # throws for invalid file format
  File "C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\utils_any2vec.py", line 344, in <genexpr>
    vocab_size, vector_size = (int(x) for x in header.split()) # throws for invalid file format
ValueError: invalid literal for int() with base 10: 'the'

@atnlp
Owner

atnlp commented Apr 30, 2019

The word2vec and GloVe file formats are different. You need to convert the GloVe file to the word2vec format first; gensim provides a function for this.
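For reference, the traceback shows what the difference is: the loader parses the first line of the file as a `vocab_size vector_size` header, while a GloVe text file starts directly with a vector row (here the row for `the`). Below is a minimal sketch of the conversion using only the standard library, assuming a space-separated GloVe text file with one token per line. The function name `glove_to_word2vec` and the output path are hypothetical, chosen for illustration; this is not gensim's own implementation.

```python
def glove_to_word2vec(glove_path, word2vec_path, encoding="utf-8"):
    """Copy a GloVe text file, prepending a '<vocab_size> <vector_size>' header.

    Hypothetical helper for illustration. Assumes each non-blank line is
    'token v1 v2 ... vN' separated by single spaces.
    """
    vocab_size = 0
    vector_size = None

    # First pass: count the rows and read the dimension from the first row.
    with open(glove_path, "r", encoding=encoding) as src:
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            if vector_size is None:
                vector_size = len(line.rstrip().split(" ")) - 1
            vocab_size += 1

    if vector_size is None:
        raise ValueError("GloVe file contains no vectors: %s" % glove_path)

    # Second pass: write the header, then copy the rows unchanged.
    with open(glove_path, "r", encoding=encoding) as src, \
            open(word2vec_path, "w", encoding=encoding) as dst:
        dst.write("%d %d\n" % (vocab_size, vector_size))
        for line in src:
            if line.strip():
                dst.write(line if line.endswith("\n") else line + "\n")

    return vocab_size, vector_size


# Example usage (paths are placeholders):
# glove_to_word2vec("glove.6B.50d.txt", "glove.6B.50d.word2vec.txt")
```

The converted file should then load with the original `load_word2vec_format(..., binary=False, encoding='utf-8')` call. In gensim itself, the conversion script is `gensim.scripts.glove2word2vec` (in the 3.x releases), and gensim 4.0 and later can, as far as I know, read the GloVe file directly by passing `no_header=True` to `load_word2vec_format`; check the documentation for your installed version.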
