[new] Add a trick for StaticEmbedding #317

Open
wants to merge 1 commit into
base: master

@ghost ghost commented Aug 13, 2020

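Description: Modify the _load_with_vocab method of the StaticEmbedding class. It first reads all pretrained word vectors, then iterates over the words in vocab and checks, in order, whether the original word, the all-lowercase word, the all-uppercase word, or the word with its first letter capitalized exists in the pretrained vectors. In other words, if the original word fails to match, the word is assigned a pretrained vector that is as semantically similar as possible, which raises the probability that a word in vocab matches a pretrained vector.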
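The lookup order described above can be sketched as follows. This is a minimal illustration and not the code from this PR: the helper name `find_pretrained_key`, the plain-dict representation of the pretrained vectors, and the toy data are all hypothetical. It also assumes that "first letter capitalized" corresponds to Python's `str.capitalize()`, which lowercases the remaining characters.

```python
from typing import Dict, List, Optional


def find_pretrained_key(word: str, pretrained: Dict[str, List[float]]) -> Optional[str]:
    """Return the first case variant of `word` found in `pretrained`, else None.

    Hypothetical helper: tries the original word, then all-lowercase,
    all-uppercase, and capitalized forms, in that order.
    """
    for candidate in (word, word.lower(), word.upper(), word.capitalize()):
        if candidate in pretrained:
            return candidate
    return None


# Toy pretrained table (made-up vectors, for illustration only).
pretrained = {"apple": [0.1, 0.2], "NASA": [0.3, 0.4], "Paris": [0.5, 0.6]}
vocab_words = ["Apple", "nasa", "paris", "zebra"]

# Map each vocab word to the pretrained key whose vector it would reuse.
matched = {w: find_pretrained_key(w, pretrained) for w in vocab_words}
```

Here "Apple" falls back to "apple", "nasa" to "NASA", and "paris" to "Paris", while "zebra" has no variant in the table and would still need whatever initialization unmatched words normally receive.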

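Main reason: The original _load_with_vocab method only performs a hard (exact) match between the words in the pretrained vectors and the words in vocab while the pretrained vectors are being read in, so the match rate is very low, which has a large impact on the final experimental results.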
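To make the effect of hard matching concrete, the sketch below compares the fraction of vocab words that find a pretrained entry under exact matching alone and under the case-variant fallback. The function name `coverage` and the word lists are hypothetical and chosen only for illustration; they are not measurements from fastNLP or from any real embedding file.

```python
from typing import Callable, Iterable, List, Set


def coverage(vocab_words: List[str],
             pretrained_words: Set[str],
             variants: Iterable[Callable[[str], str]]) -> float:
    """Fraction of vocab words for which at least one variant is pretrained."""
    variants = list(variants)
    hits = sum(
        1 for w in vocab_words
        if any(make_variant(w) in pretrained_words for make_variant in variants)
    )
    return hits / len(vocab_words)


# Made-up data: the casing in vocab differs from the casing in the pretrained file.
pretrained_words = {"the", "London", "USA"}
vocab_words = ["The", "london", "usa", "qwerty"]

exact_only = [lambda w: w]
with_fallback = [lambda w: w, str.lower, str.upper, str.capitalize]

exact_rate = coverage(vocab_words, pretrained_words, exact_only)
fallback_rate = coverage(vocab_words, pretrained_words, with_fallback)
```

On this toy input, exact matching covers none of the four words, while the fallback covers three of them; real coverage gains depend on the corpus and the embedding file.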

Checklist: check that each of the following items is complete

Please feel free to remove inapplicable items for your PR.

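  • The PR title starts with [$CATEGORY] (e.g. [bugfix] for a bug fix, [new] for a new feature, [test] for test changes, [rm] for removing old code)
  • Changes are complete (i.e. I finished coding on this PR). Open a PR only once the changes are finished.
  • All changes have test coverage and the modified parts pass the tests. For changes to fastnlp/fastnlp/, test code must be provided in fastnlp/test/.
  • Code is well-documented. Write the comments carefully, because the API documentation is extracted from them.
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change. If the change affects examples or tutorials, contact a core developer.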

Changes: describe each change item by item

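  • Modified _load_with_vocab in the StaticEmbedding class to add multiple rounds of matching against the pretrained word vectors, raising the probability that a word in vocab matches a pretrained vector.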

Mention: ask someone to review your PR

@people who have modified this file
@core developers
