Question about soft negative sample construction #8

Open
aidejieceng opened this issue Nov 24, 2022 · 1 comment

Comments

@aidejieceng

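Dear authors,
I read your paper and saw that the soft negatives are built by parsing each sentence with spaCy and then using the negated form of the original sample as its soft negative. Instead of this approach, would it be feasible to select negatives with the BM25 algorithm, or with an already-trained SimCSE model? Have you run any comparison experiments?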

@phoenixsecularbird
Collaborator

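Hello, and thank you for your interest! In my view, the main difficulty is how to select hard positive and hard negative samples: sentences that look very different from the original sample but are semantically close, or that look similar but differ substantially in meaning. BM25 is a sparse retrieval method, so the samples it retrieves will be close to the original in wording and phrasing, but we cannot tell how far they differ in meaning, which makes it hard to decide whether each one should be treated as a negative or a positive. SimCSE is feasible in theory, though we have not tried it. In principle, however, if SimCSE misjudges a sentence as a positive, that sentence will keep being treated as a positive throughout the training of the new model, with no chance of correction, so it is also hard to find hard positives and negatives this way. That is my understanding of these two ideas. It is only one person's opinion, offered for reference.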
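
To make the BM25 point above concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not code from this repository: the function name `bm25_scores`, the toy sentences, the whitespace tokenization, and the `k1`/`b` defaults are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    Hypothetical helper for illustration; k1 and b are common defaults.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF (the "+ 1.0" keeps it non-negative).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with length normalization.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
            score += idf * norm
        scores.append(score)
    return scores


# Toy data (assumed): naive whitespace tokenization.
anchor = "the movie was good".split()
candidates = [
    "the movie was not good".split(),     # opposite meaning, shared words
    "the movie was really good".split(),  # same meaning, shared words
    "i enjoyed this film a lot".split(),  # similar meaning, no shared words
]
scores = bm25_scores(anchor, candidates)
for sentence, score in zip(candidates, scores):
    print(f"{score:.3f}  {' '.join(sentence)}")
```

In this toy setup the negated sentence and the intensified paraphrase share the same query terms and the same length, so BM25 scores them identically, while the paraphrase with no word overlap scores zero. The score alone therefore carries no signal about whether a retrieved sentence should be labeled as a positive or a negative.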
