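Dear author, I read your paper and noticed that negative samples are built by parsing each sentence with spaCy and then taking a negated version of the original sentence as a soft negative. Would it also work to select negatives another way, for example with the BM25 algorithm or with an already-trained SimCSE model? Have you run any comparison experiments?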
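Hello, and thanks for your interest! In my view, the main problem is how to choose hard positives and hard negatives: sentences that look very different from the original but are semantically close, or sentences that look similar but differ substantially in meaning. BM25 is a sparse retrieval method, so the sentences it retrieves will be very close to the original in wording and phrasing. However, we cannot tell how their meaning differs, so it is hard to decide whether a retrieved sentence should serve as a negative or a positive. SimCSE is feasible in theory, although we have not tried it. In principle, though, if SimCSE wrongly judges a sentence to be a positive, the new model will keep treating that sentence as a positive throughout training with no chance to correct it, so this approach also makes it hard to find genuinely hard positives and negatives. That is my view of these two ideas. It is only one person's opinion, offered for reference!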
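To illustrate the point about BM25, here is a minimal sketch. It is not part of the paper's method. It implements Okapi BM25 with the common defaults `k1=1.5`, `b=0.75` and a Lucene-style IDF. The function name `bm25_scores` and the example sentences are hypothetical. A negated sentence shares almost every word with the query, so it scores highest. A paraphrase with no shared words scores zero. BM25 ranks candidates by word overlap and cannot tell a hard negative from a hard positive.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many docs contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # no lexical overlap, so no contribution
            # Lucene-style IDF: the +1 inside the log keeps it positive.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical example sentences, whitespace-tokenized.
query = "the movie was good".split()
candidates = [
    "the movie was not good",       # opposite meaning, near-identical wording
    "a great film that i enjoyed",  # same meaning, no shared words
    "the weather was cold today",   # unrelated, some shared words
]
docs = [c.split() for c in candidates]
scores = bm25_scores(query, docs)
for text, s in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{s:.3f}  {text}")
```

In this toy example the negated sentence ranks first and the paraphrase ranks last. A lexical score tells you which candidates look alike. It does not tell you whether a candidate should be used as a positive or a negative.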