The two-character Chinese word "成都" is not recognized #132
Comments
from flashtext import KeywordProcessor
text = "成都到北京高铁3小时,郑州到成都2小时"
print(len(kp))
2
Reference: https://blog.csdn.net/chen10314/article/details/122048726

Still not a good solution.
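from flashtext import KeywordProcessor

# Longer sample text, left commented out as in the original report.
#text = "@苍月轶 再次核实:骆然5月8日持24小时核酸从宜昌回蓉,到成都24小时内核酸一次,9号回泸定,24小时内又做一次核酸,均阴性,健康码绿码。宜昌不是AB区域。"
text = "成都到北京高铁3小时,郑州到成都2小时"
print(text)

kp = KeywordProcessor()
# Each keyword maps to a (clean name, tag) tuple.
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))
print(len(kp))  # number of registered keywords
print(kp)

# span_info=True returns (clean_name, start, end) for every match.
word_index = kp.extract_keywords(text, span_info=True)
print(word_index)
for item in word_index:
    # Print the matched slice of the original text.
    print(text[item[1]:item[2]])
print('finished')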
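For comparison, here is a minimal sketch of the spans the script above is expected to report, computed with plain substring search instead of flashtext. It ignores word boundaries entirely, which is an assumption that suits text written without spaces between words; it is not how flashtext works internally. The helper name `find_keyword_spans` and the `keywords` dict are hypothetical and exist only for this illustration.

```python
def find_keyword_spans(text, keywords):
    """Return (clean_name, start, end) for every occurrence of each keyword.

    Hypothetical helper: plain substring search with no word-boundary checks.
    """
    spans = []
    for keyword, clean_name in keywords.items():
        start = text.find(keyword)
        while start != -1:
            spans.append((clean_name, start, start + len(keyword)))
            # Continue searching one character past the last hit.
            start = text.find(keyword, start + 1)
    # Order matches by their position in the text.
    return sorted(spans, key=lambda span: span[1])


text = "成都到北京高铁3小时,郑州到成都2小时"
keywords = {"到成都": ("成都", "ab"), "宜昌": ("宜昌", "ab")}

spans = find_keyword_spans(text, keywords)
for clean_name, start, end in spans:
    print(clean_name, text[start:end])
```

Here the only match is "到成都" in the second clause, directly followed by the digit "2". If the flashtext script returns an empty list for the same input, the difference comes from how the library decides where a keyword may end.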