Japanese text being identified as Kurmanji #7

ftkurt · 2020-06-07T02:18:30Z

It's probably because of this line.

language-detector/language_detector/prep/char_language.txt

Line 73 in e960b59

– Kurmanci 7.2974490546991655

See the following texts:

୨୧譲渡交換୨୧ ツイステ色紙コレクション vol.1 vol.2 譲┊︎デューストレイケイトジャミルオルトシルバー求┊︎同異種リドル or 定価(＋送料) 郵送 or 都内手渡し可能 ⿻ 各1BOX予約済みです。 ⿻…

東映HP更新✨ 来週はガルザとクランチュラがジャメンタルを研究🔍録りおろしナレーションたっぷりでお届けします！そしてHPで #キラトーーク延長戦！？魔進の声を演じるキャストのテンションMAX！なコメントを掲載しております✨ #キラ…

「DXヒューマギアプログライズキーセット」はご予約受付中！シェスタ、腹筋崩壊太郎、マモル、一貫ニギローのデータを宿したプログライズキーのセットです✨ 別売りのDXなりきりシリーズとも連動します。 URL…

DanielJDufour · 2020-06-07T02:42:13Z

Hi, @ftkurt. Thank you for identifying this issue! This package doesn't support Japanese yet, but it's easy to add a language. Would you like to submit a pull request? The documentation on how to add a language is here: https://github.com/DanielJDufour/language-detector/blob/master/CONTRIBUTING.md

ftkurt · 2020-06-07T14:06:10Z

I briefly looked at the Japanese character sets, and it seems Japanese is a bit different from other languages because it uses multiple scripts. Therefore, I would prefer that someone knowledgeable about Japanese do that. However, I am currently working on collecting Sorani and Kurmanji datasets. I might be able to add more data for those two Kurdish dialects in the coming days. I think this will help make this package more reliable.
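To illustrate the "multiple sets" point, here is a minimal sketch that counts characters per Japanese script using Unicode block ranges. It is not how this package works and does not use its API; the names `SCRIPT_RANGES`, `count_scripts`, and `looks_japanese` are hypothetical, and treating any hiragana or katakana as a sign of Japanese is a simplifying assumption.

```python
from collections import Counter

# Unicode block ranges (inclusive) for the three main scripts in Japanese text.
# Hypothetical helper table, not taken from language-detector.
SCRIPT_RANGES = {
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
    "kanji": (0x4E00, 0x9FFF),  # CJK Unified Ideographs, shared with Chinese
}


def count_scripts(text):
    """Count how many characters of `text` fall in each script block."""
    counts = Counter()
    for ch in text:
        code_point = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[name] += 1
                break
    return counts


def looks_japanese(text):
    """Assumption: any kana means Japanese; kanji alone could be Chinese."""
    counts = count_scripts(text)
    return counts["hiragana"] + counts["katakana"] > 0


if __name__ == "__main__":
    print(count_scripts("ひらがなカタカナ漢字"))
    print(looks_japanese("ご予約受付中"))
```

A pre-check like this could run before the character-score lookup, so that text containing kana is never scored against Latin-script languages such as Kurmanji.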

DanielJDufour · 2020-06-07T16:17:28Z

That would be great! Thank you!
