Incorrect parsing of Japanese administrative division data #654
There's something going on with Japanese addresses and the way the training data was prepared. It may have to do with the way that tokens such as "市" are optionally left off. In the address_parser CLI you can type … You could try grepping out the Japanese addresses, similar to what you have above, and training on those specifically; that might improve things slightly. Some language- and country-specific models have been trained before. Another option is to strip off the administrative divisions (prefecture, city) with a regex and then parse the remainder.
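The last suggestion (strip the administrative divisions with a regex, then parse what is left) could look roughly like the sketch below. The pattern and the `split_admin` helper are hypothetical and not part of libpostal. The sketch assumes the address starts with the prefecture and municipality. It does not handle 郡 (district) prefixes, and because the city match is lazy it cuts municipality names that themselves contain 市, 区, 町 or 村 (e.g. 四日市市) too early.

```python
import re
from typing import Dict, Optional

# Hypothetical helper, not part of libpostal.
# state: 東京都, 北海道, 京都府, 大阪府, or a 2-3 character name ending in 県
# city:  the shortest run of non-space characters ending in 市, 区, 町 or 村
# rest:  everything after that, to be handed to the address parser
ADMIN_RE = re.compile(
    r"(?P<state>東京都|北海道|京都府|大阪府|[^\s県]{2,3}県)?"
    r"(?P<city>\S+?[市区町村])?"
    r"(?P<rest>.*)",
    re.DOTALL,
)


def split_admin(address: str) -> Dict[str, Optional[str]]:
    """Split a Japanese address into prefecture, municipality and remainder.

    Known limitation: the lazy city match stops at the first 市/区/町/村,
    so names such as 四日市市 are split incorrectly.
    """
    # Every group is optional and rest accepts anything, so fullmatch
    # always succeeds; parts that are not found come back as None.
    m = ADMIN_RE.fullmatch(address.strip())
    return {
        "state": m.group("state"),
        "city": m.group("city"),
        "rest": m.group("rest"),
    }


if __name__ == "__main__":
    print(split_admin("茨城県取手市井野台"))
    # {'state': '茨城県', 'city': '取手市', 'rest': '井野台'}
```

The `rest` part could then be passed to libpostal (for example through `parse_address` in the pypostal bindings), with the regex-extracted state and city merged back into the result.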
Hi!
I was checking out libpostal, and saw something that could be improved.
My country is Japan
Here's how I'm using libpostal
Parsing “茨城県取手市井野台” (Toride City, Ibaraki Prefecture)
Here's what I did
```
./address_parser
```
Here's what I got
```json
{
  "state": "茨城県取手",
  "city": "市",
  "city_district": "井野台"
}
```
Here's what I was expecting
```json
{
  "state": "茨城県",
  "city": "取手市",
  "city_district": "井野台"
}
```
For parsing issues, please answer "yes" or "no" to all that apply.
yes
yes
yes; parsing the reversed order "井野台取手市茨城県" gives the correct result
not applicable
not applicable
Here's what I think could be improved
I checked the 20170304 training data and found that this address is included, yet it still isn't parsed correctly. Should I add more place data and retrain?