Fix villages data #11
This data file is generated by a script that extracts data from http://mfdonline.bps.go.id/ (see https://github.com/edwardsamuel/Wilayah-Administratif-Indonesia/blob/master/scripts/run.sh#L12). Modifying the generated file is not useful: your changes will be overwritten the next time the script runs.

Is the BPS data wrong? If it is, it needs to be fixed in the BPS source. You can see "ALUE DUA MUKA 0" and "SITIMERT0" are used in

Other occasions where this data has appeared: https://www.google.com/search?q=%22SITIMERT0%22+%223506190010%22

And a 'bot' created Wikipedia articles:

And it appears in a wordlist here:
If we can confirm that the BPS data is wrong, one solution is for this repository to have a 'fixes' list, which
Hi @prasastoadi, I agree with @jayvdb. Generated files should not be edited manually, because they will be overwritten on the next run. You need to modify the script that generates the files, in this project can be
I am very confident that the two village names are wrong. We know that 0 (zero) is not a letter of the alphabet.

Here is the Sitimerto village
Alue Dua Muka O

I propose a very simple method that is applied before the data is written to CSV.
@@ -72,6 +77,7 @@ def write_dict_to_csv(fname, data_dict, upper_level_key_length=0):
 def main(argv):
     if (len(argv) > 0):
         read_html_data(argv[0] + '/' + argv[1])
+        fix_villages({1105130121: 'ALUE DUA MUKA O', 3506190010: 'SITIMERTO'})
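The body of `fix_villages` is not shown in this hunk. As a rough illustration of the idea only, here is a minimal sketch that overrides names by village ID. The function name `apply_village_fixes` and the assumption that villages are held as a plain `{id: name}` dict are hypothetical and may not match the script's real data structures.

```python
def apply_village_fixes(villages, fixes):
    """Return a copy of `villages` with names replaced for IDs listed in `fixes`.

    Both arguments are assumed to be {village_id: name} dicts (hypothetical layout).
    IDs in `fixes` that do not exist in `villages` are ignored.
    """
    patched = dict(villages)  # leave the raw data untouched
    for village_id, corrected_name in fixes.items():
        if village_id in patched:
            patched[village_id] = corrected_name
    return patched


# Raw names as reported in this PR, and the proposed corrections.
villages = {1105130121: 'ALUE DUA MUKA 0', 3506190010: 'SITIMERT0'}
fixes = {1105130121: 'ALUE DUA MUKA O', 3506190010: 'SITIMERTO'}
patched = apply_village_fixes(villages, fixes)
```

Returning a copy keeps the raw BPS values available alongside the corrected ones.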
I think this method is quite dangerous. If BPS renames 1105130121 and 3506190010, the generated data will no longer follow the BPS update. What do you think?
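One way to address the rename concern, sketched here under assumptions, is to apply a fix only while the raw name still equals the known-bad value, so that a later BPS rename passes through unchanged. The function name, the `{id: (bad_name, corrected_name)}` layout, and the renamed value in the example are all hypothetical.

```python
def apply_guarded_fixes(villages, guarded_fixes):
    """Apply a fix only if the current name still matches the known-bad name.

    `guarded_fixes` maps village_id -> (known_bad_name, corrected_name).
    Returns the patched copy and a list of IDs whose fix was skipped.
    """
    patched = dict(villages)
    skipped = []
    for village_id, (known_bad, corrected) in guarded_fixes.items():
        if patched.get(village_id) == known_bad:
            patched[village_id] = corrected
        else:
            # Upstream changed (or the ID is gone): keep the upstream value.
            skipped.append(village_id)
    return patched, skipped


guarded_fixes = {
    1105130121: ('ALUE DUA MUKA 0', 'ALUE DUA MUKA O'),
    3506190010: ('SITIMERT0', 'SITIMERTO'),
}
# Hypothetical future data in which BPS has renamed the second village.
villages = {1105130121: 'ALUE DUA MUKA 0', 3506190010: 'SOME NEW NAME'}
patched, skipped = apply_guarded_fixes(villages, guarded_fixes)
```

The skipped list gives maintainers a signal that a fix entry has gone stale and can be reviewed or removed.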
In this case, @prasastoadi only found issues with those two villages. What about the rest of the data? Has he already checked the entire village data?
#16 is a way to check for more problems. But I think we should not wait for all problems to be found. They will be reported when people find them.
And we can't wait for the government to fix them. That doesn't happen quickly.
But the fixes should be optional, so we can still use this repo's tools to obtain the raw data.
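As an illustration of how more problems of this kind might be surfaced, the sketch below flags any name that contains a digit. This is only one possible check and is not necessarily what #16 does; the function name and the `{id: name}` layout are assumptions, and a flagged name is a candidate for human review, not proof of an error.

```python
def find_names_with_digits(villages):
    """Return (village_id, name) pairs whose name contains at least one digit.

    `villages` is assumed to be a {village_id: name} dict (hypothetical layout).
    Results are sorted by ID so the report is stable between runs.
    """
    return sorted(
        (village_id, name)
        for village_id, name in villages.items()
        if any(ch.isdigit() for ch in name)
    )


villages = {
    1105130121: 'ALUE DUA MUKA 0',
    3506190010: 'SITIMERT0',
    9999999999: 'EXAMPLE VILLAGE',  # made-up entry, for illustration only
}
suspects = find_names_with_digits(villages)
```

Some legitimate names may contain digits, so the output is best treated as a review list.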
IMO it's pointless to update anything in this repo while the source data from BPS remains wrong. Dear @prasastoadi, what you should do instead is ask BPS to update their data.
Maybe fixes should be wrapped in a separate function call (and possibly a separate data file), so that users can easily apply all fixes on top of the existing data.
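A minimal sketch of that suggestion, under stated assumptions: fixes live in their own CSV with hypothetical `id` and `name` columns, and applying them is an optional step. The function names and file layout are illustrative and do not come from this repository; an in-memory string stands in for the fixes file so the example is self-contained.

```python
import csv
import io


def load_fixes(csv_file):
    """Read fixes from a CSV file object with 'id' and 'name' columns (assumed layout)."""
    return {int(row['id']): row['name'] for row in csv.DictReader(csv_file)}


def build_villages(raw, fixes=None):
    """Return the raw data unchanged, or with fixes layered on top if provided."""
    if not fixes:
        return dict(raw)
    return {village_id: fixes.get(village_id, name) for village_id, name in raw.items()}


# Stand-in for a separate fixes file kept next to the generated data.
FIXES_CSV = "id,name\n1105130121,ALUE DUA MUKA O\n3506190010,SITIMERTO\n"

raw = {1105130121: 'ALUE DUA MUKA 0', 3506190010: 'SITIMERT0'}
fixes = load_fixes(io.StringIO(FIXES_CSV))
with_fixes = build_villages(raw, fixes)
without_fixes = build_villages(raw)  # raw BPS data remains obtainable
```

Keeping the fixes in a data file means new corrections can be contributed without touching the extraction script.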
Fix '0' to 'O'
ALUE DUA MUKA 0 -> ALUE DUA MUKA O
SITIMERT0 -> SITIMERTO