No RKI updates since 4 days ... (Landkreis 16056 disappeared from RKI data set) #1748

Open
mathiasflick opened this issue Oct 4, 2021 · 16 comments

@mathiasflick

There have been no updates to the RKI files for four days now (as of 2021-10-04, 20:45 local time).
Is there a problem with changes to the input data provided by RKI?
If yes, how can I help?

Greetings from Cologne
Mathias

@jgehrcke
Owner

jgehrcke commented Oct 4, 2021

Thank you @mathiasflick for the report.

I had a quick look at the logs and found:

Traceback (most recent call last):
  File "tools/build-rki-csvs.py", line 499, in <module>
    main()
  File "tools/build-rki-csvs.py", line 52, in main
    df_by_lk, df_berlin_cases_sum, df_berlin_deaths_sum = fetch_and_clean_data()
  File "tools/build-rki-csvs.py", line 176, in fetch_and_clean_data
    assert lacking_wrt_ref == set([11000, 3152])
AssertionError

Looks like the set of Amtliche Gemeindeschlüssel (AGS, official municipality keys) in the RKI data set changed once again. In the past, that has always been caused by a human error somewhere in the pipeline. The code might be overly strict. I might be able to understand this precisely and fix it tomorrow. Hopefully.
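
To illustrate why the assertion in the traceback fails: it requires the set of missing AGS to equal exactly the two known ones. Below is a minimal sketch of a more tolerant variant that reports unexpected losses instead of crashing. The names `check_lacking` and `EXPECTED_LACKING` are hypothetical and not taken from tools/build-rki-csvs.py; the example sets are made up.

```python
import logging

log = logging.getLogger("rki-check")

# AGS known to be absent from the RKI data (values from the traceback above).
EXPECTED_LACKING = {11000, 3152}


def check_lacking(reference_ags, current_ags, strict=False):
    """Return the set of AGS that went missing unexpectedly.

    Hypothetical helper: with strict=True it mimics the original
    set-equality assertion, otherwise it only logs a warning.
    """
    lacking_wrt_ref = set(reference_ags) - set(current_ags)
    unexpected = lacking_wrt_ref - EXPECTED_LACKING
    if strict:
        assert lacking_wrt_ref == EXPECTED_LACKING, f"unexpected: {sorted(unexpected)}"
    elif unexpected:
        log.warning("AGS missing from RKI data: %s", sorted(unexpected))
    return unexpected


reference = {11000, 3152, 16056, 16063}  # made-up reference set
current = {16063}  # 16056 is gone in this made-up example
print(check_lacking(reference, current))  # {16056}
```

With `strict=True` the same input raises an `AssertionError`, which matches the crash shown in the log.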

@jgehrcke
Owner

jgehrcke commented Oct 6, 2021

Data for this Landkreis recently went missing:

  "16056": {
    "name": "SK Eisenach",
    "state": "Thüringen",
    "lat": 50.9833,
    "lon": 10.3167,
    "population": 42250
  },

@jgehrcke
Owner

jgehrcke commented Oct 6, 2021

I may want to remove the lacking_wrt_ref check and update csv-epsilon-merge.py so that the base set may contain more columns than the extension set, and then forward-fill those columns.
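
A rough sketch of what such a forward-fill could look like, in plain Python. This is not the actual csv-epsilon-merge.py: the function name `epsilon_merge` and the row-as-dict representation are assumptions, and the first base row is made up for illustration.

```python
def epsilon_merge(base_rows, ext_rows):
    """Append ext_rows to base_rows; forward-fill columns missing in ext.

    Each row is a dict mapping column name -> value. Hypothetical
    stand-in for the merge step, not the real csv-epsilon-merge.py.
    Columns that exist only in the extension are dropped here.
    """
    if not base_rows:
        return [dict(r) for r in ext_rows]
    base_cols = list(base_rows[-1].keys())
    merged = [dict(r) for r in base_rows]
    for ext in ext_rows:
        prev = merged[-1]
        # Take the extension's value when present, else repeat the last known one.
        merged.append({col: ext.get(col, prev[col]) for col in base_cols})
    return merged


base = [
    {"time": "2021-09-10", "16056": 1970, "16063": 8570},  # made-up row
    {"time": "2021-09-11", "16056": 1975, "16063": 8579},
]
extension = [{"time": "2021-09-12", "16063": 10572}]  # column 16056 is gone

for row in epsilon_merge(base, extension):
    print(row)
```

The last merged row keeps 1975 for column 16056, because that is the most recent value known in the base.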

@jgehrcke
Owner

jgehrcke commented Oct 8, 2021

On vacation. Didn't get to this yet. Sorry about that :/

jgehrcke added a commit that referenced this issue Oct 20, 2021
In oct 2021 the AGS 16056 disappeared from
the RKI data set. That is, lacking_wrt_ref

was not

  set([11000, 3152])

anymore but

  set([11000, 3152, 16056])

which is when the program expectedly
crashed, leading up to issue #1748.

Now, deal with unexpected loss of a
column. This is accommodated in the
epsilon merge tool which forward-fills
a column when it's in the base but not in
the extension.
@jgehrcke
Owner

I have addressed this in #1827.

@jgehrcke
Owner

I have looked at the data more closely to better understand what happened. The fact that 16056 disappeared from the RKI data set made me 'hope' that reporting for this Landkreis was merged with another Landkreis.

Indeed, there is a pretty suspicious jump in the case number for Landkreis 16063 at the time when the case count for Landkreis 16056 stopped changing:

Screenshot from 2021-10-20 13-44-07

That jump is specifically from 8579 to 10572:

>>> 10572 - 8579
1993

The last reported case count value for Landkreis 16056 was 1975.

I think we can safely conclude that on September 12, reporting for Landkreise 16056 and 16063 was merged, and reported together under AGS 16063.
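
The comparison above can be written down as a small heuristic check. This is only a sketch: the function name `looks_like_merge` and the `tolerance` of 50 cases are arbitrary assumptions, not part of the project's code.

```python
def looks_like_merge(before, after, last_vanished, tolerance=50):
    """Heuristic: did `after - before` absorb roughly `last_vanished` cases?

    `tolerance` is an arbitrary allowance for genuinely new cases
    reported on the same day.
    """
    jump = after - before
    return abs(jump - last_vanished) <= tolerance, jump


ok, jump = looks_like_merge(before=8579, after=10572, last_vanished=1975)
print(jump, jump - 1975, ok)  # 1993 18 True
```

The jump exceeds the last value of 16056 by only 18 cases, which is plausible as one day of regular new cases in the combined area.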

@jgehrcke
Owner

jgehrcke commented Oct 20, 2021

With the solution from #1827 I have now retained Landkreis 16056 in the CSV files, simply forwarding the last known value (1975). That's incorrect: the value should drop to 0 so that the sum over the Landkreise evolves more correctly. Given the relatively small number, though, I think I will just leave this as-is. Feedback appreciated.
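
To make the effect on the sum concrete, here is a small sketch using the two numbers from this thread. It assumes that the 16063 value already contains the cases formerly reported under 16056, as concluded above.

```python
latest = {"16056": 1975, "16063": 10572}  # 16056 is the forward-filled value

# Sum as it currently ends up in the CSV files (16056 counted twice).
sum_forward_filled = sum(latest.values())
# Sum if the vanished column were dropped to 0 instead.
sum_zeroed = sum(v for ags, v in latest.items() if ags != "16056")

print(sum_forward_filled, sum_zeroed, sum_forward_filled - sum_zeroed)
# 12547 10572 1975
```

Under that assumption the overcount stays constant at 1975, since the forward-filled column no longer changes.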

@jgehrcke jgehrcke changed the title No RKI updates since 4 days ... No RKI updates since 4 days ... (Landkreis 16056 disappeared from RKI data set) Oct 20, 2021
@jgehrcke
Owner

jgehrcke commented Oct 20, 2021

I have just looked at the columns 16056 and 16063 in the RL data set. They have seemingly been synced a while ago: they contain the same values for the entire time range of interest (that is, the sum is also wrong).

@jgehrcke
Owner

jgehrcke commented Oct 20, 2021

The two Landkreise in question:

  "16056": {
    "name": "SK Eisenach",
    "state": "Thüringen",
  "16063": {
    "name": "LK Wartburgkreis",
    "state": "Thüringen",

on a map:
Screenshot from 2021-10-20 13-57-44

(from https://www.bik-gmbh.de/download/Gebietsreform_Thueringen_zum_GS1906.pdf)

@jgehrcke
Owner

jgehrcke commented Oct 20, 2021

So, I think it's fair to say that the case numbers for Eisenach (a kreisfreie Stadt, i.e. a district-free city) are now reported as part of Wartburgkreis, which might make sense geographically and organizationally.

@mathiasflick
Author

Some research regarding local reporting of corona-related indicators (e.g. for Eisenach and Wartburgkreis) clearly supports your assumption, although I was not able to find any kind of official confirmation. Probably it is a politically motivated move in order to get "better" (i.e. lower) numbers by averaging the high one out ... But that is just my personal opinion!
Anyway, this kind of "summarization" does create problems with the processing of data in dependent systems, leaving zero values and/or grey areas, e.g. in the RKI dashboard:

Screenshot 2021-10-23 at 15-23-28 RKI COVID-19 Germany

By the way, the zero for Ludwigslust-Parchim is caused by a hacking incident: they are not able to deliver ...
Source: https://www.kreis-lup.de/corona/

Greetings from Cologne
Mathias

@jgehrcke
Owner

Thank you Mathias for the additional insight! Huh. :)

@jgehrcke
Owner

RL did drop the data columns for Landkreis 16056, and that required further patches -- done in #1842.

Both the RL and RKI heatmaps now show 16056 and 16063 using the data from 16063.

@mathiasflick
Author

Perfect! Thank you so much for your work!
Now I need to start my own upstream patching ...
Greetings from Cologne
Mathias

@mathiasflick
Author

After a little bit of research I have probably found the reason for the unexpected change:
According to information provided by the state of Thüringen, Eisenach was officially made part of the Wartburgkreis (effective as of 2021-07-01).
Source: https://statistik.thueringen.de/datenbank/gemauswahl.asp
A problem remaining for me (I just do not remember ...) is where we get the population from (ags.json), whether the change is already incorporated there (important for the 7di computation), and when officially updated maps (shapefiles) will be available.
Thank you again and greetings from Cologne
Mathias

@jgehrcke
Owner

jgehrcke commented Nov 3, 2021

A problem remaining for me (I just do not remember ...) is, where we get the population from (ags.json) and whether the change is already incorporated there

Hey Mathias. Ouch. Thank you for that reminder. I will have to double-check, but it's likely that the 7di numbers have been a little off for 16063 because I didn't think this through before. Thank you!
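
A minimal sketch of why the population matters here, assuming that 7di means new cases over the last seven days per 100,000 inhabitants. The population of 42250 for 16056 is taken from the ags.json excerpt above; the Wartburgkreis population and the weekly case count are placeholders, not real data.

```python
def seven_day_incidence(cases_last_7_days, population):
    """New cases over the last 7 days per 100,000 inhabitants."""
    return cases_last_7_days * 100_000 / population


POP_16056 = 42250            # SK Eisenach, from the ags.json excerpt above
POP_16063_OLD = 120_000      # placeholder, NOT the real Wartburgkreis figure
weekly_cases_merged = 150    # placeholder count reported under AGS 16063

# Merged case counts divided by the old, smaller population: too high.
print(seven_day_incidence(weekly_cases_merged, POP_16063_OLD))  # 125.0
# Merged case counts divided by the combined population.
print(round(seven_day_incidence(weekly_cases_merged, POP_16063_OLD + POP_16056), 1))  # 92.4
```

If ags.json still carries the pre-merge population for 16063, the computed incidence is inflated by the ratio of the combined to the old population.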

Keeping track of this topic here: https://github.com/opstrace/opstrace/issues/1472
