Source pages misrepresenting the actual process #640
Comments
We have a 10-day timeout on batch runs: older versions are reused if a new batch process is requested within that time. Is it possible that’s what you are seeing?
What I am seeing makes no sense.
The previous processing (2017-05-11) finished well before the last batch was
started and, according to the log, took approximately 6 minutes. There is no
chance it reached the 10-day timeout limit.
If nothing was processed for some legitimate reason, then the date in that row
should not change while all its other links still point to the older data and
log.
I suspect the same thing happened in the previous (2017-05-15) batch run,
but I have no diff to support my suspicion.
Here’s a simplified view of what’s in the database for Slovenia:
Run 195041 (second-to-last row) is public in Set 194433, and it was just a copy of Run 188019 from May 11, which fell within the prior ten days. Anything with a value in the "copy of" column is like that. It looks like the newest one, 195206, was processed in this PR about a day ago. Sorry if this is confusing! Many of the tradeoffs here are designed to get something recent enough onto the OA page without unnecessarily re-processing upstream sources too frequently.
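The reuse rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the `Run` class, its `copy_of` field, and the `next_run` function are hypothetical names, and the sketch assumes the 10-day window is measured from the finish time of the last run that was really processed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The "10 day timeout" mentioned in the thread.
REUSE_WINDOW = timedelta(days=10)


@dataclass
class Run:
    run_id: int
    finished: datetime
    # Hypothetical field mirroring the "copy of" column: ID of the run
    # whose data and logs this row points at, or None for a real run.
    copy_of: Optional[int] = None


def next_run(last_processed: Optional[Run], new_id: int, now: datetime) -> Run:
    """Return a copy record if the last real run is recent enough, else a fresh run."""
    if last_processed is not None and now - last_processed.finished <= REUSE_WINDOW:
        # Reuse: a new row with a new date, but data and logs stay those
        # of the original run.
        return Run(run_id=new_id, finished=now, copy_of=last_processed.run_id)
    # Outside the window (or no earlier run): the source is processed again.
    return Run(run_id=new_id, finished=now)
```

Under these assumptions, a run finished on 2017-05-11 and a batch requested on 2017-05-20 are nine days apart, so the sketch returns a copy record pointing at the older run, which would match a row that shows a new date next to old data and logs.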
Seeing so many records marked as "copy of" is a whole different story, and far more explanatory than a date changing within the same existing row while the linked data and logs stay old (as seen in the diff screenshot above). Recent-enough sources could be skipped altogether; changing the date of an old import, even if only visually, does not look sane. Perhaps the solution to the confusion is to:
Both could be achieved by improving the individual source page, making it show:
Great suggestions, Stefan! Would you be interested in making a contribution to the project? I could help you get set up with the code so that you could develop these changes directly.
http://results.openaddresses.io/sources/si/countrywide
is not getting any new rows with each new batch. Only the date in the first row changes; everything else stays the same.
In the last batch run the date changed from 2017-05-15 to 2017-05-20, but the linked log (https://s3.amazonaws.com/data.openaddresses.io/runs/188019/output.txt) still shows timestamps from 2017-05-11 (one of the previous batches). I would expect existing rows to remain the same and new rows to be added on top of the table, each with new links and a new address count (now 554000+, as a new source zip is prepared daily).
Diff: