What are the hash values of those individual addresses for? #683
Comments
The hash value is calculated as a content hash, and it can be used to determine that two addresses are identical between different runs of a single source. The values are calculated each time a source is run, and you might use them to detect new addresses. I hope this helps!
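As a minimal sketch of that idea, the snippet below compares the hash values from two runs of the same source to find added and removed addresses. The column name `HASH`, the helper names, and the sample rows are assumptions made for illustration, not something confirmed in this thread.

```python
import csv
import io


def read_hashes(csv_text, hash_column="HASH"):
    """Return the set of hash values found in one run's CSV output.

    The column name "HASH" is an assumption about the output layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[hash_column] for row in reader}


def diff_runs(previous_csv, current_csv):
    """Compare two runs of the same source by hash value."""
    old = read_hashes(previous_csv)
    new = read_hashes(current_csv)
    return {"added": new - old, "removed": old - new, "unchanged": old & new}


# Made-up rows standing in for two runs of one source.
previous = "NUMBER,STREET,HASH\n1,Main St,aaa111\n2,Main St,bbb222\n"
current = "NUMBER,STREET,HASH\n1,Main St,aaa111\n3,Main St,ccc333\n"

result = diff_runs(previous, current)
print(sorted(result["added"]))    # hashes seen only in the newer run
print(sorted(result["removed"]))  # hashes that disappeared
```

Because the hash is derived from content, an address whose details changed would show up as one removal plus one addition rather than as an update.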
@migurski Thank you for responding! Your answer certainly helps me. Just a few more questions after seeing your answer.
It’s calculated based on the entire row in conform.py#L1210-L1219, which will include lat/lon and other details. There’s no particular promise that rows in different sources will have identical hashes. Where I’ve looked into overlaps such as cities and counties in the Bay Area, sources will have subtly different locations, such as here:
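Separately, as a hypothetical illustration of why subtly different locations lead to different hashes, the sketch below hashes every field of a row. It is not the code in conform.py: the use of SHA-1 over sorted JSON, the 16-character truncation, the field names, and the coordinates are all assumptions chosen for the example.

```python
import hashlib
import json


def row_hash(row):
    """Hash every field of a row; keys are sorted so field order is irrelevant.

    Illustrative only: the real implementation may serialize and hash differently.
    """
    payload = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:16]


# The same street address as it might appear in two overlapping sources.
city_row = {"NUMBER": "123", "STREET": "Main St", "LAT": "37.80440", "LON": "-122.27080"}
county_row = dict(city_row, LAT="37.80441")  # location differs in the last digit

# Same content as city_row, but with the fields in reverse order.
same_as_city = dict(reversed(list(city_row.items())))

print(row_hash(city_row) == row_hash(county_row))    # False: any field change alters the hash
print(row_hash(city_row) == row_hash(same_as_city))  # True: identical content, identical hash
```

Under this kind of scheme, two sources only agree on a hash when every hashed field matches exactly, which is why overlapping sources rarely do.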
@migurski Thanks for all this helpful information! Here is probably the last relevant question: I saw there is a "fingerprint" field in openaddresses.com/state.txt, and I wonder if those fingerprints also reflect changes to ALL the content of their corresponding sources? In other words, can I count on those fingerprints to detect changes in each source?
I believe the fingerprint is an MD5 hash of the entire source. For static files, this is a great indicator of change. For ESRI FeatureServer sources, it might be more volatile than you want. What are you hoping to do?
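A rough sketch of what an MD5 fingerprint over a whole source could look like is shown below. The function name and the sample bytes are hypothetical, and exactly which bytes the project feeds into the hash is not confirmed here.

```python
import hashlib


def md5_fingerprint(chunks):
    """Return the MD5 hex digest of a source delivered as an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


# Made-up bytes standing in for a downloaded source file.
static_file = [b"LON,LAT,NUMBER,STREET\n", b"-122.27,37.80,123,Main St\n"]
rechunked = [b"".join(static_file)]  # same bytes, read in one piece
edited_file = [b"LON,LAT,NUMBER,STREET\n", b"-122.27,37.80,125,Main St\n"]

print(md5_fingerprint(static_file) == md5_fingerprint(rechunked))    # True
print(md5_fingerprint(static_file) == md5_fingerprint(edited_file))  # False
```

Any byte-level difference changes the digest, so a source whose downloaded bytes vary between requests would get a new fingerprint even if the addresses themselves had not meaningfully changed.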
@migurski I am hoping to keep a local copy of OpenAddresses in PostGIS and keep it updated daily. So I need to find a way to figure out which sources got updated and when. Besides scraping the web page at http://results.openaddresses.io/?runs=all#runs, do you have any suggestions?
You might find the plaintext version of that page useful for this purpose: http://results.openaddresses.io/state.txt It’ll tell you that a source was changed: the URLs for our processed files are immutable, so if you’ve already downloaded a zip file once you shouldn't ever need to request that same file again.
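One way to act on this is sketched below: parse a copy of state.txt and list only the processed URLs that have not been downloaded before. It assumes state.txt is tab-separated with a header row containing `source` and `processed` columns; the column names, helper names, and sample URLs are assumptions made for illustration.

```python
import csv
import io


def parse_state(state_text):
    """Parse state.txt content into {source: row}; assumes tab-separated with a header."""
    reader = csv.DictReader(io.StringIO(state_text), delimiter="\t")
    return {row["source"]: row for row in reader}


def sources_to_download(state_text, seen_urls):
    """Return {source: processed_url} for processed files not yet downloaded.

    Relies on processed URLs being immutable: a URL already in seen_urls
    never needs to be requested again.
    """
    todo = {}
    for source, row in parse_state(state_text).items():
        url = row.get("processed") or ""
        if url and url not in seen_urls:
            todo[source] = url
    return todo


# Made-up sample content; real rows and URLs will differ.
sample_state = (
    "source\tprocessed\tfingerprint\n"
    "us/ca/alameda.json\thttps://example.com/runs/100/alameda.zip\tabc\n"
    "us/ca/berkeley.json\thttps://example.com/runs/101/berkeley.zip\tdef\n"
    "us/ca/oakland.json\t\t\n"
)
already_downloaded = {"https://example.com/runs/100/alameda.zip"}

pending = sources_to_download(sample_state, already_downloaded)
print(pending)
```

Keeping the set of downloaded URLs in a local table would let a daily job fetch only what is new, without tracking fingerprints or dates separately.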
@migurski Got it. Should I monitor "fingerprint" or any other field for changes? Just a thought: I like the "cached date" in http://results.openaddresses.io/?runs=all#runs as it is straightforward, and it might be nice to also have it in http://results.openaddresses.io/state.txt
@migurski Sorry to bother you again. I guess state.txt might be the lifesaver. Do you know of any official documentation explaining the meanings of the columns in state.txt?
I don't think there's an official doc, but here's the description of the fields:
Documentation for some of these fields is in: https://github.com/openaddresses/openaddresses/blob/master/CONTRIBUTING.md
This might be a dumb question, but I really wonder what those values, which are assigned to all address records, indicate.
Added this line: my questions can be summarized as the three below:
Will those values get updated once their addresses or their corresponding lat/long change?
Or is it just an index that will never change?
Will the same address have two different indices if, for example, it is provided in two sources (e.g. locality and state government)? This is quite common, since OpenAddresses keeps address points from different sources even though one of them might cover another.