-
Notifications
You must be signed in to change notification settings - Fork 652
Geocode Caching implementation
When using geocoders with CartoDB most of the time there is an associated cost on performing those geocodes. This makes it hard for testing and invalidate cases when you have to recreate multiple times the same table and it is not a good idea to geocode again what was geocoded before.
Therefore we propose a Geocoder caching service. It will be used by any service that makes calls to external geocoder. The idea is to use a CartoDB database directly so it is managed with the rest of the infrastructure.
The GeocodingCache CartoDB database will have the following table:
geocodes(cartodb_id::serial,the_geom::geometry,geocode_string::text,counter::integer)
This table will store all geocoding rows being sent to the service, so it can grow to millions.
We will have the following index:
geocode_hash_idx on md5(geocode_string)
CREATE INDEX geocode_md5_idx on geocodes_1 (md5(geocode_string))
The geocoding service, in Ruby right now, will do the next steps:
- Try to resolve the addresses first through the caching with a query like:
WITH addresses(address) AS (
VALUES(
'Eloy Gonzalo 27,Madrid,Spain'),
('pepe'),
('sdsdsd')
)
SELECT st_x(g.the_geom) longitude,st_y(g.the_geom) latitude,g.geocode_string FROM addresses a LEFT JOIN geocodes_1 g ON md5(g.geocode_string)=md5(a.address)
This will return all rows sent to the service, but some will have latitude/longitude and others not. Obviously those with location info can be used for geocoding.
- Ruby will take those that he does not have location and will perform a query to the external geocoder service. Obviously if all records were match, no call to the geocoder will be needed.
It will obtain a list of geocodes with the string. With the aggregation of the values taken from the geocoding cache + the values obtained through the external geocoder it can return already the response to the client.
- Before so, ruby will go and register those newly obtained geocodes with something like this:
It will only of course insert the records that were not found on the cache and that has found locations.
INSERT INTO geocodes_1(geocode_string,the_geom)
VALUES
('Eloy Gonzalo 27,Madrid,Spain',CDB_LatLng(10,10)),
('pepe',CDB_LatLng(100,20)),
('sdsdsd',CDB_LatLng(60,40))