
dotmap missing DC #77

Open · sbma44 opened this issue Sep 19, 2019 · 11 comments

sbma44 (Contributor) commented Sep 19, 2019

results.openaddresses.io shows a run as recently as 9/10, and the sample data and cache file look OK to me. It's present in the US South extract (haven't checked global -- should I?). Not sure what could be going on.

(screenshot)

iandees added the bug label Sep 19, 2019
nvkelso (Member) commented Oct 8, 2019

I hit this today, too:

(two screenshots)

iandees (Member) commented Oct 8, 2019

Our style has four Mapbox Studio layers ("southwest", "southeast", "northeast", "northwest") with the OA data split across them. When I look at one of these layers, the "select data" tab shows an error, e.g. "source layer 'openaddresses' or Source 'mapbox://open-addresses.southwest' not found.":

(screenshot 1)

… but it's there:

(screenshot 2)

When I try to select it, Studio shows a different source getting selected at the top. (I selected southwest, but the source selector at the top shows southeast):

(screenshot 3)
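
For what it's worth, one way to confirm outside Studio that the tilesets really exist is to list the account's tilesets with the Mapbox Tilesets API. A minimal sketch, assuming the requests library; the account name is a placeholder and the token comes from the environment:

# Sketch: list the account's tilesets via the Mapbox Tilesets API to
# confirm "open-addresses.southwest" etc. exist outside Studio.
# MAPBOX_USER is a placeholder account name.
import os
import requests

MAPBOX_USER = "open-addresses"
resp = requests.get(
    f"https://api.mapbox.com/tilesets/v1/{MAPBOX_USER}",
    params={"access_token": os.environ["MAPBOX_TOKEN"]},
)
resp.raise_for_status()
for tileset in resp.json():
    print(tileset["id"], tileset.get("name"))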

iandees (Member) commented Oct 8, 2019

@migurski do you remember why we wrote out chunks of mbtiles instead of one big one?

migurski (Member) commented Oct 8, 2019

Yeah, we split the world into four to slide under a Mapbox limitation on upload size: openaddresses/machine#631 (comment)
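For anyone reading along, the quadrant split is conceptually simple. A minimal sketch of bucketing line-delimited GeoJSON point features into four files before running Tippecanoe on each; the file names are illustrative, not the actual machine paths:

# Sketch: bucket line-delimited GeoJSON point features into four
# quadrant files so each resulting MBTiles stays under the upload limit.
import json

outputs = {
    name: open(f"{name}.geojson", "w")
    for name in ("northwest", "northeast", "southwest", "southeast")
}

with open("openaddresses.geojson") as src:
    for line in src:
        feature = json.loads(line)
        lon, lat = feature["geometry"]["coordinates"]
        ns = "north" if lat >= 0 else "south"
        ew = "east" if lon >= 0 else "west"
        outputs[ns + ew].write(line)

for f in outputs.values():
    f.close()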

iandees (Member) commented Oct 8, 2019

@sbma44 or @ingalls, can you (a) check whether Mapbox's limitation on file upload size is still present, and (b) see if there's anything we can do to fix the Mapbox Studio issue described above?

ingalls (Member) commented Oct 9, 2019

@iandees Looks like the limit is 25 GB, per Ref

What is the current size of the file, and are we uploading prebuilt vector tiles or raw GeoJSON?

I'm not super familiar with the Studio/tiles side of things, but https://docs.mapbox.com/api/maps/#create-a-tileset looks like it could be a more permanent solution to our problem. However, I imagine uploading individual tiles would take considerably longer than our current approach.
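
In the meantime, a sketch of guarding the upload against that 25 GB cap before hitting the Uploads API, assuming the mapbox-sdk-py Uploader; the tileset id and path are placeholders:

# Sketch: refuse to upload an MBTiles file over the documented 25 GB
# Uploads API cap. Tileset id and path are placeholders.
import os
from mapbox import Uploader

LIMIT_BYTES = 25 * 1024 ** 3  # 25 GB upload cap
path = "southwest.mbtiles"

size = os.path.getsize(path)
if size > LIMIT_BYTES:
    raise SystemExit(f"{path} is {size / 1024 ** 3:.1f} GB, over the cap")

uploader = Uploader()  # reads MAPBOX_ACCESS_TOKEN from the environment
with open(path, "rb") as src:
    resp = uploader.upload(src, "open-addresses.southwest")
print(resp.status_code)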

migurski (Member) commented Oct 9, 2019

We are uploading prebuilt MBTiles files full of vector tiles after running Tippecanoe in the build process.
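For context, that build step looks roughly like the sketch below; the flags shown are common Tippecanoe options, not necessarily the exact ones machine uses:

# Sketch of the Tippecanoe build step: turn one quadrant's GeoJSON
# into an MBTiles file of vector tiles.
import subprocess

subprocess.run(
    [
        "tippecanoe",
        "-o", "southwest.mbtiles",   # output MBTiles
        "-l", "openaddresses",       # layer name referenced by the style
        "-zg",                       # let Tippecanoe guess a max zoom
        "--drop-densest-as-needed",  # thin dense dot clusters at low zooms
        "-f",                        # overwrite any existing output
        "southwest.geojson",
    ],
    check=True,
)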

ingalls (Member) commented Oct 9, 2019

@migurski any chance you can pull the size of the current file? My guess is this is the issue once again :/

iandees (Member) commented Oct 9, 2019

Looking at the logs, it seems that the most recent two mbtiles builds, on 9/29 and 10/1, stopped without finishing or uploading to Mapbox. I don't see any errors, so maybe the instance got terminated? Each ran from ~1600 UTC to ~0100 UTC, which is ~9 hours. Maybe there's a time limit we need to bump up?

On the 9/11 build, most of the downloads failed because I changed permissions on the S3 bucket. The upload to Mapbox worked, though, which is likely why there are so few points on the map: that partial build is the most recent tileset in our account.

Actions to take:

  • Double-check machine's openaddr-update-dotmap is using the code that makes authenticated S3 requests instead of plain ol' HTTP downloads (it is; see the boto3 sketch after this list)
  • Increase the timeout on the job if there is one (although the slowness might come from trying to make requests through the CDN instead of using S3 directly) (timeout isn't the problem)
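
A minimal sketch of the authenticated S3 download in question, assuming boto3; the bucket and key names are placeholders:

# Sketch: fetch a cached source with authenticated S3 requests (boto3)
# instead of a plain HTTP GET through the CDN.
import boto3

s3 = boto3.client("s3")  # picks up credentials from the environment/role
s3.download_file(
    Bucket="data.openaddresses.io",   # placeholder bucket
    Key="cache/us/dc/statewide.zip",  # placeholder key
    Filename="statewide.zip",
)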

iandees (Member) commented Oct 10, 2019

It looks like this is an issue with a hanging Postgres query while pulling down the cached sources to use when iterating through all the data.

This query is run and takes a very long time (times out after ~2 days):

SELECT MAX(id), source_path FROM runs
WHERE source_path IN (
    -- Get all source paths for successful runs in this set.
    SELECT source_path FROM runs
    WHERE set_id = 684369
  )
  -- Get only successful, merged runs.
  AND status = true
  AND (is_merged = true OR is_merged IS NULL)
GROUP BY source_path;

The Postgres planner (via EXPLAIN) says it should be relatively fast:

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=9914.83..9937.92 rows=2309 width=32)
   Group Key: runs.source_path
   ->  Nested Loop  (cost=859.73..8098.89 rows=363188 width=32)
         ->  HashAggregate  (cost=850.31..850.45 rows=14 width=28)
               Group Key: runs_1.source_path
               ->  Index Scan using runs_set_ids on runs runs_1  (cost=0.42..843.36 rows=2780 width=28)
                     Index Cond: (set_id = 684369)
         ->  Bitmap Heap Scan on runs  (cost=9.42..516.46 rows=129 width=32)
               Recheck Cond: ((source_path = runs_1.source_path) AND status AND (is_merged OR (is_merged IS NULL)))
               ->  Bitmap Index Scan on runs_source_path_idx  (cost=0.00..9.39 rows=129 width=0)
                     Index Cond: (source_path = runs_1.source_path)
(11 rows)

I'll dive into why this is taking so long later.
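
In the meantime, a sketch of capping the query client-side so a bad plan fails fast instead of hanging for days, assuming psycopg2; the DSN is a placeholder:

# Sketch: run the runs query with a statement timeout so a bad plan
# fails fast instead of hanging for ~2 days. DSN is a placeholder.
import psycopg2

conn = psycopg2.connect("postgresql://localhost/openaddr")
with conn, conn.cursor() as cur:
    cur.execute("SET statement_timeout = '5min'")
    cur.execute(
        """
        SELECT MAX(id), source_path FROM runs
        WHERE source_path IN (SELECT source_path FROM runs WHERE set_id = %s)
          AND status = true
          AND (is_merged = true OR is_merged IS NULL)
        GROUP BY source_path
        """,
        (684369,),
    )
    rows = cur.fetchall()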

iandees (Member) commented Oct 17, 2019

With all the changes above, I'm seeing more dots on the dotmap now, including DC:

(screenshot)

The update process takes quite a bit longer now that we have more data (it's still running), but it's getting somewhere!
