Nginx Timeout Issue while building the load_datasets() Dictionary #103

Open
sjanga1736 opened this issue Nov 25, 2024 · 5 comments

sjanga1736 commented Nov 25, 2024

Hi Andrew,

I have 2000+ files included in the configuration, sourced from S3, with file sizes varying from 100 KB to 500 GB. On the initial loading of the dataset configuration (config.yaml), creating the dataset objects with `datasets = {d["name"]: Dataset.from_config(**d) for d in config["datasets"]}` runs into a 504 timeout error (even though nginx is configured with a 120s timeout). How can I resolve this?

Is there a better way to create the datasets on startup when there is a huge number of files?
or
Do I need to create the datasets explicitly by dataset name?

ajnisbet (Owner) commented Jan 3, 2025

How are your S3 files being added: are you mounting your S3 bucket to the local filesystem? Could you share your config.yaml file?

If all your S3 files are in the same projection, it would be fastest to prebuild a single VRT file referencing all the S3 files. Then opentopodata will see your dataset as a single file and not have to reach out to S3 when loading the datasets. There's an example of building a VRT from S3 files here: https://www.opentopodata.org/notes/cloud-storage/
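
For instance, a rough sketch along the lines of the linked page (the bucket name and `.tif` extension here are placeholders, and this assumes all the files share one projection, since a plain VRT can't mix CRSs):

```bash
# List the rasters as /vsis3/ paths without downloading them, then build
# one VRT that references every file. GDAL needs S3 credentials
# (e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) to read /vsis3/ paths.
aws s3 ls --recursive s3://my-bucket/cloud_data/ \
  | awk '/\.tif$/ {print "/vsis3/my-bucket/" $4}' > files.txt

gdalbuildvrt -input_file_list files.txt dataset.vrt
```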

sjanga1736 (Author) commented Jan 28, 2025

This is a sample config file. The dataset projections differ by country, and I am mounting the S3 bucket to the EC2 instance (locally):

{ "access_control_allow_origin": "*", "datasets": [ { "name": "32750", "path": "cloud_data/32750/" }, { "name": "32752", "path": "cloud_data/32752/" }, { "name": "32753", "path": "cloud_data/32753/" }, { "name": "32754", "path": "cloud_data/32754/" }, { "name": "32755", "path": "cloud_data/32755/" }, { "name": "32756", "path": "cloud_data/32756/" }, { "name": "auckland", "path": "cloud_data/auckland_contours/auckland-1m-dem-2013-vrt/" }, { "name": "christchurch", "path": "cloud_data/christchurch_contours/christchurch_1m_dem_2018_vrt/" }, { "name": "hawkesbay", "path": "cloud_data/hawkesbay/" }, { "name": "CA_NoCAL_Wildfires_B4_2018", "path": "cloud_data/california_contours/california/CA_NoCAL_Wildfires_B4_2018/CA_NoCAL_Wildfires_B4_2018_vrt/" }, { "name": "CA_SanBernardinoCo_AreaA_2013", "path": "cloud_data/california_contours/california/CA_SanBernardinoCo_AreaA_2013/CA_SanBernardinoCo_AreaA_2013_vrt/" }, { "name": "CA_SanBernardinoCo_AreaB_2013", "path": "cloud_data/california_contours/california/CA_SanBernardinoCo_AreaB_2013/CA_SanBernardinoCo_AreaB_2013_vrt/" }, { "name": "CA_SanDiegoQL2_2014", "path": "cloud_data/california_contours/california/CA_SanDiegoQL2_2014/CA_SanDiegoQL2_2014_vrt/" }, { "name": "AZ_CORiverBasin_L1_2014", "path": "cloud_data/california_contours/california/AZ_CORiverBasin_L1_2014/AZ_CORiverBasin_L1_2014_vrt/" }, { "name": "AZ_ColoradoRiverLot2_2014", "path": "cloud_data/california_contours/california/AZ_ColoradoRiverLot2_2014/AZ_ColoradoRiverLot2_2014_vrt/" }, { "name": "CA_Santa_Clara_DEM_2020_9330", "path": "cloud_data/california_contours/california/CA_Santa_Clara_DEM_2020_9330/" }, { "name": "CA_Eastern_SanDiegoCo_2016", "path": "cloud_data/california_contours/california/CA_Eastern_SanDiegoCo_2016/CA_Eastern_SanDiegoCo_2016_vrt/" }, { "name": "San_Bernadino_County_Flood_Control_Lidar", "path": "cloud_data/california_contours/california/San_Bernadino_County_Flood_Control_Lidar/San_Bernadino_County_Flood_Control_Lidar_vrt/" }, { "name": "CA_YosemiteNP_2019_D19", "path": "cloud_data/california_contours/california/CA_YosemiteNP_2019_D19/CA_YosemiteNP_2019_D19_vrt/" }, { "name": "CA_CarrHirzDeltaFires_2019_B19", "path": "cloud_data/california_contours/california/CA_CarrHirzDeltaFires_2019_B19/CA_CarrHirzDeltaFires_2019_B19_vrt/" }, { "name": "OR_RogueSiskiyouNF_2019_B19", "path": "cloud_data/california_contours/california/OR_RogueSiskiyouNF_2019_B19/OR_RogueSiskiyouNF_2019_B19_vrt/" }, { "name": "CA_AZ_FEMA_R9_Lidar_2017_D18", "path": "cloud_data/california_contours/california/CA_AZ_FEMA_R9_Lidar_2017_D18/CA_AZ_FEMA_R9_Lidar_2017_D18_vrt/" }, { "name": "CA_SantaClaraCounty_2020_A20", "path": "cloud_data/california_contours/california/CA_SantaClaraCounty_2020_A20/CA_SantaClaraCounty_2020_A20_vrt/" }, { "name": "radiant_st_4018", "path": "cloud_data/radiant_st_4018/" }, { "name": "beach_haven_estate_2430", "path": "cloud_data/beach_haven_estate_2430/" }, { "name": "maple_lane_rise_3352", "path": "cloud_data/maple_lane_rise_3352/" }, { "name": "montana_estate_3764", "path": "cloud_data/montana_estate_3764/" }, { "name": "elan_4500", "path": "cloud_data/elan_4500/" }, { "name": "kingsgrove_7109", "path": "cloud_data/kingsgrove_7109/" }, { "name": "donaldson_close_5255", "path": "cloud_data/donaldson_close_5255/" }, { "name": "mount_terry_estate_2527", "path": "cloud_data/mount_terry_estate_2527/" }, { "name": "the_village_grove_2560", "path": "cloud_data/the_village_grove_2560/" } ], "max_locations_per_request": 1000 }

ajnisbet (Owner) commented

Gotcha. Are all 32 of these datasets VRTs?

Loading 32 VRTs via a mounted S3 bucket will take a while, though I'd expect it to take a bit less than 120s. If your mounting tool supports caching, you could make those options more aggressive.
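
For example, with Mountpoint for Amazon S3 (just one possible mounting tool; the bucket name and paths here are placeholders), something like:

```bash
# Hedged sketch: cache object data locally and keep metadata for 5 minutes,
# so repeated reads during dataset loading don't all go back to S3.
mount-s3 my-bucket /mnt/cloud_data \
  --cache /var/cache/mount-s3 \
  --metadata-ttl 300
```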

Otherwise, you could make a single GTI of these 32 datasets. Unlike VRTs, GTIs can handle projection differences: https://gdal.org/en/stable/drivers/raster/gti.html
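
A rough sketch of building one (assumes GDAL >= 3.9 for the GTI driver; files.txt is the same hypothetical listing of /vsis3/ paths as above):

```bash
# -src_srs_name records each tile's own CRS in an attribute field, which
# is what lets one index span mixed projections; -t_srs sets the CRS of
# the index geometries themselves. The .gti.gpkg extension tells GDAL to
# open the result with the GTI driver.
gdaltindex -f GPKG -t_srs EPSG:4326 -src_srs_name src_srs \
  index.gti.gpkg --optfile files.txt
```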


Unfortunately I don't have plans to add caching to opentopodata. I'm open to it in theory, but it would need a design that can rescan updated datasets.

Perhaps a config option like `no_changes_since: 2025-01-07T23:54:33`: OTD caches the dataset info, but if the cache is older than `no_changes_since` it gets rebuilt.
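
In config.yaml that could look something like this (purely a sketch of the idea, not an implemented option):

```yaml
# Hypothetical, unimplemented option: cached dataset info is rebuilt only
# if it predates this timestamp.
no_changes_since: 2025-01-07T23:54:33
datasets:
  - name: 32750
    path: cloud_data/32750/
```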

It would also need somewhere to persist this information: perhaps a second mounted volume.

I'll think about this design some more!

sjanga1736 (Author) commented Jan 29, 2025

  • The provided configuration file is just a sample; I have around 2000+ files (both static and dynamic: the static ones are big files on the order of 1 GB to 200 GB, and the dynamic ones are small, on the order of 10 MB, for different countries like AUS, NZL, Canada & USA).
  • Is there a way to optimize this, or is a GTI the only way?
  • Additionally, after creating the GTI, where should I place or specify the tile index?

ajnisbet (Owner) commented

Yah, scanning 2000 files sequentially over a cloud mount is gonna take a while.

In theory opentopodata could scan those files outside of an http request context, build a spatial index, and store that somewhere that persists between reloads. But that's what a GTI is!

You could store the tile index in S3 next to your datasets, e.g. at cloud_data/index/index.gti.
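
The corresponding config might then be a single dataset pointing at that folder (the dataset name here is a placeholder; the idea is that opentopodata only has to open the one index file on startup):

```yaml
datasets:
  - name: combined
    path: cloud_data/index/
```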
