Nginx timeout issue while building the load_datasets() dictionary #103
How are your S3 files being added: are you mounting your S3 bucket to the local filesystem? Could you share your config.yaml file? If all your S3 files are in the same projection, it would be fastest to prebuild a single VRT file referencing all the S3 files. Then opentopodata will see your dataset as a single file and not have to reach out to S3 when loading the datasets. There's an example of building a VRT from S3 files here: https://www.opentopodata.org/notes/cloud-storage/
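For reference, a minimal sketch of prebuilding such a VRT with the GDAL Python bindings. The bucket and file names here are hypothetical placeholders, and a plain VRT assumes every input raster shares one projection:

```python
from osgeo import gdal

# Hypothetical bucket and file names; /vsis3/ lets GDAL read directly
# from S3 without downloading the rasters.
files = [
    "/vsis3/my-bucket/dem/tile_a.tif",
    "/vsis3/my-bucket/dem/tile_b.tif",
]

# Build a single dem.vrt that references the S3 rasters in place.
vrt = gdal.BuildVRT("dem.vrt", files)
vrt = None  # dereference to flush the VRT to disk
```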
This is the sample config file. The dataset projections differ by country, and I am mounting the S3 bucket to the EC2 instance (locally).
Gotcha. Are all 32 of these datasets VRTs? Loading 32 VRTs via mounted S3 will take a while, though I'd expect it to take a bit less than 120s. If your mounting tool supports caching you could make those options more aggressive. Otherwise, you could make a single GTI of these 32 datasets. Unlike VRTs, GTIs can handle projection differences: https://gdal.org/en/stable/drivers/raster/gti.html Unfortunately I don't have plans to add caching to opentopodata. I'm open to it in theory, but it would need a design that can rescan updated datasets: perhaps a config option. It would also need somewhere to persist this information: perhaps mounting a second volume. I'll think about this design some more!
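A hedged sketch of building that index with gdaltindex, assuming GDAL 3.9+ (which introduced the GTI driver) and hypothetical S3 paths:

```python
import subprocess

# Hypothetical S3 paths; the inputs may use different projections,
# which a GTI (unlike a plain VRT) can handle.
files = [
    "/vsis3/my-bucket/dem/country_a.tif",
    "/vsis3/my-bucket/dem/country_b.tif",
]

# The .gti.gpkg extension lets the GDAL GTI driver (GDAL >= 3.9)
# open the tile index as if it were a single raster.
subprocess.run(
    ["gdaltindex", "-f", "GPKG", "dem_index.gti.gpkg", *files],
    check=True,
)
```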
Yeah, scanning 2000 files sequentially from a cloud mount is gonna take a while. In theory opentopodata could scan those files outside of an HTTP request context, build a spatial index, and store it somewhere that persists between reloads. But that's exactly what a GTI is! You could store the tile index in S3 next to your datasets.
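A hypothetical config.yaml entry pointing opentopodata at such a prebuilt index (the dataset name and path are placeholders, following the documented opentopodata config format):

```yaml
datasets:
- name: global-dem
  # A directory containing only the prebuilt index file (VRT or GTI),
  # so opentopodata sees the whole collection as a single dataset.
  path: data/global-dem/
```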
Hi Andrew,
I have 2000+ files included in the configuration, sourced from S3, with file sizes varying from 100 KB to 500 GB. On the initial loading of the dataset configuration (config.yaml), the dataset objects are created with:

```python
datasets = {d["name"]: Dataset.from_config(**d) for d in config["datasets"]}
```

I am running into a 504 timeout error (though nginx was configured with a timeout of 120s). How can I resolve this? Is there a better way to create the datasets on startup when there is a huge number of files, or do I need to create the datasets explicitly based on dataset name?