Possible datasets for benchmarks #1

Open
brendan-ward opened this issue Oct 1, 2021 · 2 comments
Comments

@brendan-ward
Member

A few data sources to consider for bigger benchmarks:

U.S. high resolution hydrography data

These are served per 4-digit hydrologic unit (HUC4) watershed (download a *_gdb.zip); each archive contains the data in an ESRI File Geodatabase.

  • NHDFlowline: river / stream center lines
  • NHDWaterbody: lakes / rivers
  • (several other layers with other geometry types but generally smaller in volume)

We use some of these in pyogrio

Useful for testing intersection of waterbodies and flowlines, clipping, etc.
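As a rough sketch of what such a benchmark case could look like (not an agreed design for this repo): the layer names come from the list above, while `gdb_path`, the helper names, and the tiny synthetic geometries are assumptions made only so the example runs without a download. It assumes a geopandas version that accepts the `predicate` keyword in `sjoin`.

```python
import geopandas as gpd
from shapely.geometry import LineString, Polygon


def load_nhd_layers(gdb_path):
    """Read the two largest NHD layers from a File Geodatabase.

    gdb_path is a hypothetical local path to an extracted *_gdb.zip download.
    """
    flowlines = gpd.read_file(gdb_path, layer="NHDFlowline")
    waterbodies = gpd.read_file(gdb_path, layer="NHDWaterbody")
    return flowlines, waterbodies


def flowlines_intersecting_waterbodies(flowlines, waterbodies):
    """Pair every flowline with each waterbody it intersects (spatial join)."""
    return gpd.sjoin(flowlines, waterbodies, how="inner", predicate="intersects")


# Tiny synthetic stand-ins for the real layers (assumed data, not NHD).
flowlines = gpd.GeoDataFrame(
    {"flow_id": [1, 2]},
    geometry=[LineString([(-1, 1), (1, 1)]), LineString([(5, 5), (6, 6)])],
)
waterbodies = gpd.GeoDataFrame(
    {"wb_id": [10]},
    geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])],
)

# Which flowlines touch which waterbodies?
joined = flowlines_intersecting_waterbodies(flowlines, waterbodies)

# Keep only the parts of the flowlines that fall inside a waterbody.
clipped = gpd.clip(flowlines, waterbodies)
```

With the real data, `load_nhd_layers` would supply the two frames in place of the synthetic ones, and the join and clip calls are the parts worth timing.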

World Database on Protected Areas
(see the download button)

A 3 GB dataset that covers terrestrial and marine protected areas.

One of the "advantages" of doing benchmarks with some of these is that the geometries are not always clean, so they could be good for benchmarking things like making them valid, unioning them together, or intersecting them with admin boundaries like countries or EEZs (below).
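To make the "make valid, then union" idea concrete, here is a minimal sketch using shapely directly. The helper names `best_time` and `repair_and_union` are hypothetical, and the self-intersecting "bowtie" polygon is only a stand-in for the unclean protected-area geometries, not real data.

```python
import time

from shapely.geometry import Polygon
from shapely.ops import unary_union
from shapely.validation import make_valid


def best_time(func, *args, repeat=3):
    """Return (best wall-clock seconds, last result) over `repeat` calls."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


def repair_and_union(geometries):
    """Make each geometry valid, then dissolve them into a single geometry."""
    return unary_union([make_valid(geom) for geom in geometries])


# A self-intersecting "bowtie" stands in for the unclean real-world polygons.
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])
square = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])

seconds, merged = best_time(repair_and_union, [bowtie, square])
```

Taking the best of a few repeats is just one simple way to reduce timing noise; a dedicated benchmarking tool could replace `best_time` without changing the operation being measured.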

Marine regions

For example, the World EEZ (Exclusive Economic Zones) dataset is a useful one to intersect with the marine protected areas above.

@TLouf

TLouf commented Oct 1, 2021

Another possibility is to leverage OpenStreetMap data, accessed using OSMnx for instance. OSM provides all kinds of geometries, and even mixes of them: building shapes, administrative regions, railways, points of interest, water bodies...

I'm mentioning OSMnx because it's the package I know that makes it easiest to download and digest OSM data (see this example), but it could be anything else that does the trick. The downside of OSMnx for the purpose of this repo is that it requires networkx, which would be an unnecessary dependency here.

@martinfleis
Member

OSM is a good source. For performance reasons, it may be better to get the larger data using pyrosm, but that is a minor detail.

British Ordnance Survey has a series of GB-wide open datasets with polygons, lines and points at https://osdatahub.os.uk/downloads/open.


3 participants