Skip to content

Forest Carbon: Global model vector clipping and reprojection #1875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

megannissel
Copy link
Contributor

Description

Fixes #1689

Background:

Enabling gdal.UseExceptions led to the discovery that invalid geometries were being created during reprojection of the Forest Carbon "Global Regression Models" vector (forest_carbon_edge_regression_model_parameters.shp). This was happening even with partial reprojection enabled. Digging into it, it seems partial reprojection doesn't necessarily remove entire geometries, just points that can't be reprojected; as such, you can end up with geometries that have insufficient points to be valid polygons.

Based on recommendations from @emlys and @phargogh, this PR does away with partial reprojection and instead clips the global vector prior to reprojection, so that only those polygons relevant to the model run need be reprojected. Clipping is based on the LULC raster's bounding box, plus a buffer (explained below).

While this model does include an AIO input, it's optional and is only used to calculate aggregate statistics at the end of the model run. For consistency, I'm ignoring it for the purposes of clipping.

Bounding Box Buffering:

The Global Regression Model vector data is used to construct a kd-tree. When querying the kd-tree, we set a distance_upper_bound. To ensure the clipped vector will contain all of the relevant data to the querying process, I'm buffering the LULC bounding box by the same DISTANCE_UPPER_BOUND value (currently set to 500km).

I'm also keeping all polygons that intersect with the clipping polygon, rather than using gdal.Layer.Clip(), in order to preserve the shape of the polygons (this is important because we use the centroid when constructing the kd-tree).

To summarize my clipping methodology:

  • Get the bounding box of the LULC raster; buffer it by DISTANCE_UPPER_BOUND
  • Reproject the buffered bounding box to the coordinate system of the vector
  • Copy all polygons from the global vector that intersect with the bounding box to a new vector
  • Pass the clipped vector along to _build_spatial_index, to be reprojected and used to create the kd-tree

Additional notes:

The User's Guide says, "Note that all spatial inputs must be in the same projected coordinate system and in linear meter units." However, we weren't validating the units within the model. I've updated the model spec to include "projection_units": u.meter for validation purposes.

Digging into this model, I also noticed a contradiction. The user's guide says, related to the "Number Of Points To Average," (in "How it works"):
"The user can designate the number of local models used in the interpolation scheme which is defaulted to 10 but can range anywhere from 1 (only closest point) to 2635 (every regression model on the planet)."

However, within the code itself, we specify the aforementioned DISTANCE_UPPER_BOUND of 500km that we use when querying the kd-tree (there's a note that reads, "grid cells are 100km. Becky says 500km is a good upper bound to search"). We may want to update the User's Guide to reflect this, especially now that we're using that DISTANCE_UPPER_BOUND as a buffer when clipping the global vector to the LULC raster's bounding box.

Checklist

  • Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
  • Updated the user's guide (if needed)
  • Tested the Workbench UI (if relevant)

@megannissel megannissel marked this pull request as ready for review April 10, 2025 14:29
@megannissel megannissel requested a review from phargogh April 10, 2025 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Forest Carbon test data: invalid geometries after reprojection
1 participant