Skip to content

Shape reader: multiple variables alowed #1460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lencart
Copy link

@lencart lencart commented Dec 17, 2024

Shape reader can ingest multiple variables, one per polygon. For this the shape file needs a string field called variable that contains the name of the variable requested by get_variables.

All tests pass for standard land_binary_mask fed directly without the variable field.

All tests pass for a model requesting a landmask when this is in a file together with other variables.

Variable requests tests pass for variables that are not land_binary_mask.

Multiple shapefile read tests pass for variables with the variable field

Shape reader can ingest multiple variables, one per polygon.
For this the shape file needs a string field called `variable` that
contains the name of the variable requested by `get_variables`.

All tests pass for standard `land_binary_mask` fed directly without
the `variable` field.

All tests pass for a model requesting a landmask when this is in
a file together with other variables.

Variable requests tests pass for variables that are not
`land_binary_mask`.

Multiple shapefile read tests pass for variables with the
`variable` field

Fixes #5380, refs #5379 and refs #5321
@knutfrode
Copy link
Collaborator

Perhaps @gauteh could also have a look at this one.
Code seems ok to me. But could the tests be performed with the shapefiles already existing in the test-data-folder?
It is desirable to limit the number of files in the repository.

@knutfrode
Copy link
Collaborator

I downloaded the test_more_vars.shp and ran ogrinfo.
The output below shows attributes variable for each layer. Is this a standard thing, or a custom attribute that you added specifically for this demonstration/purpose?
If so, one must also make (and maintain) a description for how other users can create such specific shapefiles, and thus the general usage is diminshed.
It might perhaps be more generic to let users map variable names to existing layer names (if such a thing does exist?) or layer numbers.

$ ogrinfo test_more_vars.shp test_more_vars
INFO: Open of `test_more_vars.shp'
      using driver `ESRI Shapefile' successful.

Layer name: test_more_vars
Geometry: Polygon
Feature Count: 4
Extent: (13.718623, 68.421273) - (14.371705, 68.541077)
Layer SRS WKT:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
variable: String (64.0)
OGRFeature(test_more_vars):0
  variable (String) = var0
  POLYGON ((13.71862294033 68.4273643750906,13.753603948622 68.4291974443011,13.7566250357017 68.4229289246788,13.7203719907446 68.4212728034976,13.71862294033 68.4273643750906))

OGRFeature(test_more_vars):1
  variable (String) = var1
  POLYGON ((13.8240288368071 68.5095945112955,13.8639767232961 68.5118974800354,13.8684153773505 68.5036715263414,13.8275797600506 68.5000511691098,13.8240288368071 68.5095945112955))

OGRFeature(test_more_vars):2
  variable (String) = var1
  POLYGON ((14.2249146574074 68.5117792466294,14.2654707460295 68.5410768503108,14.3075866842141 68.5215493169071,14.2249146574074 68.5117792466294))

OGRFeature(test_more_vars):3
  variable (String) = var0
  POLYGON ((14.3318717414316 68.4978365792753,14.3039184377401 68.4991249973943,14.3095090984784 68.5114900958117,14.3717051991921 68.5138078026737,14.3696087014152 68.5032474445955,14.3633192080846 68.4965480881174,14.3318717414316 68.4978365792753))

@lencart
Copy link
Author

lencart commented Dec 17, 2024

Firs of all let me say that the reader works with shapefiles without any attributes or with attributes that are not called variable. So, the previous use case is therefore guaranteed.

Shapefiles have features and attributes. As far as I know they don't have layers embedded, you need an extra file to do that and that would add to the complexity.

The use case for this change is to allow for the introduction of other binary masks in addition to land binary mask. It's used to test if the particle is inside a polygon with the value that represents the variable name, such as for instance an habitat: e.g. sandy, seagrass, ice, ...
In this way, get_variables() would be called with [..., "sandy", ...] as one of the variables and it will return the result of the reader's __on_land__() method so that it get stored in the model's self.environment.sandy as a boolean that can be checked against. This will of course make the implementation of this facility not general, unless there are CF-compliant variable names for these variables that need to be accessed as binary masks.

For uses of the shape file reader to get variables other than land binary mask, I can write a short tutorial for it. It just requires the user to create a polygon or multi-polygon shapefile with the string attribute named variable that holds the name of the variable, with each of the features (polygons) representing where their attribute name is True. In this way you can have a single shapefile with separate polygons having the values for the attribute variable equal to sandy, seagrass, land_binary_mask, etc,...

@knutfrode
Copy link
Collaborator

We are generally a bit skeptical to include things that are fairly specific, as it increases complexity and future maintenance. In fact we are striving to reduce complexity as much as possible.

However, this is well programmed with docstrings and unittests. Thus I am willing to merge this if you can add a small tutorial so that this can also be used by others.

Maybe the best would be to make an example script for the gallery, where you could add some text/comments on how to produce and make use of such a shapefile. It could be named something like example_shapefile_with_variables.
This example should use the shapefiles already used for the unittest to avoid having more shapefiles than necessary in the repository.
It could also be nice if it is possible to create a shapefile with available libraries and some synthetic polygons, as addition to using the one in the test folder. But this could be added later.

Maybe one alternative to needing to have attributes named "variable", could be to specify a mapping to the shapereader constructor, e.g. { 0: 'sandy', 1: 'seagrass', 2: 'land_binary_mask'}, where keys/integers could be the feature number. Then the user would not need to modify the shapefile. This could also eventually be added later.

If you can add such an example script to this PR, I can eventually merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants