DAS-2278 - Handle spatial subsetting in products with all fills in lat/lon coordinates #26

Draft
wants to merge 12 commits into
base: main

@joeyschultz joeyschultz commented Jan 29, 2025

Description

  • Updates to hoss_config.json to add resolution-specific master geotransform attributes to each grid_mapping polar reference
  • Move the logic that gets the grid mapping attributes out of get_variable_crs into a new function get_grid_mapping_attributes.
  • New function create_dimension_arrays_from_geotransform to create the dimension arrays from the master geotransform

Jira Issue ID

DAS-2303
DAS-2278

Local Test Steps

Test steps will be detailed when branch is ready for formal PR

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • CHANGELOG.md updated to include high level summary of PR changes.
  • docker/service_version.txt updated if publishing a release.
  • Tests added/updated and passing.
  • Documentation updated (if needed).

Author

joeyschultz commented Jan 29, 2025

I've opened this draft PR to get feedback on the analysis completed in DAS-2303 (the analysis ticket being worked in preparation for DAS-2278). Unit tests are expected to fail at the moment.

Member

@flamingbear flamingbear left a comment

Joey, here's a bunch of comments. I think you're on the right track and like a lot of this. Feel free to respond inline or reach out and we can talk through anything. I'll approve the analysis ticket now and leave this as a comment on the draft PR.

column_dimensions = [
    col_row_to_xy(geotranform, i, 0) for i in range(lat_arr.shape[1])
]
row_dimensions = [col_row_to_xy(geotranform, 0, i) for i in range(lat_arr.shape[0])]
Member

This is probably a massive nit, but can you use different temp vars in your comprehensions? I would use i and j and follow the standard conventions, or I'd probably just use row and col.

Member

I should go look and see if this was in my code too :awkwardsockmonkey:

Member

Also, just so I understand: this routine doesn't need lat_arr, an array of the latitudes, it just needs the shape of that variable? Is there a way to get that without reading the whole array?

Author

I'm not aware of a current way to get the shape of the variable without reading the whole array. DAS-2287 is addressing this.

Author

Updated to use 'row' and 'col' in 7d5e34c
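
For readers following this thread, here is a minimal sketch of building dimension values from a geotransform with `row` and `col` as the loop variables, using only the array shape. It is not the PR's implementation: the body of `col_row_to_xy`, the GDAL-style ordering of the six coefficients, the cell-centre offset, the function name `dimension_arrays_from_geotransform`, and the example coefficients are all assumptions made for illustration. Plain lists are used here to keep the sketch dependency-free; the PR code wraps the values in `np.array`.

```python
def col_row_to_xy(geotransform, col, row):
    """Hypothetical helper: convert a (col, row) cell index to the projected
    (x, y) of the cell centre, assuming GDAL coefficient ordering.
    """
    x_origin, x_res, x_rot, y_origin, y_rot, y_res = geotransform
    x = x_origin + (col + 0.5) * x_res + (row + 0.5) * x_rot
    y = y_origin + (col + 0.5) * y_rot + (row + 0.5) * y_res
    return x, y


def dimension_arrays_from_geotransform(geotransform, shape):
    """Build 1-D y and x dimension values from the grid shape alone."""
    n_rows, n_cols = shape
    # x varies along columns, so walk the columns of the first row.
    x_values = [col_row_to_xy(geotransform, col, 0)[0] for col in range(n_cols)]
    # y varies along rows, so walk the rows of the first column.
    y_values = [col_row_to_xy(geotransform, 0, row)[1] for row in range(n_rows)]
    return y_values, x_values


# Illustrative coefficients only (a north-up grid with 36 km cells).
example_geotransform = (-9000000.0, 36000.0, 0.0, 9000000.0, 0.0, -36000.0)
y_values, x_values = dimension_arrays_from_geotransform(example_geotransform, (2, 3))
```

Because only `shape` is passed in, the latitude array itself is never needed by this step, which is the point raised in the review question above.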

# pull out dimension values
x_values = np.array([x for x, y in column_dimensions], dtype=np.float64)
y_values = np.array([y for x, y in row_dimensions], dtype=np.float64)
projected_y, projected_x = tuple(projected_dimension_names)
Member

I'm guessing you copied this from the other code, but why not use * notation?
Or, if you know that projected_dimension_names is always 2 values, just do direct unpacking:

    projected_y, projected_x = projected_dimension_names

Author

Agree, updated in 7d5e34c
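
A short sketch of the two unpacking styles discussed in this thread. The dimension names are hypothetical, and the starred form is only one reading of the "* notation" suggestion.

```python
# Hypothetical dimension names, for illustration only.
projected_dimension_names = ['/y', '/x']

# Direct unpacking: no tuple() wrapper is needed for any iterable of length 2.
projected_y, projected_x = projected_dimension_names


def unpack_projected_names(names):
    """Direct unpacking raises ValueError unless exactly two names are given."""
    first, second = names
    return first, second


# Starred unpacking tolerates extra trailing names and collects them in a list.
first_name, second_name, *extra_names = ['/y', '/x', '/time']
```

Direct unpacking doubles as a cheap length check, which is useful if exactly two projected dimensions are expected.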

@@ -119,13 +119,27 @@
 {
   "Applicability": {
     "Mission": "SMAP",
-    "ShortNamePath": "SPL3FT(P|P_E)",
+    "ShortNamePath": "SPL3FTP",
     "VariablePattern": "(?i).*polar.*"
Member

This may need to be

Suggested change
- "VariablePattern": "(?i).*polar.*"
+ "VariablePattern": "(?i).*Polar.*"

Well, I look dumb: the (?i) is the case-insensitive flag.

That aside, if this is only for SPL3FTP, the group is defined as Freeze_Thaw_Retrieval_Data_Polar and I'm assuming (from looking at the file) the variables don't have the name polar except in their fully qualified path.

e.g.
Variable full name: Freeze_Thaw_Retrieval_Data_Polar/open_water_body_fraction
So why not just be explicit and make that

 "VariablePattern": "Freeze_Thaw_Retrieval_Data_Polar.*"

Would that work and be clearer?

Author

I agree this would make it more clear. I've updated this in 7d5e34c. A leading slash is required so I decided to use:

"VariablePattern": "/Freeze_Thaw_Retrieval_Data_Polar/.*"

"Applicability": {
"Mission": "SMAP",
"ShortNamePath": "SPL3FTP_E",
"VariablePattern": "(?i).*polar.*"
Member

same comment as before

Author

Updated in 7d5e34c
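
To illustrate the two VariablePattern threads above, the sketch below compares the three patterns that were discussed. It assumes the pattern is applied with `re.match`-style anchoring against the fully qualified variable path (an assumption about the matching code, not something confirmed in this conversation). The Polar path is the one quoted in the review; the other two paths are hypothetical.

```python
import re

polar_variable = '/Freeze_Thaw_Retrieval_Data_Polar/open_water_body_fraction'
# Hypothetical paths, used only to contrast the patterns.
global_variable = '/Freeze_Thaw_Retrieval_Data_Global/open_water_body_fraction'
other_variable = '/Some_Other_Group/polar_flag'

broad = re.compile(r'(?i).*polar.*')  # case-insensitive, matches anywhere
no_slash = re.compile(r'Freeze_Thaw_Retrieval_Data_Polar.*')
explicit = re.compile(r'/Freeze_Thaw_Retrieval_Data_Polar/.*')

# re.match anchors at the start of the string, so the pattern without a
# leading slash cannot match a path that begins with '/'.
results = {
    'broad': [bool(broad.match(path))
              for path in (polar_variable, global_variable, other_variable)],
    'no_slash': [bool(no_slash.match(path))
                 for path in (polar_variable, global_variable, other_variable)],
    'explicit': [bool(explicit.match(path))
                 for path in (polar_variable, global_variable, other_variable)],
}
```

Under that assumption, the explicit pattern selects only variables in the polar group, while the broad pattern would also pick up any unrelated variable whose path happens to contain "polar".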

@@ -70,7 +77,7 @@ def get_variable_crs(variable: str, varinfo: VarInfoFromDmr) -> CRS:
         cf_attributes = varinfo.get_missing_variable_attributes(grid_mapping)

         if cf_attributes:
-            crs = CRS.from_cf(cf_attributes)
+            return cf_attributes
Member

I was looking at this and was going to complain that the complexity is a little out of hand, but I see that none of it was you... So this seems fine.

Comment on lines 43 to 49
def get_variable_crs(cf_attributes: str) -> CRS:
    """Create a `pyproj.CRS` object from the grid mapping variable metadata
    attributes.

    """
    return CRS.from_cf(cf_attributes)

Member

What if instead, you left this signature as it was, and moved a call to get_grid_mapping_attributes into here, then you wouldn't need to make two calls below and you've still exposed a function to get the cf_attributes.

And then actually, you could have another get_master_geotransform() function you could use down near L259 where you are switching how you get your dimension arrays.

let me know if that makes sense or sounds stupid.
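
One way the layering suggested in this comment could look, as a rough sketch only. The in-memory `GRID_MAPPING_ATTRIBUTES` dictionary, its keys and values, the simplified single-argument signatures, and `describe_dimension_source` are all hypothetical stand-ins. The real functions take a variable and a varinfo object, and `get_variable_crs` would additionally build a `pyproj.CRS` from the same attributes, which is left out here to keep the sketch dependency-free.

```python
from typing import Optional

# Hypothetical stand-in for grid mapping metadata; names and values are
# illustrative and not taken from hoss_config.json.
GRID_MAPPING_ATTRIBUTES = {
    '/example_polar_grid_mapping': {
        'grid_mapping_name': 'lambert_azimuthal_equal_area',
        'master_geotransform': [-9000000, 36000, 0, 9000000, 0, -36000],
    },
    '/example_global_grid_mapping': {
        'grid_mapping_name': 'lambert_cylindrical_equal_area',
    },
}


def get_grid_mapping_attributes(grid_mapping: str) -> dict:
    """Single place that knows how to look up grid mapping attributes."""
    return GRID_MAPPING_ATTRIBUTES.get(grid_mapping, {})


def get_master_geotransform(grid_mapping: str) -> Optional[list]:
    """Return the master geotransform if one is configured, else None."""
    return get_grid_mapping_attributes(grid_mapping).get('master_geotransform')


def describe_dimension_source(grid_mapping: str) -> str:
    """Caller-level branch: it never touches the attribute dictionary."""
    if get_master_geotransform(grid_mapping) is not None:
        return 'master geotransform'
    return 'dimension variables'
```

With this shape, the calling code keeps its visible `if` branch but asks a single question, and the attribute lookup is made in one place.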

hoss/spatial.py Outdated
crs,
projected_dimension_names,
)
if 'master_geotransform' in grid_mapping_attributes:
Member

I think we talked in a huddle about this. I'm seeing that it's used specifically for getting ranges, and this is sorta at the level of thought process in the function, so I don't think it buys you anything to bury this in another function just to hide this if statement.

I do still think you might have a dedicated get_master_geotransform() function that would hide all calls to get_grid_mapping_attributes from this level of abstraction (like I mentioned above). Also, why not primary_geotransform instead of master_geotransform?

Author

I'm going to implement a dedicated get_master_geotransform() function as you suggest. As for master_geotransform vs primary_geotransform, I went with the naming suggested by @D-Auty.


FYI: "master geotransform" here refers to the geotransform for a "whole earth" grid, e.g., what is defined in the GPD files and for the EASE-GRID standard grids. A geotransform as a granule attribute by itself could be considered redundant with the dimension variables, which provide the "coordinate" in meters for the data arrays, but is specific to the granule and the array sizes. A "master geotransform" would be a collection-level attribute, applicable across many granules of different sizes (e.g. tiles) and likely across many collections as well. Hopefully the term "master geotransform" avoids confusion with the specific extents of the granule itself. It seemed that "master" was a better reference than "primary" in this case.
