
feature/issue-61: Resolve Dimensions with MetaData #68

Merged: 37 commits into develop from feature/issue-61 on Apr 13, 2022
Conversation

nlenssen2013
Contributor

GitHub Issue: #61

Description

SNDR and TROPOMI have variables without dimensions that need to be transferred

Overview of work done

Resolve cases where xarray_enhancements was not completely fixing dimensions

Overview of verification done

Summarize the testing and verification you've done. This includes unit tests or testing with specific data

Overview of integration done

Explain how this change was integration tested. Provide screenshots or logs if appropriate. An example of this would be a local Harmony deployment.

PR checklist:

  • Linted
  • Updated unit tests
  • Updated changelog
  • Integration testing

See Pull Request Review Checklist for pointers on reviewing this pull request

@skorper skorper changed the title Feature/issue 61 - Resolve Dimensions with MetaData feature/issue-61: Resolve Dimensions with MetaData Mar 22, 2022
```diff
@@ -1125,7 +1123,7 @@ def subset(file_to_subset, bbox, output_file, variables=None,  # pylint: disable
         ) for lat_var_name in lat_var_names
     ]
     chunks_dict = calculate_chunks(dataset)

+    print (lat_var_names)
```
Contributor

Can you remove this print statement and any others?

```diff
@@ -1688,6 +1687,98 @@ def test_get_time_OMI(self):
         assert "Time" in time_var_names[0]
         assert "Latitude" in lat_var_names[0]

+    def test_sndr_dims(self):
```
Contributor

It seems like this test contains a lot of duplicate code from the subsetter itself -- can you call a function from the subsetter and make assertions on the result rather than doing it this way?

Contributor Author

Trying to avoid calling the entire subset.subset method and instead just call the subset_with_bbox method. Will add assertions comparing the dimensions of the input vs. the output.
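
For illustration, a minimal sketch of that test shape, assuming a `subset_with_bbox` function that takes an opened dataset plus coordinate variable names and returns the subset result (the actual l2ss-py signature, arguments, and return type may differ):

```python
import numpy as np
import xarray as xr

from podaac.subsetter import subset


def test_sndr_dims(self):  # method to add inside the existing test class
    """Variables should not gain extra dimensions from a bbox subset."""
    dataset = xr.open_dataset("path/to/sndr_granule.nc", decode_times=False)

    # Record each variable's dimensions before subsetting.
    dims_before = {name: set(var.dims) for name, var in dataset.data_vars.items()}

    # Hypothetical call; argument names here are assumptions, not the exact API.
    result = subset.subset_with_bbox(
        dataset=dataset,
        lat_var_names=["lat"],
        lon_var_names=["lon"],
        bbox=np.array([[-180, 180], [-90, 90]]),
    )

    # Assert dimensions of input vs. output: no variable gains new dims.
    for name, var in result.data_vars.items():
        assert set(var.dims) <= dims_before[name]
```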

```python
while not var_dims:
    var_group_parent = var_group_parent.parent
    var_dims = list(var_group_parent.dimensions.keys())
var_dims = []
```
Contributor

Wouldn't `var_dims` already be an empty list? It seems like this line does nothing. Also, I'm a little concerned about this breaking functionality with our existing datasets.

Contributor Author

Yes, I removed the `var_dims` reassignment. Do you have an idea of which existing datasets might break? All the unit tests pass.
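
For context, here is a standalone sketch of the parent-group walk in the snippet above, using the netCDF4 library; the helper name and setup are illustrative, not the PR's actual code:

```python
import netCDF4 as nc


def resolve_var_dims(group, var_name):
    """Resolve a variable's dimensions, falling back to ancestor groups.

    Some SNDR/TROPOMI variables report no dimensions in their own group,
    so we walk up the group hierarchy until dimensions are found.
    """
    var_dims = list(group.variables[var_name].dimensions)
    parent = group
    while not var_dims:
        # Assumes some ancestor defines dimensions; the root group's
        # parent is None, so real code would need a guard here.
        parent = parent.parent
        var_dims = list(parent.dimensions.keys())
    return var_dims
```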

```python
    var for var in dataset.data_vars.keys()
    if var in variables and var not in group_vars and not var.startswith(tuple(lat_var_prefix))
])
group_vars.extend([
```
Contributor

Why did you pull this out of the `if variables` block? Wouldn't this do nothing, because the `var in variables` check would always be False?

Contributor Author

Re-added the `if variables` statement.
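
A sketch of the resulting shape, with the comprehension back inside the `if variables` guard (surrounding names follow the snippet above and are otherwise assumed):

```python
# Only filter against the requested variable list when one was provided;
# otherwise `var in variables` could never be True (or would fail on None).
if variables:
    group_vars.extend([
        var for var in dataset.data_vars.keys()
        if var in variables
        and var not in group_vars
        and not var.startswith(tuple(lat_var_prefix))
    ])
```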

@nlenssen2013 nlenssen2013 requested a review from skorper March 23, 2022 16:37
```python
if original_type != new_type:
    new_dataset[variable_name] = xr.apply_ufunc(cast_type, new_dataset[variable_name],
                                                str(original_type), dask='allowed',
                                                keep_attrs=True)

if partial_dim_in_in_vars and (indexers.keys() - dataset[variable_name].dims) and set(
```
Contributor

A couple questions:

  • Why is this only applicable to cases where there is no `_FillValue`?
  • If there is a partial overlap with the subset dims, will we lose that subset?

Contributor Author

  1. It is not, to my knowledge. The logic on line 201 needs to be applied to cases both with and without `_FillValue`. A few SNDR variables were not getting picked up by this logic, hence the excess dimensions.
  2. If I understand correctly, you mean the case where a variable has only one of the dimensions being subset? `/sat_vel` in SNDR has dimensions (atrack, spatial), and after a bbox subset the atrack dimension is smaller, as expected.

Contributor

> It is not, to my knowledge. The logic on line 201 needs to be applied to cases both with and without `_FillValue`. A few SNDR variables were not getting picked up by this logic, hence the excess dimensions.

Are you planning on making that change before we merge?

Contributor Author

> It is not, to my knowledge. The logic on line 201 needs to be applied to cases both with and without `_FillValue`. A few SNDR variables were not getting picked up by this logic, hence the excess dimensions.

> Are you planning on making that change before we merge?

The line that you commented on is the change. I was just explaining what the change does and what was lacking before.

Contributor

Are you referring to the following three lines, where this same logic is applied in the missing `_FillValue` case?

```python
new_dataset[variable_name] = indexed_var
new_dataset[variable_name].attrs = indexed_var.attrs
variable.attrs = indexed_var.attrs
```

My concern is that the block above is only supposed to be applied in cases where there is no FillValue. When there is a FillValue, data outside the provided bounds are replaced with the FillValue. If there is no FillValue, we use the original data, meaning data outside the bounds will still be in the subsetted result, which results in an incomplete subset. It seems like you're applying this logic to cases where there is a FillValue, meaning we will get an incomplete subset even when we don't have to.

> after a bbox subset the atrack dimension is smaller, as expected

This doesn't necessarily mean the data is fully subset. Our subset operation has two steps:

  1. Cut dimensions as much as possible, meaning dimension size itself will shrink. This is the "indexers" logic you see here
  2. Within the remaining data, mask values that are not within the bounds. This means data within the now reduced dimensions will be replaced with "nan" if they are outside of the bounds.
    • If the variable has FillValue, replace nan with FillValue
    • If the variable has no FillValue, replace nan with original values

Sorry for the essay; please let me know if this makes sense.
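
A toy xarray illustration of those two steps (made-up data and names, not the actual l2ss-py code):

```python
import numpy as np
import xarray as xr

# Toy swath: 4 along-track rows, 4 cross-track columns.
lat = xr.DataArray(np.array([-30.0, -10.0, 10.0, 30.0]), dims="atrack")
lon = xr.DataArray(np.array([-60.0, -20.0, 20.0, 60.0]), dims="xtrack")
data = xr.DataArray(np.arange(16.0).reshape(4, 4), dims=("atrack", "xtrack"))

# Bounding box: lat in [-15, 15], lon in [-30, 30].
cond = (lat >= -15) & (lat <= 15) & (lon >= -30) & (lon <= 30)  # 2-D mask

# Step 1: cut dimensions as much as possible (the "indexers" logic) --
# here, drop atrack rows where the condition is False everywhere.
indexers = {"atrack": np.where(cond.any(dim="xtrack").values)[0]}
cut = data.isel(indexers)                   # atrack shrinks from 4 to 2

# Step 2: within the remaining data, mask out-of-bounds values with nan.
masked = cut.where(cond.isel(indexers))

fill_value = -9999.0                        # stand-in for the _FillValue attribute
with_fill = masked.fillna(fill_value)       # has _FillValue: nan -> _FillValue
without_fill = masked.fillna(cut)           # no _FillValue: nan -> original values
```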

Contributor Author

Yes, thanks for this. The `.where` function adds the xtrack and atrack dimensions to all variables, so variables that do not have these dimensions and have no intersection need the extra dimensions removed. My understanding is that there was no logic for this case; the result should just be exactly the same as the original variable before subsetting. SNDR `/air_pres` is a case where

```python
if partial_dim_in_in_vars and (indexers.keys() - dataset[variable_name].dims) and set(indexers.keys()).intersection(dataset[variable_name].dims)
```

returns false, and

```python
if partial_dim_in_in_vars and (indexers.keys() - dataset[variable_name].dims) and set(indexers.keys()).intersection(new_dataset[variable_name].dims)
```

returns true.
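
To make the two conditions concrete, a toy walk-through (dimension names are illustrative, not read from the actual granule):

```python
indexers = {"atrack": [0, 1], "xtrack": [2, 3]}

# /air_pres originally has neither subset dimension...
original_dims = ("air_pres",)
# ...but gains atrack/xtrack once .where() broadcasts the 2-D mask onto it.
post_where_dims = ("air_pres", "atrack", "xtrack")

# Second clause: some indexer dims are missing from the original dims (truthy).
print(indexers.keys() - set(original_dims))                 # {'atrack', 'xtrack'}

# Third clause vs. the ORIGINAL dims: no overlap, so the condition is False.
print(set(indexers.keys()).intersection(original_dims))     # set()

# Third clause vs. the post-.where() dims: overlap exists, condition is True.
print(set(indexers.keys()).intersection(post_where_dims))   # {'atrack', 'xtrack'}
```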

Contributor

@nlenssen2013 Thanks for the explanation. I think my concerns have been alleviated and I will approve once you fix the conflict 🙂

@sonarqubecloud

sonarqubecloud bot commented Apr 13, 2022

Kudos, SonarCloud Quality Gate passed!

  • Bugs: A (0 Bugs)
  • Vulnerabilities: A (0 Vulnerabilities)
  • Security Hotspots: A (0 Security Hotspots)
  • Code Smells: A (0 Code Smells)
  • Coverage: 92.9%
  • Duplication: 0.0%

@skorper
Contributor

skorper commented Apr 13, 2022

@frankinspace SonarCloud is reporting code smell for this PR: https://sonarcloud.io/project/issues?resolved=false&severities=CRITICAL&sinceLeakPeriod=true&types=CODE_SMELL&pullRequest=68&id=podaac_l2ss-py&open=AX-3X9FKvbEZitqyh0cy

It seems the issue is cognitive complexity in the subset function. This PR only lightly touches this code, so I'm thinking we should open a separate issue for simplifying this function, merge this PR, and dismiss this SonarCloud failure. Thoughts?

@frankinspace
Member

I was thinking the same thing @skorper. We need to simplify that function but probably not in this PR

@skorper
Contributor

skorper commented Apr 13, 2022

#81

@skorper skorper merged commit aaad841 into develop Apr 13, 2022
@skorper skorper deleted the feature/issue-61 branch April 13, 2022 20:40