Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dtype for swath -> area resampling with gradient search #618

Merged
merged 13 commits into from
Sep 18, 2024

Conversation

pnuu
Copy link
Member

@pnuu pnuu commented Sep 2, 2024

Resampling float32 swath data resulted in float64 output. This is a simple fix for this data type promotion.

  • Closes #xxxx
  • Tests added
  • Tests passed
  • Fully documented

@pnuu pnuu self-assigned this Sep 2, 2024
Copy link

codecov bot commented Sep 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.98%. Comparing base (8fe29a0) to head (8cc6ec8).
Report is 20 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #618   +/-   ##
=======================================
  Coverage   93.97%   93.98%           
=======================================
  Files          86       86           
  Lines       13825    13839   +14     
=======================================
+ Hits        12992    13006   +14     
  Misses        833      833           
Flag Coverage Δ
unittests 93.98% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pnuu
Copy link
Member Author

pnuu commented Sep 2, 2024

Performance comparisons with my 12 AVHRR composites:

Satpy main, Pyresample main:
Screenshot 2024-09-02 at 14-44-01 Bokeh Plot

Satpy pytroll/satpy#2893 , Pyresample main:
Screenshot 2024-09-02 at 14-45-19 Bokeh Plot

Satpy pytroll/satpy#2893 , Pyresample: this PR
Screenshot 2024-09-02 at 14-46-07 Bokeh Plot

So from 5.6 GB to 3.5 GB of RAM and from 87.3 s to 68.6 s processing time.

@coveralls
Copy link

coveralls commented Sep 2, 2024

Coverage Status

coverage: 93.682% (+0.006%) from 93.676%
when pulling 8cc6ec8 on pnuu:bugfix-gradient-swath2area-dtype
into 8eb252c on pytroll:main.

@@ -421,7 +421,7 @@ def parallel_gradient_search(data, src_x, src_y, dst_x, dst_y,
dst_x[i], dst_y[i],
method=method)
res = da.from_delayed(res, (num_bands, ) + dst_x[i].shape,
dtype=np.float64)
dtype=np.float64).astype(arr.dtype)
Copy link
Member

@mraspaud mraspaud Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think we can do better than this. What I see here (and a bit above) is that we convert the input data to float64, do the resampling, then convert back to float32.
Using cython fused types, we should be able to template the cython function to accept both float64, float32, or even ints. This cython doc section should help. I can give you a hand if you don't feel comfortable with this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not going to learn Cython and its typing that deeply at this point 😅

@mraspaud
Copy link
Member

mraspaud commented Sep 3, 2024

@pnuu I pushed a couple of commits that introduce fused types in the cython so that both float64 and float32 input data is allowed and handled correctly in principle.
I also made a fused type for the coordinates, in case we want to experiment with float32 for these also. But that will require changes in the python code which I haven't looked at.
Tell me what you think!

@pnuu
Copy link
Member Author

pnuu commented Sep 3, 2024

@mraspaud definitely a better solution!

I don't see a difference in the performance though:
Screenshot 2024-09-03 at 14-17-44 Bokeh Plot

@mraspaud
Copy link
Member

mraspaud commented Sep 3, 2024

@pnuu did you recompile the cython code?

@pnuu
Copy link
Member Author

pnuu commented Sep 3, 2024

Yeah, I ran pip install -e . after which I saw a modified docstring chang in import. I now also did python setup.py build_ext --inplace and got this graph (no real change)
Screenshot 2024-09-03 at 15-00-55 Bokeh Plot

@mraspaud
Copy link
Member

mraspaud commented Sep 3, 2024

After a discussion on slack, it was noted that float32 is not precise enough for cartesian coordinates at least, so I removed the fused type for coordinates, only allowing float64 (double in cython)

@mraspaud
Copy link
Member

@pnuu where are we on this one?

@pnuu
Copy link
Member Author

pnuu commented Sep 13, 2024

I think we're good and ready to merge. The RAM footpring nor processing time didn't change with the coordinate dtype de-fusing from the previous graph above. All the datasets stay as they were, for example with pytroll/satpy#2893 the datasets/composites stay float32 as without this PR the were converted to float64.

@mraspaud
Copy link
Member

@djhoese would you mind giving your review on this?

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple small requests.

pyresample/gradient/_gradient_search.pyx Outdated Show resolved Hide resolved
pyresample/gradient/_gradient_search.pyx Outdated Show resolved Hide resolved
@pnuu
Copy link
Member Author

pnuu commented Sep 18, 2024

Thanks for the updates and tweaks @mraspaud!

@pnuu
Copy link
Member Author

pnuu commented Sep 18, 2024

Yikes, the resulting image has transposed!

day_microphysical_3a_with_night_microphysical_20240901_200143-0

@pnuu
Copy link
Member Author

pnuu commented Sep 18, 2024

Apparently this is the case also with main. The kd-tree based nearest neighbour resampling produces images that are oriented properly:

day_microphysical_3a_with_night_microphysical_20240901_200143-0

@mraspaud
Copy link
Member

Yikes, the resulting image has transposed!

day_microphysical_3a_with_night_microphysical_20240901_200143-0

What???

@pnuu
Copy link
Member Author

pnuu commented Sep 18, 2024

@mraspaud this seems to be a problem only with polar data (AVHRR L1b in my test case), so StackingGradientSearchResampler class. The ResampleBlocksGradientSearchResampler class used for data having AreaDefinition has been working properly.

@mraspaud
Copy link
Member

@pnuu ok, if this a problem also in main, can we merge this and fix the other issue in another PR?

@pnuu
Copy link
Member Author

pnuu commented Sep 18, 2024

@mraspaud yes, I'm fine with that 👍

@mraspaud mraspaud merged commit b279d35 into pytroll:main Sep 18, 2024
26 checks passed
@pnuu pnuu deleted the bugfix-gradient-swath2area-dtype branch September 19, 2024 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants