Skip to content

Conversation

@aaronzedwick
Copy link
Member

@aaronzedwick aaronzedwick commented May 26, 2025

Closes #1277

Overview

This adds parallelization to numba functions within the dual function and within the bilinear weights calculation.

PR Checklist

General

  • An issue is linked created and linked
  • Add appropriate labels
  • Filled out Overview and Expected Usage (if applicable) sections

Testing

  • Adequate tests are created if there is new functionality
  • Tests cover all possible logical paths in your function
  • Tests are not too basic (such as simply calling a function and nothing else)

Documentation

  • Docstrings have updated with any function changes

@aaronzedwick aaronzedwick added the run-benchmark Run ASV benchmark workflow label May 26, 2025
@github-actions
Copy link

github-actions bot commented May 26, 2025

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change Before [d27bbfc] After [d37a581] Ratio Benchmark (Parameter)
- 776M 345M 0.45 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
failed 754M n/a face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
- 50.6±1ms 25.4±0.1ms 0.50 mpas_ocean.DualMesh.time_dual_mesh_construction('120km')
- 5.02±0.2ms 3.89±0.1ms 0.78 mpas_ocean.DualMesh.time_dual_mesh_construction('480km')
- 377M 297M 0.79 mpas_ocean.Integrate.peakmem_integrate('480km')

Benchmarks that have stayed the same:

Change Before [d27bbfc] After [d37a581] Ratio Benchmark (Parameter)
343M 344M 1.00 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
375M 374M 1.00 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
19.0±0.02ms 19.1±0.2ms 1.00 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
3.33±0.03ms 3.31±0.03ms 1.00 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
17.1±0.06ms 17.2±0.05ms 1.01 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
2.10±0ms 2.07±0.02ms 0.99 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
2.11±0.05s 2.08±0.03s 0.99 import.Imports.timeraw_import_uxarray
855±7μs 872±10μs 1.02 mpas_ocean.CheckNorm.time_check_norm('120km')
694±20μs 672±10μs 0.97 mpas_ocean.CheckNorm.time_check_norm('480km')
662±6ms 669±5ms 1.01 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
43.2±0.9ms 43.2±0.3ms 1.00 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
637±10μs 622±6μs 0.98 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')
516±7μs 507±5μs 0.98 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
6.44±0.09ms 6.39±0.07ms 0.99 mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
4.73±0.02ms 4.77±0.05ms 1.01 mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
3.58±0.04s 3.53±0.02s 0.99 mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
224±0.8ms 225±3ms 1.00 mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
18.1±0.02ms 18.1±0.03ms 1.00 mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
959±10μs 964±5μs 1.01 mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
11.2±0.07ms 11.3±0.07ms 1.01 mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
1.16±0.01ms 1.16±0.01ms 1.00 mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
547±6ms 549±4ms 1.00 mpas_ocean.CrossSections.time_const_lat('120km', 1)
274±1ms 274±1ms 1.00 mpas_ocean.CrossSections.time_const_lat('120km', 2)
142±0.5ms 143±1ms 1.01 mpas_ocean.CrossSections.time_const_lat('120km', 4)
426±5ms 420±2ms 0.99 mpas_ocean.CrossSections.time_const_lat('480km', 1)
216±4ms 213±1ms 0.99 mpas_ocean.CrossSections.time_const_lat('480km', 2)
111±2ms 112±0.8ms 1.01 mpas_ocean.CrossSections.time_const_lat('480km', 4)
915±7ms 919±6ms 1.00 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
53.5±0.8ms 52.7±0.3ms 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
74.6±1ms 73.9±0.2ms 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
5.44±0.07ms 5.30±0.1ms 0.97 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
233M 233M 1.00 mpas_ocean.Gradient.peakmem_gradient('120km')
217M 217M 1.00 mpas_ocean.Gradient.peakmem_gradient('480km')
2.73±0.02ms 2.78±0.03ms 1.02 mpas_ocean.Gradient.time_gradient('120km')
310±1μs 311±2μs 1.00 mpas_ocean.Gradient.time_gradient('480km')
207±3μs 206±2μs 0.99 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
118±1μs 118±1μs 1.00 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
309M 310M 1.00 mpas_ocean.Integrate.peakmem_integrate('120km')
145±0.3ms 151±1ms 1.04 mpas_ocean.Integrate.time_integrate('120km')
18.7±0.3ms 18.1±0.1ms 0.97 mpas_ocean.Integrate.time_integrate('480km')
345±2ms 354±4ms 1.03 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
349±4ms 352±9ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
348±2ms 351±5ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
23.0±0.2ms 23.3±0.2ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
22.9±0.2ms 23.3±0.3ms 1.02 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
22.9±0.2ms 23.3±0.1ms 1.02 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
failed failed n/a mpas_ocean.PointInPolygon.time_face_search('120km')
failed failed n/a mpas_ocean.PointInPolygon.time_face_search('480km')
267±3ms 237±2ms ~0.89 mpas_ocean.RemapDownsample.time_bilinear_remapping
288±2ms 291±0.8ms 1.01 mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
29.3±0.2ms 28.9±0.2ms 0.99 mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
1.19±0.01s 1.18±0.01s 0.99 mpas_ocean.RemapUpsample.time_bilinear_remapping
51.3±0.2ms 51.2±0.4ms 1.00 mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
26.2±0.2ms 26.4±0.2ms 1.01 mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
28.1±0.2ms 28.3±0.4ms 1.01 mpas_ocean.ZonalAverage.time_zonal_average('120km')
5.65±0.06ms 5.51±0.03ms 0.98 mpas_ocean.ZonalAverage.time_zonal_average('480km')
215M 213M 0.99 quad_hexagon.QuadHexagon.peakmem_open_dataset
213M 213M 1.00 quad_hexagon.QuadHexagon.peakmem_open_grid
7.56±0.1ms 7.77±0.4ms 1.03 quad_hexagon.QuadHexagon.time_open_dataset
6.46±0.01ms 6.60±0.1ms 1.02 quad_hexagon.QuadHexagon.time_open_grid

@philipc2 philipc2 removed the run-benchmark Run ASV benchmark workflow label May 28, 2025
@aaronzedwick aaronzedwick marked this pull request as ready for review June 3, 2025 02:23
@aaronzedwick aaronzedwick requested review from philipc2 and rajeeja June 3, 2025 14:34
@rajeeja rajeeja added the scalability Related to scalability & performance efforts label Jun 10, 2025
@philipc2 philipc2 requested a review from Copilot July 15, 2025 20:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds parallelization to dual mesh construction and bilinear weight calculation functions to improve performance. The changes implement Numba's prange for parallel execution in computationally intensive loops.

  • Enables parallel processing in dual mesh face construction using prange
  • Adds parallelization to bilinear weight calculation functions
  • Improves numerical stability with bounds checking and error handling

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
uxarray/remap/bilinear.py Adds parallel execution to weight calculation loop and improves array indexing
uxarray/grid/dual.py Implements parallel dual mesh construction with enhanced numerical stability checks
test/test_grid.py Adds comprehensive tests for parallel dual mesh construction and validation
Comments suppressed due to low confidence (1)

test/test_grid.py:877

  • [nitpick] The 30-second timeout threshold for performance testing may be too lenient and could vary significantly across different hardware configurations. Consider using a more relative performance metric or making this configurable.
    assert avg_time < 30.0, f"Dual construction too slow: {avg_time:.2f}s"

@philipc2 philipc2 changed the title Update dual creation to use parallelization Parallelize Dual Grid Construction Jul 16, 2025
Copy link
Member

@philipc2 philipc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my i7 local machine for a 15km Grid

image

There's a noticeable speedup, but scaling is not great. I'm happy to move forward with this though.

@rajeeja rajeeja requested a review from philipc2 July 18, 2025 00:13
pre-commit-ci bot and others added 8 commits August 4, 2025 18:01
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.3 → v0.12.4](astral-sh/ruff-pre-commit@v0.12.3...v0.12.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.4 → v0.12.5](astral-sh/ruff-pre-commit@v0.12.4...v0.12.5)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…t reliable solution since it's simple, fast, and fully supported by Numba. Numba doesn't fully support np.clip in nopython mode.
@rajeeja rajeeja merged commit c40e239 into main Aug 5, 2025
20 checks passed
@erogluorhan erogluorhan deleted the zedwick/parallelize_dual branch September 26, 2025 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

scalability Related to scalability & performance efforts

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parallelize Dual Creation

5 participants