
Remove remnants of libfabric parcelport #6474

Merged: 1 commit merged into master on Nov 3, 2024
Conversation

@hkaiser (Member) commented Apr 17, 2024

No description provided.


Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
|---|---|
| +0.02% | 100.00% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (eb2bf57) | 217975 | 185524 | 85.11% |
| Head commit (a32228d) | 190892 (-27083) | 162508 (-23016) | 85.13% (+0.02%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#6474) | 1 | 1 | 100.00% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@biddisco (Contributor) commented:

Before you do away with the libfabric parcelport, be aware that I have a branch with a lot of improvements that I had hoped to submit a PR for. We had an intern here last year and he worked on a new allocator, but unfortunately the allocator didn't quite fulfill all of our requirements, so I haven't yet proceeded with integrating it into the parcelport.

The parcelport needs to allocate pinned memory and cache it. The allocator provides this capability by creating a custom mimalloc arena (i.e., it requires mimalloc); however, the arena cannot be resized after creation, so it isn't ideal (though it would work). Thus a pre-allocated slab of memory has to be requested when the allocator starts up.
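
To make the constraint concrete, here is a minimal, purely illustrative sketch of that pattern; it is not the code from the branch mentioned above, and it assumes a mimalloc version that exposes the arena API (mi_manage_os_memory_ex, mi_heap_new_in_arena). A fixed slab is mapped and pinned once, handed to mimalloc as an exclusive arena, and a dedicated heap then serves allocations from pinned pages only; because the arena cannot grow, the slab size has to be picked when the pool is created.

```cpp
// Hypothetical pinned-memory pool built on a mimalloc arena (sketch only).
// Assumes the mimalloc 2.x arena API; names and error handling are illustrative.
#include <mimalloc.h>
#include <sys/mman.h>

#include <cstddef>
#include <stdexcept>

class pinned_arena_pool
{
public:
    explicit pinned_arena_pool(std::size_t bytes) : size_(bytes)
    {
        // Map and pin the slab up front so the NIC can DMA into it later.
        slab_ = ::mmap(nullptr, size_, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (slab_ == MAP_FAILED || ::mlock(slab_, size_) != 0)
            throw std::runtime_error("failed to map/pin slab");

        // Hand the slab to mimalloc as an exclusive arena; the arena size is
        // fixed from here on, which is the limitation discussed above.
        if (!mi_manage_os_memory_ex(slab_, size_, /*is_committed=*/true,
                /*is_large=*/false, /*is_zero=*/false, /*numa_node=*/-1,
                /*exclusive=*/true, &arena_))
            throw std::runtime_error("mi_manage_os_memory_ex failed");

        // All allocations from this heap come out of the pinned slab.
        heap_ = mi_heap_new_in_arena(arena_);
    }

    void* allocate(std::size_t n) { return mi_heap_malloc(heap_, n); }
    void deallocate(void* p) { mi_free(p); }

    ~pinned_arena_pool()
    {
        mi_heap_destroy(heap_);
        // The slab is intentionally left mapped and pinned for the lifetime of
        // the process, since mimalloc may still reference the arena.
    }

private:
    void* slab_ = nullptr;
    std::size_t size_ = 0;
    mi_arena_id_t arena_{};
    mi_heap_t* heap_ = nullptr;
};
```

In a real parcelport this pool would sit behind the registration cache and the slab would additionally be registered with the fabric provider, but the sketch shows why the fixed arena size is the sticking point.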

Apart from this allocator issue, the parcelport is actually in good shape, though it is about a year out of sync with HPX main, and I am aware that a lot of other parcelport-related changes have gone in during that time, so breakage is almost inevitable. Perhaps one of the other parcelports has addressed the memory-pinning issue and could provide a drop-in replacement?

Note that the parcelport also has some quite well-optimized polling and dispatching routines, though not as nice as the sender polling for MPI/CUDA that is now in pika, which also handles the transfer of continuations to the "right place" so that the custom polling pool remains uncluttered with extraneous work.
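
For readers who haven't seen that code, the polling side essentially boils down to a hook that drains the libfabric completion queue in batches and dispatches the contexts of completed operations. The following is a rough, hypothetical sketch of that shape (not the actual branch); it assumes a completion queue opened with FI_CQ_FORMAT_CONTEXT, and the handler names are placeholders.

```cpp
// Sketch of a batched completion-queue polling hook (illustrative only).
#include <rdma/fabric.h>
#include <rdma/fi_eq.h>
#include <rdma/fi_errno.h>

#include <sys/types.h>
#include <array>

// Placeholder for whatever the parcelport does with a finished operation.
inline void handle_completion(void* op_context)
{
    (void) op_context;    // e.g. resume the send/receive stored in the context
}

inline void handle_cq_error(fid_cq* cq)
{
    fi_cq_err_entry err{};
    fi_cq_readerr(cq, &err, 0);    // fetch and drop the error entry
}

// Returns true if any completions were processed; meant to be called
// repeatedly from a dedicated polling pool or scheduler hook.
bool poll_completion_queue(fid_cq* cq)
{
    std::array<fi_cq_entry, 16> entries;    // batch to amortize CQ reads
    ssize_t n = fi_cq_read(cq, entries.data(), entries.size());
    if (n > 0)
    {
        for (ssize_t i = 0; i != n; ++i)
            handle_completion(entries[i].op_context);
        return true;
    }
    if (n == -FI_EAVAIL)
        handle_cq_error(cq);
    return false;    // -FI_EAGAIN: nothing completed, caller may back off
}
```

The hard part is not this loop but deciding where it runs and where the dispatched continuations execute, which is exactly what the pika sender polling mentioned above handles more cleanly.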

Given that the mimalloc allocator isn't perfect: would a PR to improve the libfabric parcelport in HPX (tested on many machines, including Amazon AWS, etc.) still be of interest, or do all the other parcelports now provide good performance, so that libfabric is no longer needed?

@hkaiser (Member, Author) commented Apr 22, 2024

@biddisco I'd be more than happy to accept a PR that makes the current libfabric parcelport usable.

@biddisco (Contributor) commented:

> @biddisco I'd be more than happy to accept a PR that makes the current libfabric parcelport usable.

Do any of the other parcelports have code that manages memory pinning etc? Has anyone else worked on this recently?

@hkaiser (Member, Author) commented Apr 22, 2024

> > @biddisco I'd be more than happy to accept a PR that makes the current libfabric parcelport usable.
>
> Do any of the other parcelports have code that manages memory pinning etc.? Has anyone else worked on this recently?

IIUC, @JiakunYan pins memory in the LCI parcelport, but that could be part of LCI proper, not the parcelport.

@JiakunYan (Contributor) commented Apr 22, 2024

Yes, the memory registration caching code is implemented in LCI. However, it is disabled by default when LCI uses the libfabric backend. I believe libfabric has its own memory registration cache, so I just register and deregister memory buffers plainly. I haven't encountered performance issues related to memory registration.
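
For context, the "register and deregister plainly" pattern described here amounts to something like the following hypothetical sketch (not LCI's actual code). The access flags are an assumption for the example, and the caching is left to libfabric's own registration cache (tunable via environment variables such as FI_MR_CACHE_MAX_COUNT on providers that support it).

```cpp
// Plain per-buffer registration/deregistration with libfabric (sketch only).
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

#include <cstddef>
#include <stdexcept>

// Register a buffer for local sends/receives and locally initiated RMA.
inline fid_mr* register_buffer(fid_domain* domain, void* buf, std::size_t len)
{
    fid_mr* mr = nullptr;
    int rc = fi_mr_reg(domain, buf, len,
        FI_SEND | FI_RECV | FI_READ | FI_WRITE,
        /*offset=*/0, /*requested_key=*/0, /*flags=*/0, &mr, /*context=*/nullptr);
    if (rc != 0)
        throw std::runtime_error("fi_mr_reg failed");
    return mr;
}

// Drop the registration once the transfer has completed.
inline void deregister_buffer(fid_mr* mr)
{
    fi_close(&mr->fid);
}
```

Whether this is fast enough then depends on the provider's registration cache; as noted above, it has not been a bottleneck in practice with the libfabric backend.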

@hkaiser modified the milestones: 1.10.0, 1.11.0 (May 3, 2024)
@StellarBot

Performance test report

HPX Performance

Comparison

Benchmarks compared across FORK_JOIN_EXECUTOR, PARALLEL_EXECUTOR, and SCHEDULER_EXECUTOR:

- For Each: ------

Info

| Property | Before | After |
|---|---|---|
| HPX Commit | d27ac2e | 3a3f3d3 |
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-11-02T14:34:20+00:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:18:04.949759-05:00 | 2024-11-02T09:40:37.265378-05:00 |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.5/bin/clang++ 18.1.5 |
| Envfile | | |

Comparison

| Benchmark | NO-EXECUTOR |
|---|---|
| Future Overhead - Create Thread Hierarchical - Latch | -- |

Info

| Property | Before | After |
|---|---|---|
| HPX Commit | d27ac2e | 3a3f3d3 |
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-11-02T14:34:20+00:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:19:53.062988-05:00 | 2024-11-02T09:42:27.943726-05:00 |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.5/bin/clang++ 18.1.5 |
| Envfile | | |

Comparison

Benchmarks compared across FORK_JOIN_EXECUTOR_DEFAULT_FORK_JOIN_POLICY_ALLOCATOR, PARALLEL_EXECUTOR_DEFAULT_PARALLEL_POLICY_ALLOCATOR, and SCHEDULER_EXECUTOR_DEFAULT_SCHEDULER_EXECUTOR_ALLOCATOR:

- Stream Benchmark - Add: ------
- Stream Benchmark - Scale: -----
- Stream Benchmark - Triad: ------
- Stream Benchmark - Copy: -----

Info

| Property | Before | After |
|---|---|---|
| HPX Commit | d27ac2e | 3a3f3d3 |
| HPX Datetime | 2024-03-18T14:00:30+00:00 | 2024-11-02T14:34:20+00:00 |
| Hostname | medusa08.rostam.cct.lsu.edu | medusa08.rostam.cct.lsu.edu |
| Clustername | rostam | rostam |
| Datetime | 2024-03-18T09:20:13.002391-05:00 | 2024-11-02T09:42:48.678037-05:00 |
| Compiler | /opt/apps/llvm/13.0.1/bin/clang++ 13.0.1 | /opt/apps/llvm/18.1.5/bin/clang++ 18.1.5 |
| Envfile | | |

Explanation of Symbols

| Symbol | Meaning |
|---|---|
| = | No performance change (confidence interval within ±1%) |
| (=) | Probably no performance change (confidence interval within ±2%) |
| (+)/(-) | Very small performance improvement/degradation (≤1%) |
| +/- | Small performance improvement/degradation (≤5%) |
| ++/-- | Large performance improvement/degradation (≤10%) |
| +++/--- | Very large performance improvement/degradation (>10%) |
| ? | Probably no change, but quite large uncertainty (confidence interval within ±5%) |
| ?? | Unclear result, very large uncertainty (±10%) |
| ??? | Something unexpected… |


Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
|---|---|
| Report missing for 1ed28bc¹ | 100.00% |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (1ed28bc) | Report Missing | Report Missing | Report Missing |
| Head commit (56075d1) | 191386 | 162850 | 85.09% |


Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#6474) | 1 | 1 | 100.00% |



Footnotes

  1. Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

@hkaiser merged commit 3cfb67a into master on Nov 3, 2024 (66 of 80 checks passed)
@hkaiser deleted the remove_libfabric_pp branch on November 3, 2024, 16:58