-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the caching memory allocator lock-free #46658
base: master
Are you sure you want to change the base?
Conversation
cms-bot internal usage |
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46658/42578 Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
Move the live block descriptors to the alpaka buffers: instead of tracking the descriptors for the live blocks in a global map, store each block's descriptor within the deleter of the block itself. Reimplement the free list as a vector of tbb::concurrent_queue objects, one per bin of the caching allocator. Since a block may be in the free list but not actually be available for reuse, blocks are popped from the queue until an available block is found and reused, or the queue is empty and new block is requested. Then all popped blocks are pushed back to the queue. Use atomic operations for the individual statistics. Access to the whole set of statistics may not be fully consistent, but they should only be used for debugging or monitoring.
d9df89c
to
3dd76b4
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46658/42580 |
A new Pull Request was created by @fwyzard for master. It involves the following packages:
@cmsbuild, @fwyzard, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
@makortel, I split the implementation of the CachingAllocator into header and source file to ease the development. In principle I can move back to a header-only implementation, if that's better ? |
@makortel I would like to make some other clean up in the code, but first I wanted to see if you have any comments on this implementation. Then I can push the follow up changes here, or in a separate PR. |
enable gpu |
please test |
@Dr15Jones I would appreciate any feedback from you as well :-) |
-1 Failed Tests: Build BuildI found compilation error when building: In file included from src/HeterogeneousCore/AlpakaInterface/interface/CachedBufAlloc.h:6, from src/HeterogeneousCore/AlpakaInterface/interface/memory.h:9, from src/DataFormats/Portable/interface/PortableHostCollection.h:11, from src/DataFormats/Portable/interface/PortableCollection.h:6, from src/DataFormats/Portable/test/test_catch2_portableCollectionOnHost.cc:3: src/HeterogeneousCore/AlpakaInterface/interface/CachingAllocator.h:11:10: fatal error: tbb/concurrent_queue.h: No such file or directory 11 | #include | ^~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. In file included from src/HeterogeneousCore/AlpakaInterface/interface/CachedBufAlloc.h:6, from src/HeterogeneousCore/AlpakaInterface/interface/memory.h:9, |
please test with #46657 |
+1 Size: This PR adds an extra 12KB to repository Comparison SummarySummary:
GPU Comparison SummarySummary:
|
PR description:
Move the live block descriptors to the alpaka buffers: instead of tracking the descriptors for the live blocks in a global map, store each block's descriptor within the deleter of the block itself.
Reimplement the free list as an
std::vector
oftbb::concurrent_queue
objects, one perbin of the caching allocator.
Since a block may be in the free list but not actually be available for reuse, blocks are popped from the queue until an available block is found and reused, or the queue is empty and new block is requested. Then all popped blocks are pushed back to the queue.
Use atomic operations for the individual statistics.
Access to the whole set of statistics may not be fully consistent, but they should only be used for debugging or monitoring.
PR validation:
Validated on top of CMSSW 14.0.x with the 2024 HLT menu.