Make the caching memory allocator lock-free #46658

fwyzard · 2024-11-11T13:01:34Z

PR description:

Move the live block descriptors to the alpaka buffers: instead of tracking the descriptors for the live blocks in a global map, store each block's descriptor within the deleter of the block itself.
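
To illustrate the idea, here is a minimal sketch, not the code in this PR. `BlockDescriptor`, `ToyAllocator` and the use of `std::shared_ptr<void>` with `std::malloc` are assumptions made for the example; the real allocator works with alpaka buffers. The sketch is single-threaded, and the allocator must outlive the buffers it hands out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical descriptor: what the allocator needs to recycle a block.
struct BlockDescriptor {
  void* ptr = nullptr;
  std::size_t bytes = 0;
  unsigned int bin = 0;
};

// Hypothetical, single-threaded stand-in for the caching allocator.
class ToyAllocator {
public:
  ~ToyAllocator() {
    for (auto const& block : cached_)
      std::free(block.ptr);
  }

  std::shared_ptr<void> allocate(std::size_t bytes, unsigned int bin) {
    assert(bytes > 0);
    BlockDescriptor block{std::malloc(bytes), bytes, bin};
    // The deleter keeps its own copy of the descriptor, so no global map
    // of live blocks is needed to find it when the buffer is released.
    return std::shared_ptr<void>(block.ptr, [this, block](void*) { recycle(block); });
  }

  std::size_t cached() const { return cached_.size(); }

private:
  // Called by the deleter: the block goes back to the cache instead of being freed.
  void recycle(BlockDescriptor const& block) { cached_.push_back(block); }

  std::vector<BlockDescriptor> cached_;
};
```

When the last copy of the buffer goes out of scope, the deleter hands the descriptor straight back to the allocator.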

Reimplement the free list as a std::vector of tbb::concurrent_queue objects, one per bin of the caching allocator.
Since a block may be in the free list but not actually be available for reuse, blocks are popped from the queue until either an available block is found and reused, or the queue is empty and a new block is requested. All the blocks that were popped but not reused are then pushed back to the queue.
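
The pop-until-available loop could look roughly like the sketch below. All names here are hypothetical: `ToyQueue` is a mutex-protected stand-in that only mimics the `push` / `try_pop` interface of tbb::concurrent_queue, so that the example builds without TBB, and the `ready` flag stands in for whatever check decides that a block can be reused.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// Stand-in with the same push/try_pop shape as tbb::concurrent_queue.
// It uses a mutex only to keep the sketch free of external dependencies.
template <typename T>
class ToyQueue {
public:
  void push(T const& value) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push_back(value);
  }

  bool try_pop(T& value) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty())
      return false;
    value = items_.front();
    items_.pop_front();
    return true;
  }

private:
  std::mutex mutex_;
  std::deque<T> items_;
};

// Hypothetical block: `ready` models "actually available for reuse".
struct Block {
  int id;
  bool ready;
};

class ToyFreeList {
public:
  explicit ToyFreeList(std::size_t bins) : bins_(bins) {}

  void release(std::size_t bin, Block const& block) {
    assert(bin < bins_.size());
    bins_[bin].push(block);
  }

  // Pop until a ready block is found or the queue is empty,
  // then push back every block that was popped but not reused.
  std::optional<Block> tryReuse(std::size_t bin) {
    assert(bin < bins_.size());
    std::vector<Block> skipped;
    std::optional<Block> found;
    Block candidate{};
    while (bins_[bin].try_pop(candidate)) {
      if (candidate.ready) {
        found = candidate;
        break;
      }
      skipped.push_back(candidate);
    }
    for (auto const& block : skipped)
      bins_[bin].push(block);
    return found;  // empty: the caller requests a new block
  }

private:
  std::vector<ToyQueue<Block>> bins_;  // one queue per bin
};
```

One consequence of this scheme is that the blocks that were skipped end up at the back of the queue, so the order of the free list is not preserved.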

Use atomic operations for the individual statistics.
Access to the whole set of statistics may not be fully consistent, but they should only be used for debugging or monitoring.
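
As an illustration of why the whole set may look inconsistent, here is a small sketch with made-up counters (`ToyStatistics`, `liveBytes` and `cachedBytes` are assumptions, not the fields used in the PR). Each counter is updated atomically, but a reader that loads them one after the other can observe a state in between two updates.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical counters, each updated atomically on its own.
struct ToyStatistics {
  std::atomic<std::size_t> liveBytes{0};
  std::atomic<std::size_t> cachedBytes{0};
};

// Plain copy of the counters, for printing or monitoring.
struct Snapshot {
  std::size_t liveBytes;
  std::size_t cachedBytes;
};

// A cached block is reused: move its size from "cached" to "live".
inline void onReuse(ToyStatistics& stats, std::size_t bytes) {
  assert(stats.cachedBytes.load() >= bytes);
  stats.cachedBytes.fetch_sub(bytes, std::memory_order_relaxed);
  // A concurrent reader running here sees the block in neither counter.
  stats.liveBytes.fetch_add(bytes, std::memory_order_relaxed);
}

// A live block is released: move its size from "live" to "cached".
inline void onRelease(ToyStatistics& stats, std::size_t bytes) {
  assert(stats.liveBytes.load() >= bytes);
  stats.liveBytes.fetch_sub(bytes, std::memory_order_relaxed);
  stats.cachedBytes.fetch_add(bytes, std::memory_order_relaxed);
}

// Each load is atomic, but the pair is not read as a single transaction.
inline Snapshot read(ToyStatistics const& stats) {
  return Snapshot{stats.liveBytes.load(std::memory_order_relaxed),
                  stats.cachedBytes.load(std::memory_order_relaxed)};
}
```

Without concurrent updates the snapshot is exact; under load the individual values are still valid, only their sum may be briefly off.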

PR validation:

Validated on top of CMSSW 14.0.x with the 2024 HLT menu.

cmsbuild · 2024-11-11T13:02:02Z

cms-bot internal usage

fwyzard · 2024-11-11T13:02:57Z

With CMSSW_14_0_15_patch1, running the HLT with 128 threads we see:

With these changes, the contention moves to the CUDA mutex (to be addressed by further improvements):

cmsbuild · 2024-11-11T13:04:28Z

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46658/42578

Code check has found code style and quality issues which could be resolved by applying following patch(s)

code-format:
https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46658/42578/code-format.patch
e.g. curl -k https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46658/42578/code-format.patch | patch -p1
You can also run scram build code-format to apply code format directly

cmsbuild · 2024-11-11T13:14:39Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46658/42580

cmsbuild · 2024-11-11T13:15:01Z

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

HeterogeneousCore/AlpakaInterface (heterogeneous)

@cmsbuild, @fwyzard, @makortel can you please review it and eventually sign? Thanks.
@makortel, @missirol, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

fwyzard · 2024-11-11T13:15:02Z

@makortel, I split the implementation of the CachingAllocator into a header and a source file to ease the development. In principle I can move back to a header-only implementation, if that's better?

fwyzard · 2024-11-11T13:16:06Z

@makortel I would like to do some more cleanup of the code, but first I wanted to see if you have any comments on this implementation.

Then I can push the follow up changes here, or in a separate PR.

fwyzard · 2024-11-11T13:16:19Z

enable gpu

fwyzard · 2024-11-11T13:16:22Z

please test

fwyzard · 2024-11-11T13:21:21Z

@Dr15Jones I would appreciate any feedback from you as well :-)

cmsbuild · 2024-11-11T14:34:39Z

-1

Failed Tests: Build
Size: This PR adds an extra 28KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49dcac/42726/summary.html
COMMIT: 3dd76b4
CMSSW: CMSSW_14_2_X_2024-11-11-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46658/42726/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

In file included from src/HeterogeneousCore/AlpakaInterface/interface/CachedBufAlloc.h:6,
                 from src/HeterogeneousCore/AlpakaInterface/interface/memory.h:9,
                 from src/DataFormats/Portable/interface/PortableHostCollection.h:11,
                 from src/DataFormats/Portable/interface/PortableCollection.h:6,
                 from src/DataFormats/Portable/test/test_catch2_portableCollectionOnHost.cc:3:
src/HeterogeneousCore/AlpakaInterface/interface/CachingAllocator.h:11:10: fatal error: tbb/concurrent_queue.h: No such file or directory
   11 | #include <tbb/concurrent_queue.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from src/HeterogeneousCore/AlpakaInterface/interface/CachedBufAlloc.h:6,
                 from src/HeterogeneousCore/AlpakaInterface/interface/memory.h:9,

fwyzard · 2024-11-11T20:09:43Z

please test with #46657

cmsbuild · 2024-11-12T02:48:44Z

+1

Size: This PR adds an extra 12KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-49dcac/42740/summary.html
COMMIT: 3dd76b4
CMSSW: CMSSW_14_2_X_2024-11-11-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46658/42740/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 6 differences found in the comparisons
DQMHistoTests: Total files compared: 46
DQMHistoTests: Total histograms compared: 3343588
DQMHistoTests: Total failures: 419
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3343149
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 45 files compared)
Checked 202 log files, 172 edm output root files, 46 DQM output files
TriggerResults: no differences found

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 7
DQMHistoTests: Total histograms compared: 53031
DQMHistoTests: Total failures: 74
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 52957
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
Checked 24 log files, 30 edm output root files, 7 DQM output files
TriggerResults: no differences found

Fix BuildFiles, includes and comments

782f69c

cmsbuild added this to the CMSSW_14_2_X milestone Nov 11, 2024

cmsbuild added pending-signatures tests-pending orp-pending code-checks-pending heterogeneous-pending labels Nov 11, 2024

cmsbuild added code-checks-rejected and removed code-checks-pending labels Nov 11, 2024

fwyzard force-pushed the lock_free_allocator_142x branch from d9df89c to 3dd76b4 Compare November 11, 2024 13:13

cmsbuild added code-checks-pending and removed code-checks-rejected labels Nov 11, 2024

cmsbuild added code-checks-approved and removed code-checks-pending labels Nov 11, 2024

cmsbuild added tests-started and removed tests-pending labels Nov 11, 2024

cmsbuild added tests-rejected and removed tests-started labels Nov 11, 2024

cmsbuild added requires-external tests-started and removed tests-rejected labels Nov 11, 2024

cmsbuild added tests-approved and removed tests-started labels Nov 12, 2024
