Replace uses of `std::shared_ptr` with shared ptr without atomic refcounts #1601

bbannier · 2023-11-27T08:52:03Z

Closes #1600.

awelzel · 2023-11-27T12:30:56Z

The microbenchmark from #1600 doesn't show the single vs multi-threading difference anymore. This branch appears to run a tiny bit slower than main when single-threaded though. I haven't looked why, maybe std::shared_ptr is overall faster than the boost version even when atomic instructions are in-use?

$ hyperfine -w 1 '/opt/zeek-dev-prod/bin/spicy-driver -f test-data.txt nested-prev.hlto' 'spicy-driver -f test-data.txt nested.hlto'
Benchmark 1: /opt/zeek-dev-prod/bin/spicy-driver -f test-data.txt nested-prev.hlto
  Time (mean ± σ):     12.137 s ±  0.209 s    [User: 12.112 s, System: 0.010 s]
  Range (min … max):   11.996 s … 12.704 s    10 runs
 
Benchmark 2: spicy-driver -f test-data.txt nested.hlto
  Time (mean ± σ):     12.360 s ±  0.158 s    [User: 12.347 s, System: 0.011 s]
  Range (min … max):   12.134 s … 12.529 s    10 runs
 
Summary
  '/opt/zeek-dev-prod/bin/spicy-driver -f test-data.txt nested-prev.hlto' ran
    1.02 ± 0.02 times faster than 'spicy-driver -f test-data.txt nested.hlto'

/opt/zeek-dev-prod/bin/spicy-driver being current main.

We heavily use `std::shared_ptr` in the runtime library in types which we expect to be cheap to copy, e.g., in views or safe iterators. By design `std::shared_ptr` has to use atomic ints for its strong and weak counts. Since parsing is single threaded the use of atomics can introduce overhead without any need. This patch replaces uses of `std::shared_ptr` in the runtime library with a shared_ptr without atomic refcounts. Since we still need weak pointers this has to be a slightly more complete replacement.

This uses our shared pointer with non-atomic refcounts from the runtime library elsewhere in Spicy, noticeably in the compiler. This might provide slight performance improvements on platforms where `std::shared_ptr`'s atomic refcounts made it slightly more expensive to copy.

bbannier · 2023-11-29T12:16:31Z

I benchmarked this PR against a huge real world Zeek analyzer we are developing at Corelight. In summary, it seems doubtful that this PR provides a clear benefit when running real analyzers in Zeek.

This branch vs. `main`

Before (main):

$ hyperfine -w 1 'zeek -Cr large.pcap ../build/spicy-modules/*.hlto ../analyzer/scripts'
Benchmark 1: zeek -Cr large.pcap ../build/spicy-modules/*.hlto ../analyzer/scripts
  Time (mean ± σ):     19.428 s ±  0.419 s    [User: 20.509 s, System: 1.326 s]
  Range (min … max):   18.884 s … 20.448 s    10 runs

After (this PR):

$ hyperfine -w 1 'zeek -Cr large.pcap ../build/spicy-modules/*.hlto ../analyzer/scripts'
Benchmark 1: zeek -Cr large.pcap ../build/spicy-modules/*.hlto ../analyzer/scripts
  Time (mean ± σ):     19.223 s ±  0.371 s    [User: 20.314 s, System: 1.310 s]
  Range (min … max):   18.774 s … 19.844 s    10 runs

I want to see that the mean runtime of this particular benchmark seems to improve, but the difference here was not significant.

This branch on top of #1590

Among other things in #1590 we also removed some unneeded copying, so we might expect to see different behavior of this PR since e.g., overall copying of Bytes instances might be reduced.

Before (#1590):

$ hyperfine -w 1 'zeek -Cr large.pcap ../build/spicy-modules/*.hlto ../analyzer/scripts'
Benchmark 1: zeek -Cr large.pcap ../build/spicy-modules/*.hlto ../analyzer/scripts
  Time (mean ± σ):     17.308 s ±  0.299 s    [User: 18.322 s, System: 1.331 s]
  Range (min … max):   16.860 s … 17.813 s    10 runs

After (this PR on top of #1590):

$ hyperfine -w 1 'zeek -Cr large.pcap ../build/spicy-modules/*.hlto ../analyzer/scripts'
Benchmark 1: zeek -Cr large.pcap ../build/spicy-modules/*.hlto ../analyzer/scripts
  Time (mean ± σ):     17.578 s ±  0.335 s    [User: 18.607 s, System: 1.297 s]
  Range (min … max):   17.120 s … 18.134 s    10 runs

The trend of the means seems to be reversed here, i.e., the new "improved" smart pointer implementation seems to introduce new overhead, but the result again is not significant.

bbannier · 2023-12-11T09:39:31Z

We won't move forward with this ATM.

bbannier self-assigned this Nov 27, 2023

bbannier force-pushed the topic/bbannier/issue-1600 branch from 1640ee7 to e03c0c7 Compare November 27, 2023 08:58

bbannier force-pushed the topic/bbannier/issue-1600 branch from e03c0c7 to 0756284 Compare November 28, 2023 12:28

bbannier added 2 commits November 28, 2023 13:55

bbannier force-pushed the topic/bbannier/issue-1600 branch from 0756284 to 80f78bc Compare November 28, 2023 12:55

bbannier closed this Dec 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace uses of `std::shared_ptr` with shared ptr without atomic refcounts #1601

Replace uses of `std::shared_ptr` with shared ptr without atomic refcounts #1601

bbannier commented Nov 27, 2023

awelzel commented Nov 27, 2023

bbannier commented Nov 29, 2023

bbannier commented Dec 11, 2023

Replace uses of std::shared_ptr with shared ptr without atomic refcounts #1601

Replace uses of std::shared_ptr with shared ptr without atomic refcounts #1601

Conversation

bbannier commented Nov 27, 2023

awelzel commented Nov 27, 2023

bbannier commented Nov 29, 2023

This branch vs. main

This branch on top of #1590

bbannier commented Dec 11, 2023

Replace uses of `std::shared_ptr` with shared ptr without atomic refcounts #1601

Replace uses of `std::shared_ptr` with shared ptr without atomic refcounts #1601

This branch vs. `main`