perf: Improve Xor method performance by ~20% for big sets #1

Merged (1 commit) on Feb 16, 2025

Conversation

romshark
Contributor

@romshark romshark commented Feb 16, 2025

Processing larger bitsets in batches of 8 per loop iteration is more efficient on modern CPUs.
I assume this is due to instruction-level parallelism.
The same technique can be applied effectively to most other bitset methods and functions.
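
For readers who have not seen the pattern, here is a minimal sketch of batching by 8. It is not the code from this PR: the function name `xorUnrolled`, the `[]uint64` word representation, and the in-place semantics are all assumptions made for illustration. The main loop handles 8 words per iteration with independent operations, and a scalar loop handles the remainder.

```go
package main

import "fmt"

// xorUnrolled XORs src into dst in place, 8 words per loop iteration.
// Hypothetical helper for illustration; the bitmask package's actual
// Xor implementation and internal representation may differ.
// If the slices differ in length, only the common prefix is processed.
func xorUnrolled(dst, src []uint64) {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	i := 0
	// Main loop: 8 independent XORs per iteration. Reslicing to a fixed
	// length of 8 lets the compiler drop per-element bounds checks.
	for ; i+8 <= n; i += 8 {
		d := dst[i : i+8 : i+8]
		s := src[i : i+8 : i+8]
		d[0] ^= s[0]
		d[1] ^= s[1]
		d[2] ^= s[2]
		d[3] ^= s[3]
		d[4] ^= s[4]
		d[5] ^= s[5]
		d[6] ^= s[6]
		d[7] ^= s[7]
	}
	// Tail loop: the remaining 0 to 7 words.
	for ; i < n; i++ {
		dst[i] ^= src[i]
	}
}

func main() {
	a := make([]uint64, 19) // 2 full batches plus a tail of 3
	b := make([]uint64, 19)
	for i := range a {
		a[i] = uint64(i)
		b[i] = uint64(i) * 3
	}
	xorUnrolled(a, b)
	fmt.Println(a)
}
```

Whether this helps depends on the CPU and compiler version, so any such change is best checked with a benchmark comparison like the one below.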

goos: darwin
goarch: arm64
pkg: github.com/KernelPryanic/bitmask
cpu: Apple M1 Max
                    │   old.txt   │              new.txt               │
                    │   sec/op    │   sec/op     vs base               │
BitSet_Xor/empty-10   2.498n ± 4%   2.493n ± 3%        ~ (p=0.372 n=6)
BitSet_Xor/5-10       2.491n ± 1%   2.492n ± 1%        ~ (p=0.729 n=6)
BitSet_Xor/10k-10     76.10n ± 1%   49.79n ± 1%  -34.57% (p=0.002 n=6)
BitSet_Xor/1m-10      8.453µ ± 0%   5.112µ ± 1%  -39.52% (p=0.002 n=6)
geomean               44.73n        35.46n       -20.72%

                    │   old.txt    │              new.txt               │
                    │     B/op     │    B/op     vs base                │
BitSet_Xor/empty-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/5-10       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/10k-10     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/1m-10      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                          ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                    │   old.txt    │              new.txt               │
                    │  allocs/op   │ allocs/op   vs base                │
BitSet_Xor/empty-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/5-10       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/10k-10     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/1m-10      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                          ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@KernelPryanic KernelPryanic merged commit 63daa84 into KernelPryanic:main Feb 16, 2025
2 of 3 checks passed
2 participants