-
Notifications
You must be signed in to change notification settings - Fork 760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] cache secret array in assembly code for aarch64 SVE #693
Comments
Ok I'm having a little trouble following the assembly code (I'll need some time to digest it 😅), but by caching do you mean something like a ring buffer? void XXH3_accumulate(...)
{
u64 secret_cache[8];
for (int i = 0; i < 8; i++) {
secret_cache[i] = XXH_read64(secret);
secret += XXH_SECRET_CONSUME_RATE;
}
int cache_idx = 0;
for (int i = 0; i < nbStripes; i++) {
for (int j = 0; j < 8; j++) {
XXH_accumulate_512_round(..., secret_cache[(cache_idx + j) % 8];
}
secret_cache[cache_idx] = XXH_read64(secret);
secret += XXH_SECRET_CONSUME_RATE;
cache_idx = (cache_idx + 1) % 8;
...
}
} Edit I see it now. |
No, it's not a ring buffer. There're 31 Z registers in SVE. So use some to them to store the value of secret array. |
@Cyan4973 @easyaspi314
I try to cache a whole secret array in assembly code. And the performance is better. The maximum performance data could read 18xxx. How do you think about this idea? (https://github.com/hzhuang1/xxHash/tree/dirty_v0.2.1)
Of course, current code is still ugly. Let's discuss the idea first.
=== benchmarking 4 hash functions ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3 , 3516, 6130, 9465, 12771, 15576, 17283, 18393, 14563, 13763, 13596, 13689, 13672, 13775, 12043, 5684, 4912, 4834, 4923, 4874
XXH32 , 1338, 1436, 1486, 1521, 1534, 1542, 1546, 1536, 1505, 1507, 1507, 1507, 1508, 1458, 1264, 1206, 1213, 1212, 1208
XXH64 , 2509, 2799, 2973, 3066, 3122, 3102, 3154, 3136, 3057, 3063, 3066, 3066, 3064, 2899, 2197, 2005, 2009, 2007, 1993
XXH128 , 3392, 5864, 9421, 12495, 15354, 17187, 18306, 14565, 13733, 13822, 14105, 14172, 14208, 12229, 5779, 5016, 4898, 4898, 4851
Throughput small inputs of fixed size (from 2 to 2 bytes):
xxh3 , 72821732
The text was updated successfully, but these errors were encountered: