bpf: use FNV-1a variant for ratelimit hash #4719
The existing logic used the first 40 arg bytes, concatenated, as the ratelimit cache key. One difficulty with this approach is that file paths often share substantial prefixes, causing spurious drops for distinct events. To reduce false negatives in these cases, we can hash the argument contents instead. FNV-1a is a very simple non-cryptographic hash function: it accumulates each byte of input using an XOR and a multiply. As a result, its code size is ~10x smaller than comparable hash functions, totaling just 35 bpf instructions. While not the fastest, it may be a reasonable tradeoff in service of reducing inaccurate deduplication.
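To make the prefix problem concrete, here is a minimal userspace sketch. It is not the tetragon code: `prefix_key`, `same_prefix_key`, and the zero-padding are assumptions made purely for illustration. It only shows how a key built from the first 40 bytes cannot tell apart two paths that diverge after byte 40.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KEY_BYTES 40 /* prefix length quoted in the description above */

/* Hypothetical helper: copy at most KEY_BYTES of an argument into a
 * zero-padded, fixed-size key, mimicking a prefix-based cache key. */
static void prefix_key(const char *arg, unsigned char key[KEY_BYTES])
{
	size_t len = strlen(arg);

	memset(key, 0, KEY_BYTES);
	memcpy(key, arg, len < KEY_BYTES ? len : KEY_BYTES);
}

/* Returns true when two arguments would map to the same prefix key,
 * i.e. when a prefix-keyed cache would treat them as one event. */
static bool same_prefix_key(const char *a, const char *b)
{
	unsigned char ka[KEY_BYTES], kb[KEY_BYTES];

	prefix_key(a, ka);
	prefix_key(b, kb);
	return memcmp(ka, kb, KEY_BYTES) == 0;
}
```

With this sketch, two paths such as `/var/lib/kubelet/pods/0123456789abcdef0123/volumes/a` and `.../volumes/b` share their first 42 bytes and so produce the same key, while `/etc/passwd` and `/etc/shadow` do not.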
FNV-1a is a very simple non-cryptographic hash algorithm. It accumulates
each byte of input using an XOR and a multiply. As a result, its code
size is ~10x smaller than comparable hash functions.
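
For reference, the byte-by-byte loop described above can be sketched in plain C as follows. This is a userspace illustration rather than the BPF code in this PR; the 64-bit width is an assumption (the text does not say which width is used), and `fnv1a64` is a name chosen here. The offset basis and prime are the standard 64-bit FNV parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard 64-bit FNV parameters. */
#define FNV64_OFFSET_BASIS 0xcbf29ce484222325ULL
#define FNV64_PRIME        0x100000001b3ULL

/* FNV-1a: for each input byte, XOR it into the state, then multiply
 * the state by the FNV prime (the "1a" order is XOR before multiply). */
static uint64_t fnv1a64(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = FNV64_OFFSET_BASIS;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV64_PRIME;
	}
	return h;
}
```

The loop body is a single XOR and a single multiply, which is why the compiled size stays small, and also why the cost grows linearly with the number of input bytes.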
However, byte-by-byte hash functions often struggle over large inputs.
Here, FNV-1a is nearly 10x slower than a comparable algorithm, Murmur
Hash, for 1K inputs (about the upper bound of what we'd care about
here).
Word-by-word hash functions like Murmur Hash can be much faster but risk
insufficient "mixing" across bits within each word. To provide this
bit-mixing, Murmur Hash defines a final mixer ("fmix").
A relatively common way of repurposing a byte-by-byte hasher to word
size is to add a mixer. Doing so here with "fmix" on top of FNV-1a
gives great results, with no loss of collision resistance or
randomness and no added skew.
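
One possible shape for such a word-at-a-time variant is sketched below. This is an assumption-laden illustration, not the code in this PR: the names `fmix64` and `fnv1a64_wordwise` are chosen here, the 64-bit width is assumed, and applying the mixer once at the end (rather than per word) is a guess. The `fmix64` constants are those of the MurmurHash3 64-bit finalizer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV64_OFFSET_BASIS 0xcbf29ce484222325ULL
#define FNV64_PRIME        0x100000001b3ULL

/* MurmurHash3's 64-bit finalizer: spreads every input bit across the
 * whole output word. */
static uint64_t fmix64(uint64_t k)
{
	k ^= k >> 33;
	k *= 0xff51afd7ed558ccdULL;
	k ^= k >> 33;
	k *= 0xc4ceb9fe1a85ec53ULL;
	k ^= k >> 33;
	return k;
}

/* Assumed variant: FNV-1a's XOR-then-multiply step applied to 8-byte
 * words, a byte-wise tail for the remainder, then one final fmix64. */
static uint64_t fnv1a64_wordwise(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = FNV64_OFFSET_BASIS;
	uint64_t w;

	while (len >= 8) {
		memcpy(&w, p, 8); /* avoids unaligned loads; host byte order */
		h ^= w;
		h *= FNV64_PRIME;
		p += 8;
		len -= 8;
	}
	while (len > 0) { /* fewer than 8 bytes left */
		h ^= *p++;
		h *= FNV64_PRIME;
		len--;
	}
	return fmix64(h);
}
```

Because words are loaded in host byte order, the resulting value differs between little- and big-endian machines, which should not matter for a cache key that never leaves the host. A 1K input takes 128 loop iterations here instead of 1024.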
Code size (bpf instructions):
FNV-1a 8mix - 100 ins
FNV-1a      -  35 ins
MurmurHash3 - 350 ins
Jenkins     - 400 ins
1K hash speed:
FNV-1a 8mix -  150 ns
FNV-1a      - 1500 ns
MurmurHash3 -  175 ns
Jenkins     - 2500 ns
forge-parent: tkvkksyznwtp
bump
Hey 👋 so I think the reasoning behind the PR makes sense, but I lack a bit of context on this and would need to investigate a bit more how rate limiting currently works.
Anyway, I haven't verified this, but if it's true, it totally makes sense to hash here.
> The existing logic used the first 40 arg bytes concatenated as the
> ratelimit cache key. One difficulty with this approach is that file
> paths often share substantial prefixes causing spurious drops for
> distinct events. In order to reduce false negatives for these cases, we
> can hash argument contents instead.
Could @kevsecurity take a look and say what he thinks about that? Are there also performance implications here, given this is on a hot path?
Also, I'd maybe reorganize your PR a bit:
- first patch: introduce your new BPF function for hashing
- second patch: wire your new hashing function into the rate limit feature
Thanks @mtardy! I'd be happy to split it up. I was especially interested to see CI results here, but it looks like it may have broken a bunch of tests 🙁. I was only able to get it working on an older kernel (early 5.X?) using --force-small-progs, so that may be an issue that needs addressing.
Yeah, that's tricky. Feel free to push updates here and I'll try to click the button so that the CI runs. But to make sure it works across kernel versions, I'd use cilium/little-vm-helper: the README explains how to download the images used in the CI, and with the host-mount and port options it's easy to SSH into the machine and share the folder you are working on (like tetragon) to run inside the VM. On x86_64 that would be something like that (see the README for more info)