Commit 7d72c96

Fix: OOB in sz_find_skylake / sz_rfind_skylake tail for null-byte needles (#312)
In the tail section of `sz_find_skylake` and `sz_rfind_skylake`, masked loads (`_mm512_maskz_loadu_epi8`) zero out bytes beyond the valid haystack range. The subsequent `_mm512_cmpeq_epi8_mask` comparisons are unmasked: they compare all 64 lanes, including the zeroed masked-off positions.

When the needle characters selected by `sz_locate_needle_anomalies_` are all `\0` (e.g., needle = "\0\0\0\0"), the zeroed lanes falsely match, producing spurious bits in `matches` at offsets beyond `h_length - n_length`. For `n_length <= 3`, this returns an out-of-bounds pointer without any validation. For `n_length > 3`, `sz_equal_skylake` reads past the haystack boundary, causing a heap-buffer-overflow.

The fix is to AND `matches` with `mask` (the valid-position bitmask) before entering the match loop, filtering out spurious matches from masked-off positions. The same fix is applied to both `sz_find_skylake` and `sz_rfind_skylake`.

Other implementations (serial, westmere, haswell, neon) are not affected because they fall back to `sz_find_serial` for the tail instead of using AVX-512 masked loads.

Reproducer:

    sz_find_skylake("AAAAAAAAAA", 10, "\0\0", 2) // Returns offset 9 (OOB) instead of NULL

Found via ClickHouse CI AST fuzzer (MSan build):

    SELECT count() FROM system.schema_inference_cache WHERE toNullable(65537) > countSubstrings(source, '\0\0\0\0')

Co-authored-by: Raúl Marín <664253+Algunenano@users.noreply.github.com>
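
The failure mode described above can be reproduced without AVX-512 hardware using a scalar model. The sketch below is not the library's code: `model_maskz_load`, `model_cmpeq`, and `model_tail_find` are hypothetical names, only the first and last needle bytes are probed (the real code probes three positions chosen by `sz_locate_needle_anomalies_`), and the mask is assumed to cover the `h_length - n_length + 1` valid starting positions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for _mm512_maskz_loadu_epi8: a lane whose mask bit is clear
 * is set to zero and its memory is never read. */
static void model_maskz_load(uint8_t lanes[64], uint64_t mask, const char *src) {
    for (int i = 0; i < 64; ++i)
        lanes[i] = ((mask >> i) & 1u) ? (uint8_t)src[i] : 0;
}

/* Scalar stand-in for _mm512_cmpeq_epi8_mask against a broadcast byte:
 * all 64 lanes are compared, whether or not they were masked off on load. */
static uint64_t model_cmpeq(const uint8_t lanes[64], uint8_t byte) {
    uint64_t bits = 0;
    for (int i = 0; i < 64; ++i)
        if (lanes[i] == byte) bits |= (uint64_t)1 << i;
    return bits;
}

/* Hypothetical model of the tail. Returns the first candidate offset, or -1.
 * Assumption: `mask` covers the h_length - n_length + 1 valid start positions. */
static int model_tail_find(const char *h, size_t h_length, const char *n, size_t n_length,
                           int apply_fix) {
    assert(n_length >= 2 && h_length >= n_length && h_length - n_length + 1 <= 64);
    size_t positions = h_length - n_length + 1;
    uint64_t mask = positions == 64 ? ~(uint64_t)0 : (((uint64_t)1 << positions) - 1);

    uint8_t first[64], last[64];
    model_maskz_load(first, mask, h);                /* probe: first needle byte */
    model_maskz_load(last, mask, h + n_length - 1);  /* probe: last needle byte */

    uint64_t matches = model_cmpeq(first, (uint8_t)n[0]) &
                       model_cmpeq(last, (uint8_t)n[n_length - 1]);
    if (apply_fix) matches &= mask; /* the one-line fix from this commit */

    for (int i = 0; i < 64; ++i) /* lowest set bit, like count-trailing-zeros */
        if ((matches >> i) & 1u) return i;
    return -1;
}
```

Under these assumptions, `model_tail_find("AAAAAAAAAA", 10, "\0\0", 2, 0)` yields 9, mirroring the reproducer, while the same call with `apply_fix` set to 1 yields -1.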
1 parent d126d12 commit 7d72c96

2 files changed

Lines changed: 13 additions & 0 deletions

include/stringzilla/find.h

Lines changed: 2 additions & 0 deletions
@@ -1374,6 +1374,7 @@ SZ_PUBLIC sz_cptr_t sz_find_skylake(sz_cptr_t h, sz_size_t h_length, sz_cptr_t n
                     _mm512_cmpeq_epi8_mask(h_first_vec.zmm, n_first_vec.zmm),
                     _mm512_cmpeq_epi8_mask(h_mid_vec.zmm, n_mid_vec.zmm)),
                 _mm512_cmpeq_epi8_mask(h_last_vec.zmm, n_last_vec.zmm));
+        matches &= mask;
         while (matches) {
             int potential_offset = sz_u64_ctz(matches);
             if (n_length <= 3 || sz_equal_skylake(h + potential_offset, n, n_length)) return h + potential_offset;
@@ -1456,6 +1457,7 @@ SZ_PUBLIC sz_cptr_t sz_rfind_skylake(sz_cptr_t h, sz_size_t h_length, sz_cptr_t
                     _mm512_cmpeq_epi8_mask(h_first_vec.zmm, n_first_vec.zmm),
                     _mm512_cmpeq_epi8_mask(h_mid_vec.zmm, n_mid_vec.zmm)),
                 _mm512_cmpeq_epi8_mask(h_last_vec.zmm, n_last_vec.zmm));
+        matches &= mask;
         while (matches) {
             int potential_offset = sz_u64_clz(matches);
             if (n_length <= 3 || sz_equal_skylake(h + 64 - potential_offset - 1, n, n_length))
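
The two hunks pick candidates from opposite ends of the match mask: the forward search takes the lowest set bit (`sz_u64_ctz`), and the reverse search takes the highest set bit via `64 - clz - 1`. The sketch below illustrates why stray high bits matter for each direction. The `model_*` helpers are hypothetical portable stand-ins, not the library's `sz_u64_ctz` / `sz_u64_clz`, and the 9-bit mask is an assumed example matching a 10-byte haystack and a 2-byte needle.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-trailing-zeros; the caller must pass a non-zero value. */
static int model_ctz64(uint64_t x) {
    assert(x != 0);
    int count = 0;
    while (!(x & 1u)) { x >>= 1; ++count; }
    return count;
}

/* Portable count-leading-zeros; the caller must pass a non-zero value. */
static int model_clz64(uint64_t x) {
    assert(x != 0);
    int count = 0;
    while (!(x >> 63)) { x <<= 1; ++count; }
    return count;
}

/* Forward search: the candidate is the lowest set lane, or -1 if none. */
static int model_first_candidate(uint64_t matches) {
    return matches ? model_ctz64(matches) : -1;
}

/* Reverse search: the candidate is the highest set lane, or -1 if none.
 * Same arithmetic as `h + 64 - potential_offset - 1` in the diff. */
static int model_last_candidate(uint64_t matches) {
    return matches ? 64 - model_clz64(matches) - 1 : -1;
}
```

With nine valid positions (`mask = 0x1FF`) and spurious bits in lanes 9 through 63, the forward candidate is lane 9 and the reverse candidate is lane 63. After `matches &= mask`, both searches report no candidate.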

scripts/test_stringzilla.cpp

Lines changed: 11 additions & 0 deletions
@@ -4195,6 +4195,17 @@ void test_search_with_misaligned_repetitions() {
     test_search_with_misaligned_repetitions("abc\0", "abc\0");
     test_search_with_misaligned_repetitions("abcd\0", "abcd");
 
+    // When searching for all-null needles in a haystack with no null bytes.
+    // This exercises the SIMD tail path where masked-off lanes are zeroed:
+    // if the needle characters are also zero, spurious matches appear at
+    // invalid offsets beyond the haystack, causing OOB reads.
+    test_search_with_misaligned_repetitions("a", {"\0\0", 2});
+    test_search_with_misaligned_repetitions("a", {"\0\0\0", 3});
+    test_search_with_misaligned_repetitions("a", {"\0\0\0\0", 4});
+    test_search_with_misaligned_repetitions("a", {"\0\0\0\0\0", 5});
+    test_search_with_misaligned_repetitions("abcd", {"\0\0", 2});
+    test_search_with_misaligned_repetitions("abcd", {"\0\0\0\0", 4});
+
     // When haystack is formed of equidistant needles:
     test_search_with_misaligned_repetitions("ab", "a");
     test_search_with_misaligned_repetitions("abc", "a");
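
The new tests pass each needle together with an explicit length, as in `{"\0\0", 2}`. A likely reason, stated here as an assumption about the test helper, is that any length derived from the terminating NUL would see an all-null needle as empty. The sketch below shows that effect alongside a naive explicit-length reference search; `reference_find` is a hypothetical helper for illustration, not part of StringZilla, and returning `NULL` for an empty needle is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reference search over explicit-length byte ranges.
 * Returns a pointer to the first match, or NULL if there is none. */
static const char *reference_find(const char *h, size_t h_length,
                                  const char *n, size_t n_length) {
    assert(h != NULL && n != NULL);
    if (n_length == 0 || h_length < n_length) return NULL;
    /* Only start positions where the whole needle still fits are visited,
     * so memcmp never reads past h + h_length. */
    for (size_t i = 0; i + n_length <= h_length; ++i)
        if (memcmp(h + i, n, n_length) == 0) return h + i;
    return NULL;
}
```

Here `strlen("\0\0")` is 0, so the two NUL bytes are only visible when the length is supplied separately. With explicit lengths, `reference_find("abcd", 4, "\0\0", 2)` returns `NULL`, which is the result the fixed `sz_find_skylake` is expected to agree with for haystacks that contain no null bytes.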
