Supporting keys > 127 bytes long #42

Open
scottcarey opened this issue Nov 21, 2019 · 1 comment
Labels: enhancement (New feature or request)

scottcarey commented Nov 21, 2019

HaloDB encodes the key length as a signed byte, both on disk and in memory, and it rejects keys longer than 127 bytes.

Although most of my string keys in the various databases are between 10 and 80 bytes, I have some rare outliers as large as 780 bytes. Every other K/V store I work with (I have an API abstraction that wraps about 10 of them) supports large keys; only one of them breaks, at 512 bytes.

There are some options for longer keys:

Easy: read the byte as an unsigned value, so that sizes from 0 to 255 are supported. However, this would make it even harder to support larger keys later.
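
Roughly, in Java (the method names here are illustrative, not HaloDB's actual record code):

```java
// Minimal sketch of the "easy" option: keep the same single length byte on disk,
// but read it back as an unsigned value so sizes 0..255 become representable.
static byte encodeKeyLength(int keyLength) {
    if (keyLength < 0 || keyLength > 255) {
        throw new IllegalArgumentException("key length must fit in one unsigned byte: " + keyLength);
    }
    return (byte) keyLength;                  // narrowing keeps the low 8 bits
}

static int decodeKeyLength(byte lengthByte) {
    return Byte.toUnsignedInt(lengthByte);    // -128..-1 map back to 128..255
}
```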

Harder: Allow for larger sizes, perhaps up to 2KB.

  • Index/Record/Tombstone files: Steal the top 3 bits from the version byte. Since the version byte is currently 0, new code would interpret existing data files the same way ( top 3 bits of the existing version byte | key size byte ). Old code would interpret any 'extended' key as a version mismatch and thus still be safe, so I think this can remain version 0. If versions were ever to reach 32, a different format would be needed at that point. (A packing sketch follows this list.)
  • SegmentWithMemoryPool: No change for now; it will not support key sizes larger than its configured fixedKeySize, which would be 127 or less. It could be extended to support key overflow when fixedKeySize is 8 or larger: when the key length exceeds fixedKeySize, the slot holds a pointer to extended key data, plus whatever prefix of the key fits in the remaining slot space (fixedKeySize - 8 bytes). When fixedKeySize is large enough, an alternative is to also keep a portion of the key hash in this area, so that the pointer to the extended key data does not need to be followed for most lookups. Even one byte of hash that was not used to select the hash bucket would cut the chance of following the pointer on a miss by a factor of 256.
  • SegmentNonMemoryPool: Since all hash entries are individually allocated (it appears to be closed addressing with a linked chain of entries), the in-memory entry can either use a variable-length integer encoding for the key/value lengths or a constant two bytes for the key length.
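
A rough sketch of the bit layout from the first bullet, assuming the low 5 bits keep the version and the 3 stolen high bits extend the key size to 11 bits (names and helpers are illustrative, not the actual file-format code):

```java
// Illustrative packing of an 11-bit key size into (versionByte, keySizeByte).
// Existing files have the top 3 bits of the version byte set to 0, so they decode
// to the same version 0 and the same 0..127 key sizes as before.
static final int VERSION_BITS = 5;                       // versions 0..31 stay representable
static final int VERSION_MASK = (1 << VERSION_BITS) - 1;
static final int MAX_KEY_SIZE = (1 << 11) - 1;           // 2047 bytes, i.e. "up to 2KB"

static byte packVersionByte(int version, int keySize) {
    if (version > VERSION_MASK || keySize > MAX_KEY_SIZE) {
        throw new IllegalArgumentException("version or key size out of range");
    }
    return (byte) (((keySize >>> 8) << VERSION_BITS) | version);
}

static byte packKeySizeByte(int keySize) {
    return (byte) keySize;                               // low 8 bits of the key size
}

static int unpackVersion(byte versionByte) {
    return Byte.toUnsignedInt(versionByte) & VERSION_MASK;
}

static int unpackKeySize(byte versionByte, byte keySizeByte) {
    int high = Byte.toUnsignedInt(versionByte) >>> VERSION_BITS;   // the 3 stolen bits
    return (high << 8) | Byte.toUnsignedInt(keySizeByte);
}
```

With this layout, a 780-byte key sets the two high bits stolen from the version byte, so old code sees a non-zero "version" and rejects the file rather than misreading it.
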
scottcarey (Author) commented:

I have a PR available now (PR #49)

The solution I came to for the SegmentWithMemoryPool differs from my ideas above. I used the existing chain structure within the slots inside the memory pool to store additional 'overflow' key data.

An alternate solution might be to 'branch' the chain so that the additional key data sits in a branch off of the main linked list instead of being inlined into it. This, however, requires that 'fixedKeyLength' be at least 4, or else there will be no room in the pool slot for the additional pointer.

The drawback is that if significantly large keys (> 66 bytes) are in use at the same time as a high load factor, lookups will regularly have to skip past existing keys that take up 3 or more links in the chain, slowing them down. But if load factors are 'normal' -- below 0.75 -- it probably won't make much of a difference. At a load factor of 0.75, the average chain length of a filled table is about 1.5, so about half of all lookups will have an additional key to skip over while searching, and half of those will find their own key second; so about 25% of the time a successful lookup will need to follow 2 more linked-list links. Theoretically that is about a 12% increase in slots scanned; avoiding it (with the branching variant above) comes at the cost of not being able to support large keys unless fixedKeySize is at least 4.
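
A back-of-envelope check of those numbers; the probabilities come from the reasoning above, while the baseline of roughly 4.5 slots scanned per lookup (1.5 keys times 3 slots each) is an assumption of mine:

```java
// Rough check of the figures above. The probabilities are from the comment;
// the baseline slots-scanned-per-lookup is an assumed value for illustration.
public final class ChainCostEstimate {
    public static void main(String[] args) {
        double pExtraKeyInChain = 0.5;   // ~half of lookups share a chain with another key
        double pMustSkipIt = 0.5;        // ~half of those find their own key second
        double linksPerSkip = 2.0;       // skipping a 3-slot key costs ~2 extra links

        double extraLinks = pExtraKeyInChain * pMustSkipIt * linksPerSkip;  // 0.5 extra links/lookup
        double baselineSlots = 1.5 * 3.0;                                   // assumed baseline (~4.5 slots)
        System.out.printf("extra links/lookup = %.2f, increase ~ %.0f%%%n",
                extraLinks, 100 * extraLinks / baselineSlots);              // ~11%, near the ~12% above
    }
}
```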

So I think that is open to discussion. I could also attempt such a variant and see how much better it performs on my large-key workload, but that will take some time to build.
(My large-key workload comes from a 7.2GB data set with 60 to 600 byte keys -- most under 90 bytes -- that I've been running benchmarks with; my other data sets are an 8.3GB one with 8 to 12 byte keys and a 2.1GB one with 4 to 70 byte keys, mostly under 20 bytes.)

With my data sets, overall query throughput is about 3% (non-pooled) to 6% (pooled) better than before.
