Update readme
Firstyear committed May 6, 2021
1 parent 32b0c49 commit d8d2220
Showing 4 changed files with 41 additions and 4 deletions.
45 changes: 41 additions & 4 deletions README.md
@@ -13,10 +13,47 @@
These operations are heavily used in low-level implementations of databases
for their indexing logic, but also have applications in statistical analysis and
other domains that require logical set operations.

Supporting only `u64` may seem very specific, but it was chosen for good reasons. On
64-bit CPUs, native 64-bit operations are faster than 32-bit or 16-bit ones. Additionally,
due to the design of the library, unsigned types are simpler to operate
on for the set operations.

How Does It Work?
-----------------

Each set is initially "sparse": its values are stored in the conventional way,
using a `Vec<u64>` internally.

::

[ 0, 1, 2, 3, ... , 1024 ]

You can then call `maybe_compress` on the set, which inspects the content and determines
whether compressing it would be beneficial. When compressed, the values are grouped into
tuple pairs of `range` and `mask`. The range is the starting value of a block of
64 values, and each bit of the mask indicates whether the corresponding value in that block is present. For example:

<p align="center">
<img src="https://raw.githubusercontent.com/Firstyear/idlset/master/static/idl_4.png" width="60%" height="auto" />
</p>
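
As a rough illustration of the `range` and `mask` layout described above, the following Rust sketch packs a sorted list of values into pairs. The function name `compress` and the choice to align every range to a multiple of 64 are assumptions made for this example; the library's own `maybe_compress` may work differently.

```rust
/// Hypothetical helper (not the library's API): pack a sorted,
/// de-duplicated slice of values into (range, mask) pairs.
fn compress(sparse: &[u64]) -> Vec<(u64, u64)> {
    let mut out: Vec<(u64, u64)> = Vec::new();
    for &v in sparse {
        // Assumption: each range starts at a multiple of 64.
        let range = v & !63;
        // One bit marks the position of `v` inside its block of 64 values.
        let bit = 1u64 << (v & 63);
        if let Some(last) = out.last_mut() {
            if last.0 == range {
                // Same block as the previous value: set one more bit.
                last.1 |= bit;
                continue;
            }
        }
        // First value seen for this block: start a new pair.
        out.push((range, bit));
    }
    out
}
```

Under these assumptions, the values `0, 1, 2, 64, 130` would become three pairs: one for the block starting at 0, one for the block starting at 64, and one for the block starting at 128.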

As these now contain a bit mask, we can use CPU operations for logical operations like `AND`, `OR` and
`AND NOT`. This example demonstrates an `AND` operation.

<p align="center">
<img src="https://raw.githubusercontent.com/Firstyear/idlset/master/static/idl_5.png" width="60%" height="auto" />
</p>
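
The `AND` operation over two compressed sets can be sketched as a merge over the two lists of pairs, combining masks with a single bitwise instruction whenever the ranges match. This is a minimal sketch, assuming both inputs are sorted by range; `and_compressed` is a name invented for this example, not a function of the library.

```rust
/// Hypothetical helper: intersect two compressed sets, each a list of
/// (range, mask) pairs sorted by range.
fn and_compressed(a: &[(u64, u64)], b: &[(u64, u64)]) -> Vec<(u64, u64)> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        let (ra, ma) = a[i];
        let (rb, mb) = b[j];
        if ra == rb {
            // Same block: one bitwise AND intersects up to 64 values.
            let mask = ma & mb;
            if mask != 0 {
                out.push((ra, mask));
            }
            i += 1;
            j += 1;
        } else if ra < rb {
            // Block only present in `a`: it cannot be in the result.
            i += 1;
        } else {
            // Block only present in `b`: it cannot be in the result.
            j += 1;
        }
    }
    out
}
```

`OR` and `AND NOT` would follow the same merge shape with `|` and `& !` on the masks, except that blocks present in only one input may need to be kept.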

Due to this compression, high-density sets use less memory, which in turn improves CPU cache
behaviour by lowering pressure on the caches. It also allows faster seeking through sets to determine
value presence.

<p align="center">
<img src="https://raw.githubusercontent.com/Firstyear/idlset/master/static/idl_6.png" width="60%" height="auto" />
</p>
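
To illustrate why seeking can be faster, here is one possible presence check over a compressed set. It is a sketch under the assumption that the pairs are sorted by range and that ranges are multiples of 64; the use of a binary search and the name `contains` are choices made for this example, not a description of the library's internals.

```rust
/// Hypothetical helper: test whether `value` is present in a compressed
/// set of (range, mask) pairs sorted by range.
fn contains(set: &[(u64, u64)], value: u64) -> bool {
    // Block that would hold `value`, and its bit inside that block.
    let range = value & !63;
    let bit = 1u64 << (value & 63);
    // Each pair covers 64 values, so a dense set has far fewer
    // entries to search than the equivalent sparse list.
    match set.binary_search_by_key(&range, |&(r, _)| r) {
        Ok(idx) => (set[idx].1 & bit) != 0,
        Err(_) => false,
    }
}
```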

During operations between compressed and uncompressed sets, the "better" choice of compressed or
uncompressed is preserved for the result set based on the inputs and operation performed.
In other words, the result set may be compressed or uncompressed
depending on the operation and its interactions, to improve performance of subsequent operations.
This carries these optimisation choices forward to result sets, which benefits chained and
repeated operations over sets and reduces memory consumption of intermediate set results during
operations.
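
The idea of choosing the result representation can be sketched as follows. The enum `IdSet`, the helper `mask_contains`, and the specific rule used here (an `AND` involving a sparse input yields a sparse result, because the result can be no larger than that input) are all assumptions for illustration; the library's actual rules may differ.

```rust
/// Hypothetical representation: a set is either sparse or compressed.
#[derive(Debug, PartialEq)]
enum IdSet {
    Sparse(Vec<u64>),
    Compressed(Vec<(u64, u64)>),
}

/// Hypothetical helper: linear presence check over (range, mask) pairs.
fn mask_contains(set: &[(u64, u64)], value: u64) -> bool {
    let range = value & !63;
    let bit = 1u64 << (value & 63);
    set.iter().any(|&(r, m)| r == range && (m & bit) != 0)
}

/// AND two sets, picking the result representation from the inputs.
/// The loops are deliberately naive to keep the sketch short.
fn and(a: &IdSet, b: &IdSet) -> IdSet {
    match (a, b) {
        (IdSet::Sparse(x), IdSet::Sparse(y)) => {
            IdSet::Sparse(x.iter().copied().filter(|v| y.contains(v)).collect())
        }
        // Assumed rule: mixed inputs give a sparse result, since the
        // result cannot hold more values than the sparse input.
        (IdSet::Sparse(s), IdSet::Compressed(c))
        | (IdSet::Compressed(c), IdSet::Sparse(s)) => {
            IdSet::Sparse(s.iter().copied().filter(|&v| mask_contains(c, v)).collect())
        }
        // Both compressed: stay compressed and combine masks directly.
        (IdSet::Compressed(x), IdSet::Compressed(y)) => {
            let mut out = Vec::new();
            for &(rx, mx) in x {
                for &(ry, my) in y {
                    if rx == ry && (mx & my) != 0 {
                        out.push((rx, mx & my));
                    }
                }
            }
            IdSet::Compressed(out)
        }
    }
}
```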

Contributing
------------
Binary file added static/idl_4.png
Binary file added static/idl_5.png
Binary file added static/idl_6.png
