diff --git a/README.md b/README.md index cb32ee7..fffbda8 100644 --- a/README.md +++ b/README.md @@ -13,10 +13,47 @@ These operations are heavily used in low-level implementations of databases for their indexing logic, but has applications with statistical analysis and other domains that require logical set operations. -This seems very specific to only use u64, but has been chosen for a good reason. On -64bit cpus, native 64bit operations are faster than 32/16. Additionally, -due to the design of the library, unsigned types are simpler to operate -on for the set operations. +How Does It Work? +----------------- + +Each set is initially "sparse": its values are stored the way you would traditionally expect, +in a `Vec` internally. + +:: + + [ 0, 1, 2, 3, ... , 1024 ] + +You can then call `maybe_compress` on the set, which will look at the content and determine +whether compressing it would be beneficial. When compressed, the values are grouped into +pairs of `range` and `mask`. The range is the starting value of a block of +64 values, and the mask records which values of that block are present. For example: + +

+ +
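The `range` and `mask` layout described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the library's implementation: the function name `compress` is hypothetical, the input is assumed to be sorted and free of duplicates, the range is assumed to be the value rounded down to a multiple of 64, and bit `n` of the mask is assumed to stand for `range + n`.

```rust
/// Hypothetical helper (not the library's API): groups sorted,
/// de-duplicated values into (range, mask) pairs.
fn compress(sorted: &[u64]) -> Vec<(u64, u64)> {
    let mut out: Vec<(u64, u64)> = Vec::new();
    for &v in sorted {
        // Assumption: a range starts at a multiple of 64.
        let range = v & !63;
        // Assumption: bit n of the mask represents the value range + n.
        let bit = 1u64 << (v & 63);
        if let Some(last) = out.last_mut() {
            if last.0 == range {
                // Same block of 64 values: set one more bit in its mask.
                last.1 |= bit;
                continue;
            }
        }
        // First value seen in a new block: start a new pair.
        out.push((range, bit));
    }
    out
}
```

Under these assumptions, `compress(&[0, 1, 2, 3, 64])` yields `[(0, 0b1111), (64, 0b1)]`: five values collapse into two pairs.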

+ +As these now contain a bit mask, we can use CPU operations for logical operations like `AND`, `OR` and +`AND NOT`. This example demonstrates an `AND` operation. + +

+ +
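As a rough sketch of how an `AND` over two compressed sets might look, the following Rust walks both sets in step and intersects the masks of matching ranges with a single bitwise `&`. The function name `and_compressed` and the slice-of-pairs representation are assumptions made for illustration; both inputs are assumed to be sorted by range.

```rust
/// Hypothetical sketch (not the library's API): intersects two compressed
/// sets, each a slice of (range, mask) pairs sorted by range.
fn and_compressed(a: &[(u64, u64)], b: &[(u64, u64)]) -> Vec<(u64, u64)> {
    let (mut i, mut j) = (0usize, 0usize);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let (ra, ma) = a[i];
        let (rb, mb) = b[j];
        if ra < rb {
            // Range only present in `a`: it cannot be in the intersection.
            i += 1;
        } else if rb < ra {
            // Range only present in `b`: skip it as well.
            j += 1;
        } else {
            // Same range: one bitwise AND intersects up to 64 values.
            let mask = ma & mb;
            if mask != 0 {
                out.push((ra, mask));
            }
            i += 1;
            j += 1;
        }
    }
    out
}
```

`OR` and `AND NOT` would follow the same merge pattern with `|` and `& !` on the masks, except that unmatched ranges are kept rather than skipped where the operation requires it.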

+ +Due to this compression, high-density sets use less memory, which also improves CPU cache +behaviour by lowering pressure on the caches. It also allows faster seeking through sets to determine +value presence. + +

+ +
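One way to picture the faster seeking is a membership test that first locates the range and then checks a single bit. The sketch below assumes the compressed set is a slice of `(range, mask)` pairs sorted by range, that ranges start at multiples of 64, and that a binary search is used to find the range. The function name `contains` and the choice of binary search are illustrative assumptions, not a description of the library's internals.

```rust
/// Hypothetical sketch (not the library's API): tests whether `v` is
/// present in a compressed set sorted by range.
fn contains(set: &[(u64, u64)], v: u64) -> bool {
    // Assumption: a range starts at a multiple of 64.
    let range = v & !63;
    match set.binary_search_by_key(&range, |&(r, _)| r) {
        // Range found: the value is present if its bit is set in the mask.
        Ok(idx) => (set[idx].1 & (1u64 << (v & 63))) != 0,
        // No pair covers this range, so the value is absent.
        Err(_) => false,
    }
}
```

Because each pair covers up to 64 values, the search runs over far fewer entries on a dense set than it would over the uncompressed values.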

+ +During operations between compressed and uncompressed sets, the "better" choice of compressed or +uncompressed is preserved for the result set based on the inputs and operation performed. +In other words, the result set may be compressed or uncompressed +depending on the operation and its interactions, to improve performance of subsequent operations. +This carries these optimisation choices forward to result sets, which benefits long chains of +operations over sets and reduces the memory consumption of intermediate set results during +operations. Contributing ------------ diff --git a/static/idl_4.png b/static/idl_4.png new file mode 100644 index 0000000..35e0fcc Binary files /dev/null and b/static/idl_4.png differ diff --git a/static/idl_5.png b/static/idl_5.png new file mode 100644 index 0000000..9e68d60 Binary files /dev/null and b/static/idl_5.png differ diff --git a/static/idl_6.png b/static/idl_6.png new file mode 100644 index 0000000..90652ab Binary files /dev/null and b/static/idl_6.png differ