Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full IntSets? #999

Open
meooow25 opened this issue Apr 6, 2024 · 5 comments
Open

Full IntSets? #999

meooow25 opened this issue Apr 6, 2024 · 5 comments

Comments

@meooow25
Copy link
Contributor

meooow25 commented Apr 6, 2024

I wonder if it will be useful to add another constructor to IntSet: Full Prefix. This indicates that this is a full tree with a shared prefix.

I expect this to show good improvements for use cases where contiguous runs of Ints are likely. It will also reduce space usage in such cases. For instance, to store [0..n] we will need only $O(\log n)$ memory instead of $O(n)$.

But this would add small overheads in various places, so it will likely not be a clear win in all use cases.
Also, to be clear, this would be small added costs and not a change in the worst case bounds.

For sure we would want to test something like this out on some real world use cases (like GHC, maybe others?) to see how it affects things. Just documenting this idea for now.

@alexfmpe
Copy link
Contributor

alexfmpe commented Apr 6, 2024

Think this could technically be represented with existing constructors

-- Invariant: The Mask is a power of 2. It is the largest bit position at which
-- two elements of the set differ.

If the lowest bit in the Mask is set, that means there's two elements in the full sub-tree sharing all but the last bit. If a higher bit in the Mask is set, the lower ones are ignored so maybe one could have

isFullTree = mask .&. 1 == 1
fullTree prefix mask = Bin prefix (mask .|. 1) Tip Tip

This saves a constructor entry at the cost of 2 Tip in the full sub-tree case (which is still better than an actually full tree) but might require more bit fiddling to account for the possibly-set lower bit in the existing operations.

#998 would affect the bit fiddling costs of this I guess.

@meooow25
Copy link
Contributor Author

meooow25 commented Apr 7, 2024

Ah sure, it could be done by assigning the case to some currently invalid configuration. But what is the benefit of keeping the number of constructors at 3?

We can even reduce it to 2 today if we want, by replacing Nil with, say, Tip 0 0. Again, I don't know if or why that would be good.

@treeowl
Copy link
Contributor

treeowl commented Apr 7, 2024

Replacing Nil with Tip 0 0 is an interesting idea. I wonder how that would perform.

@alexfmpe
Copy link
Contributor

alexfmpe commented Apr 7, 2024

But what is the benefit of keeping the number of constructors at 3?

My thinking was it might decrease the amount of branch prediction needed in some cases, (assuming the bit fiddling doesn't eclipse benefits), but it also depends on what constructor matching gets compiled to and I don't know how ghc behaves there.

@meooow25
Copy link
Contributor Author

meooow25 commented Jun 6, 2024

I found an implementation of this: https://hackage.haskell.org/package/intset

Haven't looked into how it compares performance-wise yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants