-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Method advanceIfNeeded
returns a count of skipped items
#562
Comments
It would almost surely require additional non-trivial computations. The function And given that current users do not need this functionality, it does not seem like a good idea to modify the existing It would be more reasonable to implement a new method, e.g., A pull request is invited. |
You can do this already with an iterator and call rank for each value yielded by the iterator. Unfortunately the loop would have quadratic complexity. A new kind of skippable iterator which produces the value and the rank together, could be generally useful. |
Thank you both for replying. I'm not familiar with the internals of the containers, so I'm not able to prepare a pull request. I thought the information of container cardinality and hi-low bounds are readily available in the containers so that you can switch among the types (array/bitmap/run). And therefore, when you advance over entire container, you know what has been skipped. But this idea surely comes with my lack of knowledge - you probably change the type of the container during add/remove operations which require "looking into" the data of the container. |
Yes. That part is easy. But you don't typically skip over an entire containers. So think about a bitset... it is just an array of bits... |
When I think about it - the reason I thought |
@novoj That would be a bug. If you provide a buffer, it should fill it up as much as possible. If you can identify such a case, please report it (as a distinct issue). |
IIRC that iterator yields at container boundaries, but returns how many elements were filled. |
I observed the behaviour confirmed by @richardstartin - when iterating RoaringBitmap with 5000 elements using int buffer of size 500 elements I usually need more than 10 iterations to go through. But I can do it in a safe way thanks to the information about the peak record filled in the buffer. I didn't consider this a bug but a feature :). |
Ah. Interesting. |
Is your feature request related to a problem? Please describe.
My use-case is that I have a very large array of objects which contain an ID in their property and I need to reduce the array to a smaller one with elements which id match some requested set. I may do a binary search and use a "comparator" on id property in the original array, but each comparison will require "hop" to a memory location where the object resides to retrieve the id property (which is rather slow).
That's why I created a RoaringBitmap with the ids that are extracted from the records in the large array and search ids in it using
org.roaringbitmap.RoaringBatchIterator
. The problem is that I need to "materialize" all records in RoaringBitmap via.nextBatch
in order to count all examined elements of the RoaringBitmap. I'd like to useadvanceIfNeeded
approach to skip entire segments if the currently looked up id is not within them. The problem is that theadvanceIfNeeded
method doesn't return the count of elements, that has been skipped which is neccessary for me to compute the correct index in the original "large array of fat objects".Describe the solution you'd like
Adding a new method or changing the return type of
advanceIfNeeded
toint
(the latter would probably break backward binary compatibility of RoaringBitmaps) that would represent a number of elements that has been skipped during "fast forward" of the iterator.Does it make sense to you?
Code snippet reflecting the situation
The text was updated successfully, but these errors were encountered: