Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more BTreeMap like apis to LiteMap #6068

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

arthurprs
Copy link
Contributor

@arthurprs arthurprs commented Feb 5, 2025

Add BTreeMap-like APIs to LiteMap to get closer to the std lib BTreeMap, which has familiarity benefits and allows easier integration of LiteMap into existing projects. This is a follow up of #5894

  • Extend is based on insert, which can degrade to quadratic in theory. But it should be fine for the map sizes recommended for LiteMap.
    • Note that this is already the case for the existing FromIterator implementation. It's maybe worth looking into a sort() + dedup() kind of implementation if it falls out of the fast-path
  • Even though try_get_or_insert exists, the Entry API is a more familiar API with good performance, as it may avoid a potentially wasted second binary search.

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've been missing these! A couple small comments

utils/litemap/src/map.rs Show resolved Hide resolved
utils/litemap/src/map.rs Show resolved Hide resolved
@arthurprs arthurprs changed the title Add/change BTreeMap like apis to LiteMap Add more BTreeMap like apis to LiteMap Feb 5, 2025
@sffc
Copy link
Member

sffc commented Feb 5, 2025

I'm all in favor of Entry APIs

I've pushed back for a long time on a generic extend function because it is a performance footgun; I think it is even worse than O(N^2) in the worst case; could be O(N^2 log N) (because there are N insertions, and each insertion is N log N, with log N for the lookup and N for the array insert).

FromIterator is O(N log N) which is the optimal complexity: it collects things into a Vec and then sorts the Vec, and it uses the same Vec for its eventual storage.

We have extend_from_litemap which does the more efficient O(N) thing since it knows the input is sorted, and I would support adding extend_from_btreemap since we can make a similar assumption there. But I lean toward not having a O(N^2 log N) generic extend function.

@Manishearth
Copy link
Member

I agree around extend, it could be extend_sorted() which takes an ExactSizeIterator, perhaps, but that's about it.

@sffc
Copy link
Member

sffc commented Feb 5, 2025

If someone wants to use .extend(collection), they are better off doing .extend_from_litemap(collection.into_iter().collect()). There is an extra allocation, but the asymptotic performance is much faster.

If we did add .extend, I would want it to be implemented like that, I think.

@arthurprs
Copy link
Contributor Author

I agree that extend is tricky, but I was hoping that we could add a sort + dedup-based version to curb the worst case while allowing good drop-in experience.

I agree around extend, it could be extend_sorted() which takes an ExactSizeIterator, perhaps, but that's about it.

Let's add that, sounds good.

FromIterator is O(N log N) which is the optimal complexity: it collects things into a Vec and then sorts the Vec, and it uses the same Vec for its eventual storage.

I'm afraid I could be missing something obvious, but I don't think there is a sort step right now and it's N² worst case.

@arthurprs arthurprs requested a review from a team as a code owner February 5, 2025 23:28
@sffc
Copy link
Member

sffc commented Feb 5, 2025

I see, I thought FromIterator was N log N but it isn't currently implemented that way. Maybe it should be.

fn lm_sort_from_iter<I: IntoIterator<Item = (K, V)>>(iter: I) -> Self {

@arthurprs
Copy link
Contributor Author

arthurprs commented Feb 5, 2025

I fixed the quadratic behavior and implemented from_iter and extend as described in the first comment. A sorted fast-path with fallback to sort+dedup.

I benchmarked and I think it's a good direction. Let me know if you agree with this extend version otherwise I can take it out. Edit: duplications can still lead to quadratic behavior, so I'm going remove extend.

Before
litemap/from_iter/small 460ns
litemap/from_iter/large 1.4s
litemap/from_iter_sorted/small 49ns
litemap/from_iter_sorted/large 476us

After
litemap/from_iter/small 217ns
litemap/from_iter/large 15ms
litemap/from_iter_sorted/small 46ns
litemap/from_iter_sorted/large 427us
litemap/extend/large 39ms
litemap/extend_via_litemap/large 1.19s

@arthurprs
Copy link
Contributor Author

arthurprs commented Feb 6, 2025

Once this PR is sorted, I'll open another PR to improve the serde deserialization and avoid quadratic worst case whenever possible. As it stands it's a potential attack vector.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants