
Alternative optimisations for insertion #980

Open · wants to merge 4 commits into master
Conversation

@HuwCampbell commented Jan 11, 2024

This is to open the conversation. I haven't looked at the strict version or gone 100% at this stage.

I'm not convinced that the (unsafe) optimisation for identical pointers is worth the complexity; it hurts performance for inserting new elements and replacing existing ones compared to this version.

Here, instead of checking pointer equality, we instead pass upwards whether a rebalance is required.
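To make the idea concrete, here is a minimal sketch of the flag-passing scheme on a plain binary search tree. This is NOT the PR's actual Data.Map code, and the names (`insertF`, `rebalanceIf`) are invented for illustration:

```haskell
-- Toy tree standing in for Data.Map's Bin/Tip (sizes omitted).
data Tree k v = Tip | Bin (Tree k v) k v (Tree k v)
  deriving Show

-- Returns the new tree plus a Bool saying whether the subtree grew, so
-- callers only pay for rebalancing when an element was really added.
insertF :: Ord k => k -> v -> Tree k v -> (Tree k v, Bool)
insertF kx vx Tip = (Bin Tip kx vx Tip, True)       -- new key: subtree grew
insertF kx vx (Bin l ky vy r) =
  case compare kx ky of
    LT -> let (l', grew) = insertF kx vx l
          in (rebalanceIf grew (Bin l' ky vy r), grew)
    GT -> let (r', grew) = insertF kx vx r
          in (rebalanceIf grew (Bin l ky vy r'), grew)
    EQ -> (Bin l kx vx r, False)                    -- replaced in place: no rebalance

-- Stand-in for balanceL/balanceR: a real tree would rotate here when the
-- flag is True; skipping the balance when it is False is the whole point.
rebalanceIf :: Bool -> Tree k v -> Tree k v
rebalanceIf _ t = t
```

The EQ branch rebuilds only the spine and reports `False` upward, so no balancing work happens on replacement, without any pointer comparison.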

Old
  insert absent:                  OK (5.11s)
    309  μs ± 9.5 μs
  insert present:                 OK (2.05s)
    249  μs ±  21 μs
  insert alternate:               OK (2.09s)
    253  μs ±  21 μs
  insertWith absent:              OK (2.61s)
    313  μs ±  16 μs
  insertWith present:             OK (2.16s)
    260  μs ±  22 μs

New
  insert absent:                  OK (2.90s)
    351  μs ±  14 μs
  insert present:                 OK (3.20s)
    194  μs ±  16 μs
  insert alternate:               OK (3.18s)
    192  μs ±  11 μs
  insertWith absent:              OK (1.43s)
    345  μs ±  31 μs
  insertWith present:             OK (1.66s)
    200  μs ±  14 μs

So inserting when the key is missing (common) or inserting a new item at an existing key (also common) is faster, while only inserting the literally identical item (i.e., a no-op, probably less common) is slower.

I've redone the benchmarks, adding some extra calls to rnf in the suite. It looks like insert is about 22% faster if the key is matched and 14% slower if it's not.

I've also done the same trick for insertWith, where it's closer to 25% faster and 10% slower respectively.

For insertR, I've instead manually built the continuation, and just return the top-level map if the key is found. I'm not sure this helps much at this stage; union uses it, but it doesn't lean on it too hard.
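The continuation trick can be sketched like this, again on a toy tree with an invented name (`insertRC`) rather than the PR's real code: the continuation rebuilds the spine only if we actually insert, and on EQ we throw it away and return the untouched top-level map, so the result shares with the input for free.

```haskell
-- Toy tree standing in for Data.Map's Bin/Tip (sizes and balancing omitted).
data Tree k v = Tip | Bin (Tree k v) k v (Tree k v)
  deriving (Eq, Show)

-- insertR semantics: keep the existing value if the key is present.
insertRC :: Ord k => k -> v -> Tree k v -> Tree k v
insertRC kx vx top = go top id
  where
    go Tip rebuild = rebuild (Bin Tip kx vx Tip)
    go (Bin l ky vy r) rebuild =
      case compare kx ky of
        LT -> go l (\l' -> rebuild (Bin l' ky vy r))
        GT -> go r (\r' -> rebuild (Bin l ky vy r'))
        EQ -> top   -- key present: drop the continuation, keep the old map
```

Because the EQ branch returns `top` directly, the found case does no allocation at all, which is the same sharing the pointer-equality check in the current code tries to achieve.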

The current logic uses unsafe pointer equality to determine if
a rebalance is required, but this case is niche, and we can instead
pass this information upwards.

This performs well in benchmarks.
@HuwCampbell changed the title from "Perf/lazy insert ptr removal" to "Alternative optimisations for insertion," on Jan 11, 2024
@HuwCampbell changed the title from "Alternative optimisations for insertion," to "Alternative optimisations for insertion" on Jan 11, 2024
The pointer equality is used to check whether we can avoid doing anything as we traverse back up the chain, but if we turn it around and use an explicit continuation instead, we can simply choose not to run it.
@treeowl (Contributor) commented Jan 12, 2024

I'm surprised you're getting good results with explicit continuations. Do you know how GHC is transforming those?

@HuwCampbell (Author) commented Jan 12, 2024

Yeah, I couldn't really see any difference in the union benchmarks (maybe one was marginally faster?). From trying an explicit continuation on the other inserts, I expect it is slower when a value is inserted.

I haven't looked at the core yet.

I also considered just throwing a sentinel exception if EQ is found, then catching it and returning the top map. It would probably be fast, but exceptions for control flow are a bit gross.

(Actually, delimited continuations are in GHC now; we could use those.)

@@ -84,6 +86,9 @@ main = do
, bench "mapMaybeWithKey" $ whnf (M.mapMaybeWithKey (const maybeDel)) m
, bench "lookupIndex" $ whnf (lookupIndex keys) m
, bench "union" $ whnf (M.union m_even) m_odd
, bench "union_identical" $ whnf (M.union m_even) m_even
, bench "union_sparse" $ whnf (M.union m_even) m_sparse
, bench "union_into_sparse" $ whnf (M.union m_sparse) m_even
@HuwCampbell (Author) commented on the diff:
Two more with m_sparse and m_odd would show if there's a slowdown in insertR.

@treeowl (Contributor) commented Jan 12, 2024 via email

@HuwCampbell (Author) commented

I'm not as concerned with insertR as with insert and insertWith.

I guess the question is: do we want a 10-15% sacrifice on insertion when the key is not already present, for a 20-30% improvement when it is?

How would that decision be made?

@meooow25 (Contributor) commented Oct 6, 2024

> I'm not convinced that the (unsafe) optimisation for identical pointers is worth the complexity; it hurts performance for inserting new elements and replacing existing ones compared to this version.

> So inserting when the key is missing (common) or inserting a new item at an existing key (also common) is faster, while only inserting the literally identical item (i.e., a no-op, probably less common) is slower.

This sounds reasonable. The current code seems optimized for a very rare case. We have a comment acknowledging the same:

-- Unlike insertR, we only get sharing here
-- when the inserted value is at the same address
-- as the present value. We try anyway; this condition
-- seems particularly likely to occur in 'union'.

But why is it likely in union? It would happen when unioning two maps with a large overlap in shared entries. That again seems like a rare situation.


There are a few variants of insert we can try:

A. Pointer equality (current)
B. Return a bool which controls rebalancing (this PR)
C. No tricks
D. Both A and B (let's not go there)

Curiously, strict map uses C instead of A:

case compare kx ky of
LT -> balanceL ky y (go kx x l) r
GT -> balanceR ky y l (go kx x r)
EQ -> Bin sz kx x l r

And there are 3 situations that can occur:

  1. It's a new key. C is the best here, A and B waste work.
  2. It's an existing key but the value is new. B is the best here. How likely is this? Perhaps not too unlikely; shared key objects might be around.
  3. It's an existing key and existing value. C is the best here, and B should also help. This seems rather unlikely.
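For reference, variant A rests on the pointer-equality primitive below; containers defines an equivalent `ptrEq` in Utils.Containers.Internal.PtrEquality. It can report `False` for equal values, but `True` guarantees physical identity, which is what makes the sharing check safe even though it is unsafe-looking:

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)

-- True only when both arguments are the same heap object.
-- False negatives are possible; false positives are not.
ptrEq :: a -> a -> Bool
ptrEq x y = isTrue# (reallyUnsafePtrEquality# x y)
```

Variant A's EQ branch then reads roughly `EQ | ptrEq x y -> t`, returning the original map only when the literally identical value is inserted.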

So there's a bunch of benchmarking involved if you want to figure this out.
For union benchmarks you can use set-operations-map; it covers various sets of inputs.
Related to the history of the current code: #315, #416. Though I don't see any new information there that is not already documented.
Also, let's set aside insertR for now. That one doesn't seem too problematic.
