Alternative optimisations for insertion #980

HuwCampbell · 2024-01-11T08:36:16Z

This is to open the conversation. I haven't looked at the strict version or gone 100% at this stage.

I'm not convinced that the (unsafe) optimisations for identical pointers is worth the complexity, and it hurts performance for inserts of new elements and the replacement of elements compared to this version.

Here, instead of checking pointer equality, we instead pass upwards whether a rebalance is required.

Old
  insert absent:                  OK (5.11s)
    309  μs ± 9.5 μs
  insert present:                 OK (2.05s)
    249  μs ±  21 μs
  insert alternate:               OK (2.09s)
    253  μs ±  21 μs
  insertWith absent:              OK (2.61s)
    313  μs ±  16 μs
  insertWith present:             OK (2.16s)
    260  μs ±  22 μs

New
  insert absent:                  OK (2.90s)
    351  μs ±  14 μs
  insert present:                 OK (3.20s)
    194  μs ±  16 μs
  insert alternate:               OK (3.18s)
    192  μs ±  11 μs
  insertWith absent:              OK (1.43s)
    345  μs ±  31 μs
  insertWith present:             OK (1.66s)
    200  μs ±  14 μs

So inserting when the key is missing (common) or a new item at the same key (also common), are faster, while only literally inserting the same identical item (i.e., a no-op, probably less common) is slower.

I've redone the benchmarks, adding some extra calls to rnf in the suite. It looks like insert is about 22% faster if the key is matched and 14% slower if it's not.

I've also done the same trick for insertWith, where it's closer to 25% faster and 10% slower respectively.

For insertR, I've instead manually built the continuation, and just return the top level map if the key is found. I'm not sure if this helps much at this stage; union uses it, but it doesn't lean on it too hard.

The current logic uses unsafe pointer equality to determine if a rebalance is required, but this case is niche, and we can instead pass this information upwards. This performs well in benchmarks.

The pointer equality is used to check if we can not do anything when we retraverse back up the chain, but, if we turn it around and use an explicit continuation instead, we can just choose to not run it.

treeowl · 2024-01-12T02:24:26Z

I'm surprised you're getting good results with explicit continuations. Do you know how GHC is transforming those?

HuwCampbell · 2024-01-12T03:31:09Z

Yeah I couldn't really see any differences in the union benchmarks (maybe one was marginally faster?). From trying an explicit continuation on the other inserts, I expect that it is slower when there is a value inserted.

I haven't looked at the core yet.

I also considered just throwing a sentinel exception if EQ is found, then catching and returning the top map. It would probably be fast, but exceptions for control flow are a bit gross.

(Actually, delimited continuations are in GHC now, could use those).

HuwCampbell · 2024-01-12T03:34:20Z

containers-tests/benchmarks/Map.hs

@@ -84,6 +86,9 @@ main = do
        , bench "mapMaybeWithKey" $ whnf (M.mapMaybeWithKey (const maybeDel)) m
        , bench "lookupIndex" $ whnf (lookupIndex keys) m
        , bench "union" $ whnf (M.union m_even) m_odd
+        , bench "union_identical" $ whnf (M.union m_even) m_even
+        , bench "union_sparse" $ whnf (M.union m_even) m_sparse
+        , bench "union_into_sparse" $ whnf (M.union m_sparse) m_even


Two more with m_sparse and m_odd would show if there's a slowdown in insertR.

treeowl · 2024-01-12T03:50:52Z

I would expect exceptions to be slow. I think I might have tried? I don't remember for sure. If you try, use unsafeDupablePerformIO to catch the exception. Also consider using raise# and catch# manually, to allow an exception type other than SomeException (this may require reallyUnsafePtrEquality# to determine whether you got the not-present exception).

…

On Thu, Jan 11, 2024, 10:31 PM Huw Campbell ***@***.***> wrote: Yeah I couldn't really see any differences in the union benchmarks (maybe one was marginally faster?). From trying an explicit continuation on the other inserts, I expect that it is slower when there is a value inserted. I haven't looked at the core yet. I also considered just throwing a sentinel exception if EQ is found, then catching and returning the top map. It would probably be fast, but exceptions for control flow are a bit gross. — Reply to this email directly, view it on GitHub <#980 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAOOF7I45DIO6Q6DACGP4PLYOCVARAVCNFSM6AAAAABBWC3ULWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBYGM3TQNRTGA> . You are receiving this because you commented.Message ID: ***@***.***>

HuwCampbell · 2024-01-13T06:41:57Z

I'm not as concerned with insertR as insert and insertWith.

I guess the question is do we want a 10-15% sacrifice on insertion when the key in not already present but a 20-30% improvement when it is?

How would that decision be made?

meooow25 · 2024-10-06T11:35:24Z

I'm not convinced that the (unsafe) optimisations for identical pointers is worth the complexity, and it hurts performance for inserts of new elements and the replacement of elements compared to this version.

So inserting when the key is missing (common) or a new item at the same key (also common), are faster, while only literally inserting the same identical item (i.e., a no-op, probably less common) is slower.

This sounds reasonable. The current code seems optimized for a very rare case. We have a comment acknowledging the same:

containers/containers/src/Data/Map/Internal.hs

Lines 785 to 788 in 41783ed

    
           -- Unlike insertR, we only get sharing here 
        
           -- when the inserted value is at the same address 
        
           -- as the present value. We try anyway; this condition 
        
           -- seems particularly likely to occur in 'union'.

But why is it likely in union? It would happen when unioning two maps with a large overlap in shared entries. That again seems like a rare situation.

There are a few variants of insert we can try:

A. Pointer equality (current)
B. Return a bool which controls rebalancing (this PR)
C. No tricks
D. ~~Both A and B~~ (let's not go there)

Curiously, strict map uses C instead of A:

containers/containers/src/Data/Map/Strict/Internal.hs

Lines 503 to 506 in 41783ed

    
           case compare kx ky of 
        
               LT -> balanceL ky y (go kx x l) r 
        
               GT -> balanceR ky y l (go kx x r) 
        
               EQ -> Bin sz kx x l r

And there are 3 situations that can occur:

It's a new key. C is the best here, A and B waste work.
It's an existing key but the value is new. B is the best here. How likely is this? Perhaps not too unlikely, shared key objects might be around.
It's an existing key and existing value. C is the best here, and B should also help. This seems rather unlikely.

So, there's a bunch of benchmarking involved, if you want to figure this out.
For union benchmarks you can use the set-operations-map, it covers various sets of inputs.
Related to the history of the current code: #315, #416. Though I don't see any new information that is not already documented.
Also, let's set aside insertR for now. That one doesn't seem too problematic.

HuwCampbell added 2 commits January 11, 2024 18:45

Simplify how calculation of rebalance is required.

49a0871

The current logic uses unsafe pointer equality to determine if a rebalance is required, but this case is niche, and we can instead pass this information upwards. This performs well in benchmarks.

New benchmark

e4b1ae7

HuwCampbell changed the title ~~Perf/lazy insert ptr removal~~ Alternative optimisations for insertion, Jan 11, 2024

HuwCampbell changed the title ~~Alternative optimisations for insertion,~~ Alternative optimisations for insertion Jan 11, 2024

HuwCampbell added 2 commits January 12, 2024 10:00

Use explicit continuation for insertR instead of pointer equality.

d951885

The pointer equality is used to check if we can not do anything when we retraverse back up the chain, but, if we turn it around and use an explicit continuation instead, we can just choose to not run it.

Use same trick for insertWith

2d6ca6b

HuwCampbell commented Jan 12, 2024

View reviewed changes

meooow25 mentioned this pull request Oct 12, 2024

Make Set/Map balance calls faster #1053

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternative optimisations for insertion #980

Alternative optimisations for insertion #980

HuwCampbell commented Jan 11, 2024 •

edited

Loading

treeowl commented Jan 12, 2024

HuwCampbell commented Jan 12, 2024 •

edited

Loading

HuwCampbell Jan 12, 2024

treeowl commented Jan 12, 2024 via email

HuwCampbell commented Jan 13, 2024

meooow25 commented Oct 6, 2024 •

edited

Loading

Alternative optimisations for insertion #980

Are you sure you want to change the base?

Alternative optimisations for insertion #980

Conversation

HuwCampbell commented Jan 11, 2024 • edited Loading

treeowl commented Jan 12, 2024

HuwCampbell commented Jan 12, 2024 • edited Loading

HuwCampbell Jan 12, 2024

Choose a reason for hiding this comment

treeowl commented Jan 12, 2024 via email

HuwCampbell commented Jan 13, 2024

meooow25 commented Oct 6, 2024 • edited Loading

HuwCampbell commented Jan 11, 2024 •

edited

Loading

HuwCampbell commented Jan 12, 2024 •

edited

Loading

meooow25 commented Oct 6, 2024 •

edited

Loading