
Speed up fromList for IntMap #653

Draft · wants to merge 3 commits into base: master
Conversation

treeowl
Contributor

@treeowl treeowl commented Jul 3, 2019

Make `fromList` and `fromListWithKey` for `IntMap` smarter. Rather than rebuilding the path from the root for each element, insert as many elements as possible into each subtree before backing out.

treeowl commented Jul 3, 2019

@jwaldmann, your review/benchmarking would be greatly appreciated.

Make `fromList` for `IntSet` better for partially sorted input. Performance seems to be similar to the old implementation for random input, and nearly as fast as `fromDistinctAscList` for sorted or reverse-sorted input. There are pathological cases where the new implementation is significantly, though not horribly, slower than the old. In particular, I noticed that

```haskell
iterate (\n -> (n ^ 2) `rem` (2^32-1)) 20
```

is pretty bad for the new implementation for some reason.
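(For reference, the first few keys that generator produces can be checked directly; a quick sketch, assuming default `Integer` arithmetic:)

```haskell
-- The pathological key sequence from the expression above; repeated
-- squaring mod 2^32-1 makes the high bits of successive keys jump
-- around erratically, which defeats subtree-local insertion.
pathological :: [Integer]
pathological = iterate (\n -> (n ^ 2) `rem` (2 ^ 32 - 1)) 20

-- take 4 pathological == [20, 400, 160000, 4125163525]
```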

treeowl commented Jul 3, 2019

Interestingly, it seems that the new fromList implementation is faster than fromAscList. Surprisingly (to me), it's sometimes much faster. So if we replace fromList, we should probably define fromAscList = fromList unless and until someone finds a faster implementation of fromAscList.


treeowl commented Jul 4, 2019

Hmm... I have the feeling that part of the trouble with fromAscList is that fromDistinctAscList is (probably) less efficient than it could be. It uses an explicit stack in a rather confusing fashion. If we could make it use the GHC reduction stack instead, that might cut allocation and speed things up while also making the code easier to read. But it's so convoluted that I can't really work out the control flow....


jwaldmann commented Jul 5, 2019

Hi. I was reading your code and did some experiments.

Code: the description should make it clearer that insertSome/insertMany insert elements as long as they fit, and stop at the first element that does not fit (returning everything from that point on). I had to scan the source to confirm this. The wording of your comment would also allow the interpretation that all of the list is scanned for elements that fit. Of course we should not do that. Or should we? My first reaction is "that could become quadratic", but I am not sure.

Experiment: since the algorithm does not have look-ahead, it can be fooled. In the following, I am using

```
easy 3 = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
hard 3 = [0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15]
```

This is a ghci session on Data.IntMap.Internal (using your version of fromList). Measurements are totally unscientific ...

```
easy e = [0 :: Int ..2^(e+1)-1]
m [] ys = ys ; m (x:xs) ys = x : m ys xs
hard e = m [0 :: Int ..2^e-1] [2^e .. 2^(e+1)-1]

:set +s

size $ fromList $ zip (easy 18) $ repeat ()   -- 7 seconds
size $ fromList $ zip (hard 18) $ repeat ()   -- 27 seconds

-- compare to naive fromList implementation:
fromList0 = Foldable.foldl' (\m (k,v) -> insert k v m) empty

size $ fromList0 $ zip (easy 18) $ repeat ()  -- 16 seconds
size $ fromList0 $ zip (hard 18) $ repeat ()  -- 17 seconds

-- compare to from*AscList
size $ fromAscList $ zip (easy 18) $ repeat ()          -- 6 sec
size $ fromDistinctAscList $ zip (easy 18) $ repeat ()  -- 5 sec
```
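(As a sanity check, the generic `hard` above reproduces the hand-written interleaving; a standalone sketch with `m` renamed to `riffle`:)

```haskell
-- riffle two lists together, alternating elements from each
riffle :: [a] -> [a] -> [a]
riffle []       ys = ys
riffle (x : xs) ys = x : riffle ys xs

-- interleave the lower and upper halves of [0 .. 2^(e+1)-1]
hard :: Int -> [Int]
hard e = riffle [0 .. 2 ^ e - 1] [2 ^ e .. 2 ^ (e + 1) - 1]

-- hard 3 == [0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15]
```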

Do you want to export `insertAll`? Of course `insertAll t kvs = union t (fromList kvs)`, but it may be faster.

```
evens e = [0 :: Int, 2 .. 2^e-1]
odds  e = [1 :: Int, 3 .. 2^e-1]

t = fromList $ zip (evens 19) $ repeat ()
size t -- just to evaluate it

size $ insertAll t $ zip (odds 19) $ repeat ()         -- 5 seconds
size $ union t $ fromList (zip (odds 19) $ repeat ())  -- 6 seconds
```
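(The identity behind `insertAll` can be checked against the public API today; the definition below is just the specification, not the PR's fused implementation:)

```haskell
import qualified Data.IntMap.Strict as IM

-- Specification of the proposed insertAll in terms of exported
-- functions; a dedicated implementation could skip building the
-- intermediate map entirely.
insertAll :: IM.IntMap a -> [(IM.Key, a)] -> IM.IntMap a
insertAll t kvs = IM.union t (IM.fromList kvs)
```

Note that `union` is left-biased, so on duplicate keys the existing map `t` wins; a fused `insertAll` would have to commit to the same convention.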


treeowl commented Jul 6, 2019

Thanks, @jwaldmann! If your results are for a -O2-compiled Data.IntMap.Internal, then the bad-case slowdown from 17 seconds to 27 seconds seems a bit unfortunate, though probably tolerable. Do you or @int-e have any ideas for cutting that down? I have to wonder if unproductive insertMany calls are to blame. In that case, we pattern match on the list, then throw away the result of doing so only to match on it again immediately. Perhaps there's some way to avoid that (maybe with unboxed sums)? It's also a bit sketchy to call these insertion functions on trees we just created.


treeowl commented Jul 6, 2019

Oh, and yes, I would prefer to export insertAll, but that will require some mailing list discussion to settle on a name.

@jwaldmann

-O2 compiled

Those numbers were for ghci. Now I compiled the benchmarks (https://github.com/jwaldmann/containers/tree/intmap-fromList) and get the numbers below. The difference (between contiguous and interleaved) is smaller than with ghci ("interleaved" is a more descriptive name for [0, 2^e, 1, 2^e+1, ..]). For (pseudo)random data, times are quite high, but they don't differ from the previous implementation.

```
stack bench --resolver=lts-13.27 containers-tests:intmap-benchmarks --ba "-m pattern fromList"
containers-tests> benchmarks
Running 1 benchmarks...
Benchmark intmap-benchmarks: RUNNING...
benchmarked pointwise/fromList/contiguous
time                 6.568 ms   (6.474 ms .. 6.694 ms)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 6.388 ms   (6.280 ms .. 6.469 ms)
std dev              300.2 μs   (217.6 μs .. 439.2 μs)
variance introduced by outliers: 21% (moderately inflated)

benchmarked pointwise/fromList/interleaved
time                 21.05 ms   (20.77 ms .. 21.32 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 20.28 ms   (19.48 ms .. 20.56 ms)
std dev              986.3 μs   (459.2 μs .. 1.921 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarked pointwise/fromList/sparse
time                 7.084 ms   (6.953 ms .. 7.241 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 6.603 ms   (6.466 ms .. 6.705 ms)
std dev              352.3 μs   (257.0 μs .. 522.1 μs)
variance introduced by outliers: 25% (moderately inflated)

benchmarked pointwise/fromList/random
time                 34.74 ms   (34.21 ms .. 35.22 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 34.33 ms   (33.80 ms .. 34.79 ms)
std dev              1.024 ms   (663.2 μs .. 1.433 ms)

benchmarked pointwise/fromList_via_foldl_insert/contiguous
time                 8.886 ms   (8.758 ms .. 9.022 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 8.643 ms   (8.529 ms .. 8.733 ms)
std dev              282.4 μs   (207.7 μs .. 389.0 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarked pointwise/fromList_via_foldl_insert/interleaved
time                 18.84 ms   (18.36 ms .. 19.39 ms)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 18.46 ms   (18.15 ms .. 18.72 ms)
std dev              670.2 μs   (488.1 μs .. 972.3 μs)

benchmarked pointwise/fromList_via_foldl_insert/sparse
time                 9.198 ms   (9.075 ms .. 9.353 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 9.097 ms   (8.987 ms .. 9.184 ms)
std dev              277.5 μs   (200.0 μs .. 376.3 μs)
variance introduced by outliers: 10% (moderately inflated)

benchmarked pointwise/fromList_via_foldl_insert/random
time                 33.30 ms   (32.80 ms .. 33.76 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 33.55 ms   (33.27 ms .. 33.83 ms)
std dev              605.4 μs   (466.4 μs .. 791.3 μs)

Benchmark intmap-benchmarks: FINISH
```



treeowl commented Jul 6, 2019

Could you do me a favor and also run some tests with longer lists? I saw much bigger differences for contiguous data in my own testing, where I was using millions of elements.

@jwaldmann

One million:

```
benchmarked 2^20/pointwise/contiguous/fromList
time                 34.67 ms   (33.36 ms .. 35.76 ms)
                     0.996 R²   (0.992 R² .. 0.999 R²)
mean                 33.09 ms   (31.78 ms .. 34.01 ms)
std dev              2.167 ms   (1.404 ms .. 2.926 ms)
variance introduced by outliers: 20% (moderately inflated)

benchmarking 2^20/pointwise/contiguous/fromList_via_foldl_insert ... took 9.431 s, total 56 iterations
benchmarked 2^20/pointwise/contiguous/fromList_via_foldl_insert
time                 148.2 ms   (140.6 ms .. 154.7 ms)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 162.6 ms   (157.9 ms .. 167.7 ms)
std dev              8.422 ms   (5.956 ms .. 13.63 ms)

benchmarking 2^20/pointwise/interleaved/fromList ... took 12.49 s, total 56 iterations
benchmarked 2^20/pointwise/interleaved/fromList
time                 211.5 ms   (209.6 ms .. 213.4 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 214.7 ms   (213.7 ms .. 215.7 ms)
std dev              1.685 ms   (1.233 ms .. 2.245 ms)

benchmarking 2^20/pointwise/interleaved/fromList_via_foldl_insert ... took 9.429 s, total 56 iterations
benchmarked 2^20/pointwise/interleaved/fromList_via_foldl_insert
time                 148.0 ms   (136.0 ms .. 155.3 ms)
                     0.994 R²   (0.985 R² .. 0.999 R²)
mean                 165.5 ms   (158.6 ms .. 173.5 ms)
std dev              12.84 ms   (9.967 ms .. 18.18 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarked 2^20/pointwise/sparse/fromList
time                 30.37 ms   (28.97 ms .. 31.55 ms)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 29.69 ms   (28.51 ms .. 30.51 ms)
std dev              2.073 ms   (1.329 ms .. 2.924 ms)
variance introduced by outliers: 25% (moderately inflated)

benchmarking 2^20/pointwise/sparse/fromList_via_foldl_insert ... took 10.16 s, total 56 iterations
benchmarked 2^20/pointwise/sparse/fromList_via_foldl_insert
time                 171.2 ms   (169.8 ms .. 172.9 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 174.0 ms   (173.1 ms .. 175.4 ms)
std dev              1.874 ms   (1.107 ms .. 2.941 ms)

benchmarking 2^20/pointwise/random/fromList ... took 52.66 s, total 56 iterations
benchmarked 2^20/pointwise/random/fromList
time                 972.0 ms   (961.2 ms .. 986.7 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 918.6 ms   (886.5 ms .. 941.2 ms)
std dev              45.42 ms   (24.70 ms .. 63.02 ms)

benchmarking 2^20/pointwise/random/fromList_via_foldl_insert ... took 51.92 s, total 56 iterations
benchmarked 2^20/pointwise/random/fromList_via_foldl_insert
time                 980.2 ms   (948.5 ms .. 1.005 s)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 897.7 ms   (859.0 ms .. 925.6 ms)
std dev              53.87 ms   (37.18 ms .. 68.48 ms)
variance introduced by outliers: 18% (moderately inflated)
```




jwaldmann commented Jul 6, 2019

30 million:

```
benchmarking 2^25/pointwise/contiguous/fromList ... took 245.4 s, total 56 iterations
benchmarked 2^25/pointwise/contiguous/fromList
time                 4.108 s    (3.742 s .. 4.314 s)
                     0.992 R²   (0.978 R² .. 0.999 R²)
mean                 3.903 s    (3.583 s .. 4.124 s)
std dev              462.4 ms   (245.2 ms .. 738.7 ms)
variance introduced by outliers: 38% (moderately inflated)

benchmarking 2^25/pointwise/contiguous/fromList_via_foldl_insert ... took 477.9 s, total 56 iterations
benchmarked 2^25/pointwise/contiguous/fromList_via_foldl_insert
time                 8.201 s    (8.084 s .. 8.314 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.336 s    (8.229 s .. 8.582 s)
std dev              250.5 ms   (104.9 ms .. 427.0 ms)

benchmarking 2^25/pointwise/interleaved/fromList ... took 540.3 s, total 56 iterations
benchmarked 2^25/pointwise/interleaved/fromList
time                 9.415 s    (9.263 s .. 9.603 s)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 9.467 s    (9.364 s .. 9.731 s)
std dev              256.9 ms   (96.10 ms .. 435.1 ms)

benchmarking 2^25/pointwise/interleaved/fromList_via_foldl_insert ... took 501.8 s, total 56 iterations
benchmarked 2^25/pointwise/interleaved/fromList_via_foldl_insert
time                 8.928 s    (8.745 s .. 9.141 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.585 s    (8.200 s .. 8.792 s)
std dev              436.2 ms   (234.8 ms .. 657.1 ms)

benchmarking 2^25/pointwise/sparse/fromList ... took 275.4 s, total 56 iterations
benchmarked 2^25/pointwise/sparse/fromList
time                 4.879 s    (4.526 s .. 5.229 s)
                     0.995 R²   (0.989 R² .. 0.999 R²)
mean                 4.365 s    (3.827 s .. 4.666 s)
std dev              651.5 ms   (326.7 ms .. 969.3 ms)
variance introduced by outliers: 48% (moderately inflated)

benchmarking 2^25/pointwise/sparse/fromList_via_foldl_insert ... took 486.0 s, total 56 iterations
benchmarked 2^25/pointwise/sparse/fromList_via_foldl_insert
time                 8.918 s    (8.732 s .. 9.150 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.227 s    (7.912 s .. 8.492 s)
std dev              488.9 ms   (332.2 ms .. 682.6 ms)
variance introduced by outliers: 18% (moderately inflated)

benchmarking 2^25/pointwise/random/fromList ... took 4490 s, total 56 iterations
benchmarked 2^25/pointwise/random/fromList
time                 81.35 s    (80.35 s .. 82.21 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 80.59 s    (79.77 s .. 81.26 s)
std dev              1.309 s    (927.2 ms .. 1.838 s)

benchmarking 2^25/pointwise/random/fromList_via_foldl_insert ... took 4491 s, total 56 iterations
benchmarked 2^25/pointwise/random/fromList_via_foldl_insert
time                 81.46 s    (80.88 s .. 82.66 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 80.64 s    (79.80 s .. 81.12 s)
std dev              1.050 s    (533.6 ms .. 1.686 s)
```


treeowl commented Jul 6, 2019

It really looks like the effects we're seeing are real. The new fromList is much better when the keys are contiguous, strictly increasing, or presumably clustered in general. It's about the same for random keys, and it's significantly but not horribly worse in the interleaved case. I'm convinced it's good enough to use. But I'd be a bit happier if we could reduce the regression in the interleaved case. Any ideas for doing so? fromDistinctAscList passes a bit of extra information around to avoid extra pattern matching on the subtree; can we do something similar? Might it be worth passing something extra back in Inserted?


3noch commented Jul 6, 2019

Perhaps it makes sense to expose the old version as fromListInterleaved or similar.


treeowl commented Jul 6, 2019

@3noch I'm not generally a big fan of cluttering the API with one- or two-line functions, especially when anyone likely to need them will almost certainly know how to write them. `fromListNaive` is nothing more than `foldl' (flip (uncurry insert)) empty`, and `fromListNaiveWithKey` isn't much more.
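(For concreteness, those naive definitions would look roughly like this; a sketch over the public API, with hypothetical names:)

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- repeated insertion; later list entries overwrite earlier ones
fromListNaive :: [(IM.Key, a)] -> IM.IntMap a
fromListNaive = foldl' (flip (uncurry IM.insert)) IM.empty

-- same, but combining duplicate keys with the supplied function
-- (new value is passed to f before the old one)
fromListNaiveWithKey :: (IM.Key -> a -> a -> a) -> [(IM.Key, a)] -> IM.IntMap a
fromListNaiveWithKey f = foldl' (\m (k, v) -> IM.insertWithKey f k v m) IM.empty
```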


3noch commented Jul 7, 2019

Perhaps it makes sense, then, to simply document where fromList performs less well, and to show the better-performing version in the docs for that use case.


treeowl commented Jul 7, 2019

@3noch, yes, definitely. I'm just still hoping there's a way to cut down on the regression....


int-e commented Jul 7, 2019

Somewhat surprisingly it looks like we can indeed speed things up slightly by avoiding the repeated destruction of the same list, based around the following return type:

```haskell
data Inserted' a
   = InsertedNil  !(IntMap a)
   | InsertedCons !(IntMap a) !Key a ![(Key,a)]
```

See https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-hs-L102-L163 for code and https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-out for some benchmarks. (Functions: fromList is the current implementation in Data.IntMap; fromList1 is the version from this pull request; fromList2 is using the modified Inserted' type, and fromList3 did some manual inlining that didn't pay off at all.)
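(The shape of the trick can be seen in isolation: return either "input exhausted" or "first rejected element plus the rest", so the caller continues without destructuring the list a second time. A toy analogue, not the PR's actual `insertMany'`:)

```haskell
{-# LANGUAGE BangPatterns #-}

-- Toy analogue of Inserted': either the list ran out, or we hand back
-- the first element that did not fit together with the unconsumed tail.
data Consumed = ConsumedNil  !Int
              | ConsumedCons !Int !Int ![Int]
  deriving (Eq, Show)

-- Consume a maximal run of even numbers, summing them; stop at the
-- first odd number and return it so the caller need not re-match.
sumEvens :: Int -> [Int] -> Consumed
sumEvens !acc []       = ConsumedNil acc
sumEvens !acc (x : xs)
  | even x    = sumEvens (acc + x) xs
  | otherwise = ConsumedCons acc x xs
```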

But for the most part I believe the slowdown for that adversarial alternating-list case is the price one has to pay for checking, on each level of the tree, whether the new key fits or not, whereas the naive version simply starts at the root again each time.


treeowl commented Jul 7, 2019

That is somewhat surprising. I wonder what the Core looks like. I'd have thought the best bet for that sort of thing would've been an unboxed sum as a result, but maybe GHC is doing something clever.... Did you experiment with passing along a key in the map and a mask the way you do for fromAscList?


treeowl commented Jul 8, 2019

But for the most part I believe the slowdown for that adverserial alternating list case is the price one has to pay for checking, on each level of the tree, whether the new key fits or not, where the naive version simply starts at the root again each time.

Yes, there is certainly some inherent arithmetic cost to that. All we can hope to do is minimize additional costs from allocation and pattern matching.


int-e commented Jul 8, 2019

Did you experiment with passing along a key in the map and a mask the way you do for fromAscList?

No I didn't try that... it's not directly applicable. The point with fromAscList is that we know that keys will never be inserted into subtrees we've already built, so we can summarize the current state into a current prefix (expressed by the combination of a key and a mask). Here, we're always potentially inserting into an existing tree, so I believe it's unavoidable to inspect the actual tree we built and there we will find the prefix (expressed as prefix and mask) anyway.

P.S. the term "prefix" is hopelessly overloaded in this context... a) on a high level, a prefix is just a finite bit string that may be shorter than a word. b) in IntMap, it's also used for a key with the lower bits masked. c) in IntSet there's a split into a "prefix" and a "bitmask"...
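(To make sense (b) concrete: a key's prefix at a given mask bit can be computed as below. This is a simplified sketch with a hypothetical helper name; the actual helpers in Data.IntMap.Internal are written differently.)

```haskell
import Data.Bits ((.&.))

-- Keep only the bits strictly above the mask bit m (a power of two);
-- all keys stored under the same Bin node share this value.
prefixOf :: Int -> Int -> Int
prefixOf m k = k .&. negate (2 * m)
```

For example, at mask bit 4 the keys 8..15 all share prefix 8, while 0..7 share prefix 0.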


treeowl commented Jul 8, 2019

I'm not so sure. In the bad case, we call insertMany on a tree that doesn't fit the next key. So if we keep enough information around to determine that without inspecting the tree, then maybe we can win something.


int-e commented Jul 8, 2019

That is somewhat surprising. I wonder what the Core looks like. I'd have thought the best bet for that sort of thing would've been an unboxed sum as a result, but maybe GHC is doing something clever....

IIRC GHC's "enter" operation on the STG level actually takes several continuations for multi-constructor datatypes. So the Inserted' values should never be allocated. But I'm not sure, and I was not at all convinced that this idea would pay off when I started to implement it.


treeowl commented Jul 8, 2019 via email


int-e commented Jul 8, 2019

insertMany' Nil k x kxs' = InsertedNil Nil doesn't look so great, because it makes insertMany' lazy in the key and list. Since we don't reach that anyway, it might as well force those things.

I changed that, but it had no effect on the running time.


treeowl commented Jul 8, 2019 via email


int-e commented Jul 8, 2019

Somewhat surprisingly it looks like we can indeed speed things up slightly by avoiding the repeated destruction of the same list, based around the following return type:

```haskell
data Inserted' a
   = InsertedNil  !(IntMap a)
   | InsertedCons !(IntMap a) !Key a ![(Key,a)]
```

Btw, we should also consider that the resulting speedup is minuscule and the code becomes more difficult to read as a result.


int-e commented Jul 8, 2019

Probably got inlined away.... Did you check the CSE between link and branchMask calls? If I'm not mistaken, link makes essentially the same branchMask call you do, but with the arguments flipped.

No I had not checked that... and no, it did not do the CSE, regardless of the order of arguments to branchMask. I've now added a variant of link that takes a precomputed mask (to avoid inlining link manually), and the result does seem to improve performance a bit for short lists without hurting long ones. But there's hardly any signal here... mostly noise.

```haskell
linkWithMask :: Mask -> Prefix -> IntMap a -> Prefix -> IntMap a -> IntMap a
linkWithMask m p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    p = mask p1 m
{-# INLINE linkWithMask #-}
```

```
[1,1]/fromAscList1a                      mean 19.91 ns  ( +- 14.99 ps  )
[1,1]/fromAscList1a'                     mean 19.38 ns  ( +- 199.5 ps  )
[10,1]/fromAscList1a                     mean 145.6 ns  ( +- 1.948 ns  )
[10,1]/fromAscList1a'                    mean 140.1 ns  ( +- 3.528 ns  )
[-10,51791]/fromAscList1a                mean 160.2 ns  ( +- 2.175 ns  )
[-10,51791]/fromAscList1a'               mean 149.8 ns  ( +- 3.113 ns  )
[100,1]/fromAscList1a                    mean 1.549 μs  ( +- 28.80 ns  )
[100,1]/fromAscList1a'                   mean 1.444 μs  ( +- 23.56 ns  )
[-100,51791]/fromAscList1a               mean 1.614 μs  ( +- 48.31 ns  )
[-100,51791]/fromAscList1a'              mean 1.454 μs  ( +- 46.67 ns  )
[1000,1]/fromAscList1a                   mean 16.24 μs  ( +- 183.1 ns  )
[1000,1]/fromAscList1a'                  mean 15.57 μs  ( +- 167.7 ns  )
[-1000,51791]/fromAscList1a              mean 17.01 μs  ( +- 492.8 ns  )
[-1000,51791]/fromAscList1a'             mean 16.07 μs  ( +- 466.1 ns  )
[10000,1]/fromAscList1a                  mean 232.7 μs  ( +- 2.194 μs  )
[10000,1]/fromAscList1a'                 mean 232.3 μs  ( +- 2.882 μs  )
[-10000,51791]/fromAscList1a             mean 250.8 μs  ( +- 4.353 μs  )
[-10000,51791]/fromAscList1a'            mean 246.1 μs  ( +- 5.896 μs  )
[100000,1]/fromAscList1a                 mean 9.101 ms  ( +- 227.0 μs  )
[100000,1]/fromAscList1a'                mean 9.097 ms  ( +- 240.5 μs  )
[-100000,51791]/fromAscList1a            mean 9.188 ms  ( +- 339.0 μs  )
[-100000,51791]/fromAscList1a'           mean 9.189 ms  ( +- 412.6 μs  )
[1000000,1]/fromAscList1a                mean 94.98 ms  ( +- 12.07 ms  )
[1000000,1]/fromAscList1a'               mean 96.65 ms  ( +- 12.22 ms  )
[-1000000,51791]/fromAscList1a           mean 97.83 ms  ( +- 10.69 ms  )
[-1000000,51791]/fromAscList1a'          mean 97.62 ms  ( +- 10.81 ms  )
```


int-e commented Jul 8, 2019

Oops, I believe we got #653 and #658 mixed up in the last 3 comments!


sjakobi commented Jul 17, 2020

I'm currently doing a bit of triage on the open PRs.

Could someone please summarize the current state of this PR, and possibly also sketch out what needs to be done until it can be merged?


int-e commented Jul 18, 2020

A lot going on here. I'll attempt a summary.

  • the goal is to speed up fromList (borrowing ideas from fromAscList, which builds the whole map in a bottom-up fashion without deconstructing partial maps) and in particular try to make it no worse than repeated insertions (IntSet.fromList slower than repeated IntSet.insert #288)
  • we lack adequate benchmarks that test fromList under a variety of inputs with different characteristics
  • I have some improvements to the PR code in a gist that could be incorporated, and made some effort to benchmark against various types of lists
  • I believe this is stalled mainly because the code is still worse than before in some cases, and also because this development took place in parallel to improving fromAscList (Improve fromAscList and friends. #658) which was far more successful


sjakobi commented Jul 18, 2020

Thanks, @int-e! :)

Could the benchmarks be merged separately, so they are not lost or bit-rotted, and so they could potentially be used for other attempts at speeding up fromList?

Also, how about incorporating your code improvements into this PR? I guess the easiest way to do that would be if you'd make a PR against @treeowl's branch.

Of course it would also be nice to make progress on the original problem, but since that is currently stalled, let's try to ensure that the value created so far is saved.


int-e commented Aug 6, 2020

I'm not going to work on this. The benchmarks are in a very ad-hoc state. Ideally, the benchmarks would use realistic inputs for fromList and I don't really know what those are.

As for my improvements, after looking at the benchmark data again, there's either no speedup at all or it's so small that it looks like noise. While my changes were relatively minor tweaks and refactorings, they make the code more complicated too, so they're almost certainly not worthwhile. The real question is... is there a better approach? But that requires some serious meditation (and time).


sjakobi commented Aug 14, 2020

Thanks, @int-e! :)

Let's shelve this until we have better benchmarks then. (See #657)

@sjakobi sjakobi marked this pull request as draft August 14, 2020 12:57
Successfully merging this pull request may close these issues.

IntSet.fromList slower than repeated IntSet.insert