
Speed up fromList for IntMap #653

Draft · wants to merge 3 commits into base: master
Conversation

treeowl
Contributor

@treeowl treeowl commented Jul 3, 2019

Make `fromList` and `fromListWithKey` for `IntMap` smarter. Rather than rebuilding the path from the root for each element, insert as many elements as possible into each subtree before backing out.

treeowl commented Jul 3, 2019

@jwaldmann, your review/benchmarking would be greatly appreciated.

Make `fromList` for `IntSet` better for partially sorted input. Performance seems to be similar to the old implementation for random input, and nearly as fast as `fromDistinctAscList` for sorted or reverse-sorted input. There are pathological cases where the new implementation is significantly, though not horribly, slower than the old. In particular, I noticed that

```haskell
iterate (\n -> (n ^ 2) `rem` (2^32-1)) 20
```

is pretty bad for the new implementation for some reason.
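(For reference, the first few keys that generator produces can be checked directly; a quick sketch, assuming default `Integer` arithmetic:)

```haskell
-- The pathological key sequence from the expression above; repeated
-- squaring mod 2^32-1 makes the high bits of successive keys jump
-- around erratically, which defeats subtree-local insertion.
pathological :: [Integer]
pathological = iterate (\n -> (n ^ 2) `rem` (2 ^ 32 - 1)) 20

-- take 4 pathological == [20, 400, 160000, 4125163525]
```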

treeowl commented Jul 3, 2019

Interestingly, it seems that the new fromList implementation is faster than fromAscList. Surprisingly (to me), it's sometimes much faster. So if we replace fromList, we should probably define fromAscList = fromList unless and until someone finds a faster implementation of fromAscList.


treeowl commented Jul 4, 2019

Hmm... I have the feeling that part of the trouble with fromAscList is that fromDistinctAscList is (probably) less efficient than it could be. It uses an explicit stack in a rather confusing fashion. If we could make it use the GHC reduction stack instead, that might cut allocation and speed things up while also making the code easier to read. But it's so convoluted that I can't really work out the control flow....


jwaldmann commented Jul 5, 2019

Hi. I was reading your code and did some experiments.

Code: the description should make it clearer that insertSome/insertMany insert elements as long as they fit, and stop at the first element that does not fit (returning everything from that point on). I had to scan the source to confirm this. The wording of your comment would also allow the interpretation that all of the list is scanned for elements that fit. Of course we should not do that. Or should we? My first reaction is "that could become quadratic", but I am not sure.

Experiment: since the algorithm does not have look-ahead, it can be fooled. In the following, I am using

```
easy 3 = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
hard 3 = [0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15]
```

This is a ghci session on Data.IntMap.Internal (using your version of fromList). Measurements are totally unscientific ...

```
easy e = [0 :: Int ..2^(e+1)-1]
m [] ys = ys ; m (x:xs) ys = x : m ys xs
hard e = m [0 :: Int ..2^e-1] [2^e .. 2^(e+1)-1]

:set +s

size $ fromList $ zip (easy 18) $ repeat ()   -- 7 seconds
size $ fromList $ zip (hard 18) $ repeat ()   -- 27 seconds

-- compare to naive fromList implementation:
fromList0 = Foldable.foldl' (\m (k,v) -> insert k v m) empty

size $ fromList0 $ zip (easy 18) $ repeat ()  -- 16 seconds
size $ fromList0 $ zip (hard 18) $ repeat ()  -- 17 seconds

-- compare to from*AscList
size $ fromAscList $ zip (easy 18) $ repeat ()          -- 6 sec
size $ fromDistinctAscList $ zip (easy 18) $ repeat ()  -- 5 sec
```
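(As a sanity check, the generic `hard` above reproduces the hand-written interleaving; a standalone sketch with `m` renamed to `riffle`:)

```haskell
-- riffle two lists together, alternating elements from each
riffle :: [a] -> [a] -> [a]
riffle []       ys = ys
riffle (x : xs) ys = x : riffle ys xs

-- interleave the lower and upper halves of [0 .. 2^(e+1)-1]
hard :: Int -> [Int]
hard e = riffle [0 .. 2 ^ e - 1] [2 ^ e .. 2 ^ (e + 1) - 1]

-- hard 3 == [0,8,1,9,2,10,3,11,4,12,5,13,6,14,7,15]
```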

Do you want to export `insertAll`? Of course `insertAll t kvs = union t (fromList kvs)`, but it may be faster.

```
evens e = [0 :: Int, 2 .. 2^e-1]
odds  e = [1 :: Int, 3 .. 2^e-1]

t = fromList $ zip (evens 19) $ repeat ()
size t -- just to evaluate it

size $ insertAll t $ zip (odds 19) $ repeat ()         -- 5 seconds
size $ union t $ fromList (zip (odds 19) $ repeat ())  -- 6 seconds
```
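(The identity behind `insertAll` can be checked against the public API today; the definition below is just the specification, not the PR's fused implementation:)

```haskell
import qualified Data.IntMap.Strict as IM

-- Specification of the proposed insertAll in terms of exported
-- functions; a dedicated implementation could skip building the
-- intermediate map entirely.
insertAll :: IM.IntMap a -> [(IM.Key, a)] -> IM.IntMap a
insertAll t kvs = IM.union t (IM.fromList kvs)
```

Note that `union` is left-biased, so on duplicate keys the existing map `t` wins; a fused `insertAll` would have to commit to the same convention.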


treeowl commented Jul 6, 2019

Thanks, @jwaldmann! If your results are for a -O2-compiled Data.IntMap.Internal, then the bad-case slowdown from 17 seconds to 27 seconds seems a bit unfortunate, though probably tolerable. Do you or @int-e have any ideas for cutting that down? I have to wonder if unproductive insertMany calls are to blame. In that case, we pattern match on the list, then throw away the result of doing so only to match on it again immediately. Perhaps there's some way to avoid that (maybe with unboxed sums)? It's also a bit sketchy to call these insertion functions on trees we just created.


treeowl commented Jul 6, 2019

Oh, and yes, I would prefer to export insertAll, but that will require some mailing list discussion to settle on a name.

@jwaldmann

-O2 compiled

Those numbers were for ghci. Now I compiled the benchmarks (https://github.com/jwaldmann/containers/tree/intmap-fromList) and get the numbers below. The difference (between contiguous and interleaved) is smaller than with ghci ("interleaved" is a more descriptive name for [0, 2^e, 1, 2^e+1, ..]). For (pseudo)random data, times are quite high, but they don't differ from the previous implementation.

```
stack bench --resolver=lts-13.27 containers-tests:intmap-benchmarks --ba "-m pattern fromList"
containers-tests> benchmarks
Running 1 benchmarks...
Benchmark intmap-benchmarks: RUNNING...
benchmarked pointwise/fromList/contiguous
time                 6.568 ms   (6.474 ms .. 6.694 ms)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 6.388 ms   (6.280 ms .. 6.469 ms)
std dev              300.2 μs   (217.6 μs .. 439.2 μs)
variance introduced by outliers: 21% (moderately inflated)

benchmarked pointwise/fromList/interleaved
time                 21.05 ms   (20.77 ms .. 21.32 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 20.28 ms   (19.48 ms .. 20.56 ms)
std dev              986.3 μs   (459.2 μs .. 1.921 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarked pointwise/fromList/sparse
time                 7.084 ms   (6.953 ms .. 7.241 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 6.603 ms   (6.466 ms .. 6.705 ms)
std dev              352.3 μs   (257.0 μs .. 522.1 μs)
variance introduced by outliers: 25% (moderately inflated)

benchmarked pointwise/fromList/random
time                 34.74 ms   (34.21 ms .. 35.22 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 34.33 ms   (33.80 ms .. 34.79 ms)
std dev              1.024 ms   (663.2 μs .. 1.433 ms)

benchmarked pointwise/fromList_via_foldl_insert/contiguous
time                 8.886 ms   (8.758 ms .. 9.022 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 8.643 ms   (8.529 ms .. 8.733 ms)
std dev              282.4 μs   (207.7 μs .. 389.0 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarked pointwise/fromList_via_foldl_insert/interleaved
time                 18.84 ms   (18.36 ms .. 19.39 ms)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 18.46 ms   (18.15 ms .. 18.72 ms)
std dev              670.2 μs   (488.1 μs .. 972.3 μs)

benchmarked pointwise/fromList_via_foldl_insert/sparse
time                 9.198 ms   (9.075 ms .. 9.353 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 9.097 ms   (8.987 ms .. 9.184 ms)
std dev              277.5 μs   (200.0 μs .. 376.3 μs)
variance introduced by outliers: 10% (moderately inflated)

benchmarked pointwise/fromList_via_foldl_insert/random
time                 33.30 ms   (32.80 ms .. 33.76 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 33.55 ms   (33.27 ms .. 33.83 ms)
std dev              605.4 μs   (466.4 μs .. 791.3 μs)

Benchmark intmap-benchmarks: FINISH
```



treeowl commented Jul 6, 2019

Could you do me a favor and also run some tests with longer lists? I saw much bigger differences for contiguous data in my own testing, where I was using millions of elements.

@jwaldmann

One million:

```
benchmarked 2^20/pointwise/contiguous/fromList
time                 34.67 ms   (33.36 ms .. 35.76 ms)
                     0.996 R²   (0.992 R² .. 0.999 R²)
mean                 33.09 ms   (31.78 ms .. 34.01 ms)
std dev              2.167 ms   (1.404 ms .. 2.926 ms)
variance introduced by outliers: 20% (moderately inflated)

benchmarking 2^20/pointwise/contiguous/fromList_via_foldl_insert ... took 9.431 s, total 56 iterations
benchmarked 2^20/pointwise/contiguous/fromList_via_foldl_insert
time                 148.2 ms   (140.6 ms .. 154.7 ms)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 162.6 ms   (157.9 ms .. 167.7 ms)
std dev              8.422 ms   (5.956 ms .. 13.63 ms)

benchmarking 2^20/pointwise/interleaved/fromList ... took 12.49 s, total 56 iterations
benchmarked 2^20/pointwise/interleaved/fromList
time                 211.5 ms   (209.6 ms .. 213.4 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 214.7 ms   (213.7 ms .. 215.7 ms)
std dev              1.685 ms   (1.233 ms .. 2.245 ms)

benchmarking 2^20/pointwise/interleaved/fromList_via_foldl_insert ... took 9.429 s, total 56 iterations
benchmarked 2^20/pointwise/interleaved/fromList_via_foldl_insert
time                 148.0 ms   (136.0 ms .. 155.3 ms)
                     0.994 R²   (0.985 R² .. 0.999 R²)
mean                 165.5 ms   (158.6 ms .. 173.5 ms)
std dev              12.84 ms   (9.967 ms .. 18.18 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarked 2^20/pointwise/sparse/fromList
time                 30.37 ms   (28.97 ms .. 31.55 ms)
                     0.995 R²   (0.991 R² .. 0.998 R²)
mean                 29.69 ms   (28.51 ms .. 30.51 ms)
std dev              2.073 ms   (1.329 ms .. 2.924 ms)
variance introduced by outliers: 25% (moderately inflated)

benchmarking 2^20/pointwise/sparse/fromList_via_foldl_insert ... took 10.16 s, total 56 iterations
benchmarked 2^20/pointwise/sparse/fromList_via_foldl_insert
time                 171.2 ms   (169.8 ms .. 172.9 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 174.0 ms   (173.1 ms .. 175.4 ms)
std dev              1.874 ms   (1.107 ms .. 2.941 ms)

benchmarking 2^20/pointwise/random/fromList ... took 52.66 s, total 56 iterations
benchmarked 2^20/pointwise/random/fromList
time                 972.0 ms   (961.2 ms .. 986.7 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 918.6 ms   (886.5 ms .. 941.2 ms)
std dev              45.42 ms   (24.70 ms .. 63.02 ms)

benchmarking 2^20/pointwise/random/fromList_via_foldl_insert ... took 51.92 s, total 56 iterations
benchmarked 2^20/pointwise/random/fromList_via_foldl_insert
time                 980.2 ms   (948.5 ms .. 1.005 s)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 897.7 ms   (859.0 ms .. 925.6 ms)
std dev              53.87 ms   (37.18 ms .. 68.48 ms)
variance introduced by outliers: 18% (moderately inflated)
```




jwaldmann commented Jul 6, 2019

30 million:

```
benchmarking 2^25/pointwise/contiguous/fromList ... took 245.4 s, total 56 iterations
benchmarked 2^25/pointwise/contiguous/fromList
time                 4.108 s    (3.742 s .. 4.314 s)
                     0.992 R²   (0.978 R² .. 0.999 R²)
mean                 3.903 s    (3.583 s .. 4.124 s)
std dev              462.4 ms   (245.2 ms .. 738.7 ms)
variance introduced by outliers: 38% (moderately inflated)

benchmarking 2^25/pointwise/contiguous/fromList_via_foldl_insert ... took 477.9 s, total 56 iterations
benchmarked 2^25/pointwise/contiguous/fromList_via_foldl_insert
time                 8.201 s    (8.084 s .. 8.314 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.336 s    (8.229 s .. 8.582 s)
std dev              250.5 ms   (104.9 ms .. 427.0 ms)

benchmarking 2^25/pointwise/interleaved/fromList ... took 540.3 s, total 56 iterations
benchmarked 2^25/pointwise/interleaved/fromList
time                 9.415 s    (9.263 s .. 9.603 s)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 9.467 s    (9.364 s .. 9.731 s)
std dev              256.9 ms   (96.10 ms .. 435.1 ms)

benchmarking 2^25/pointwise/interleaved/fromList_via_foldl_insert ... took 501.8 s, total 56 iterations
benchmarked 2^25/pointwise/interleaved/fromList_via_foldl_insert
time                 8.928 s    (8.745 s .. 9.141 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.585 s    (8.200 s .. 8.792 s)
std dev              436.2 ms   (234.8 ms .. 657.1 ms)

benchmarking 2^25/pointwise/sparse/fromList ... took 275.4 s, total 56 iterations
benchmarked 2^25/pointwise/sparse/fromList
time                 4.879 s    (4.526 s .. 5.229 s)
                     0.995 R²   (0.989 R² .. 0.999 R²)
mean                 4.365 s    (3.827 s .. 4.666 s)
std dev              651.5 ms   (326.7 ms .. 969.3 ms)
variance introduced by outliers: 48% (moderately inflated)

benchmarking 2^25/pointwise/sparse/fromList_via_foldl_insert ... took 486.0 s, total 56 iterations
benchmarked 2^25/pointwise/sparse/fromList_via_foldl_insert
time                 8.918 s    (8.732 s .. 9.150 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 8.227 s    (7.912 s .. 8.492 s)
std dev              488.9 ms   (332.2 ms .. 682.6 ms)
variance introduced by outliers: 18% (moderately inflated)

benchmarking 2^25/pointwise/random/fromList ... took 4490 s, total 56 iterations
benchmarked 2^25/pointwise/random/fromList
time                 81.35 s    (80.35 s .. 82.21 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 80.59 s    (79.77 s .. 81.26 s)
std dev              1.309 s    (927.2 ms .. 1.838 s)

benchmarking 2^25/pointwise/random/fromList_via_foldl_insert ... took 4491 s, total 56 iterations
benchmarked 2^25/pointwise/random/fromList_via_foldl_insert
time                 81.46 s    (80.88 s .. 82.66 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 80.64 s    (79.80 s .. 81.12 s)
std dev              1.050 s    (533.6 ms .. 1.686 s)
```


treeowl commented Jul 6, 2019

It really looks like the effects we're seeing are real. The new fromList is much better when the keys are contiguous, strictly increasing, or presumably clustered in general. It's about the same for random keys, and it's significantly but not horribly worse in the interleaved case. I'm convinced it's good enough to use. But I'd be a bit happier if we could reduce the regression in the interleaved case. Any ideas for doing so? fromDistinctAscList passes a bit of extra information around to avoid extra pattern matching on the subtree; can we do something similar? Might it be worth passing something extra back in Inserted?


3noch commented Jul 6, 2019

Perhaps it makes sense to expose the old version as fromListInterleaved or similar.


treeowl commented Jul 6, 2019

@3noch I'm not generally a big fan of cluttering the API with one- or two-line functions, especially when anyone likely to need them will almost certainly know how to write them. `fromListNaive` is nothing more than `foldl' (flip (uncurry insert)) empty`, and `fromListNaiveWithKey` isn't much more.
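(For concreteness, those naive definitions would look roughly like this; a sketch over the public API, with hypothetical names:)

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')

-- repeated insertion; later list entries overwrite earlier ones
fromListNaive :: [(IM.Key, a)] -> IM.IntMap a
fromListNaive = foldl' (flip (uncurry IM.insert)) IM.empty

-- same, but combining duplicate keys with the supplied function
-- (new value is passed to f before the old one)
fromListNaiveWithKey :: (IM.Key -> a -> a -> a) -> [(IM.Key, a)] -> IM.IntMap a
fromListNaiveWithKey f = foldl' (\m (k, v) -> IM.insertWithKey f k v m) IM.empty
```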


3noch commented Jul 7, 2019

Perhaps it makes sense, then, to simply document where fromList performs less well, and to show the better-performing version in the docs for that use case.


treeowl commented Jul 7, 2019

@3noch, yes, definitely. I'm just still hoping there's a way to cut down on the regression....


int-e commented Jul 7, 2019

Somewhat surprisingly it looks like we can indeed speed things up slightly by avoiding the repeated destruction of the same list, based around the following return type:

```haskell
data Inserted' a
   = InsertedNil  !(IntMap a)
   | InsertedCons !(IntMap a) !Key a ![(Key,a)]
```

See https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-hs-L102-L163 for code and https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-out for some benchmarks. (Functions: fromList is the current implementation in Data.IntMap; fromList1 is the version from this pull request; fromList2 is using the modified Inserted' type, and fromList3 did some manual inlining that didn't pay off at all.)
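(The shape of the trick can be seen in isolation: return either "input exhausted" or "first rejected element plus the rest", so the caller continues without destructuring the list a second time. A toy analogue, not the PR's actual `insertMany'`:)

```haskell
{-# LANGUAGE BangPatterns #-}

-- Toy analogue of Inserted': either the list ran out, or we hand back
-- the first element that did not fit together with the unconsumed tail.
data Consumed = ConsumedNil  !Int
              | ConsumedCons !Int !Int ![Int]
  deriving (Eq, Show)

-- Consume a maximal run of even numbers, summing them; stop at the
-- first odd number and return it so the caller need not re-match.
sumEvens :: Int -> [Int] -> Consumed
sumEvens !acc []       = ConsumedNil acc
sumEvens !acc (x : xs)
  | even x    = sumEvens (acc + x) xs
  | otherwise = ConsumedCons acc x xs
```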

But for the most part I believe the slowdown for that adversarial alternating-list case is the price one has to pay for checking, on each level of the tree, whether the new key fits or not, whereas the naive version simply starts at the root again each time.


treeowl commented Jul 7, 2019

That is somewhat surprising. I wonder what the Core looks like. I'd have thought the best bet for that sort of thing would've been an unboxed sum as a result, but maybe GHC is doing something clever.... Did you experiment with passing along a key in the map and a mask the way you do for fromAscList?


treeowl commented Jul 8, 2019

But for the most part I believe the slowdown for that adverserial alternating list case is the price one has to pay for checking, on each level of the tree, whether the new key fits or not, where the naive version simply starts at the root again each time.

Yes, there is certainly some inherent arithmetic cost to that. All we can hope to do is minimize additional costs from allocation and pattern matching.


int-e commented Jul 8, 2019

Did you experiment with passing along a key in the map and a mask the way you do for fromAscList?

No I didn't try that... it's not directly applicable. The point with fromAscList is that we know that keys will never be inserted into subtrees we've already built, so we can summarize the current state into a current prefix (expressed by the combination of a key and a mask). Here, we're always potentially inserting into an existing tree, so I believe it's unavoidable to inspect the actual tree we built and there we will find the prefix (expressed as prefix and mask) anyway.

P.S. the term "prefix" is hopelessly overloaded in this context... a) on a high level, a prefix is just a finite bit string that may be shorter than a word. b) in IntMap, it's also used for a key with the lower bits masked. c) in IntSet there's a split into a "prefix" and a "bitmask"...
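(To make sense (b) concrete: a key's prefix at a given mask bit can be computed as below. This is a simplified sketch with a hypothetical helper name; the actual helpers in Data.IntMap.Internal are written differently.)

```haskell
import Data.Bits ((.&.))

-- Keep only the bits strictly above the mask bit m (a power of two);
-- all keys stored under the same Bin node share this value.
prefixOf :: Int -> Int -> Int
prefixOf m k = k .&. negate (2 * m)
```

For example, at mask bit 4 the keys 8..15 all share prefix 8, while 0..7 share prefix 0.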


treeowl commented Jul 8, 2019

I'm not so sure. In the bad case, we call insertMany on a tree that doesn't fit the next key. So if we keep enough information around to determine that without inspecting the tree, then maybe we can win something.


int-e commented Jul 8, 2019

That is somewhat surprising. I wonder what the Core looks like. I'd have thought the best bet for that sort of thing would've been an unboxed sum as a result, but maybe GHC is doing something clever....

IIRC GHC's "enter" operation on the STG level actually takes several continuations for multi-constructor datatypes. So the Inserted' values should never be allocated. But I'm not sure, and I was not at all convinced that this idea would pay off when I started to implement it.


treeowl commented Jul 8, 2019 via email


int-e commented Jul 8, 2019

insertMany' Nil k x kxs' = InsertedNil Nil doesn't look so great, because it makes insertMany' lazy in the key and list. Since we don't reach that anyway, it might as well force those things.

I changed that, but it had no effect on the running time.


treeowl commented Jul 8, 2019 via email


int-e commented Jul 8, 2019

Somewhat surprisingly it looks like we can indeed speed things up slightly by avoiding the repeated destruction of the same list, based around the following return type:

```haskell
data Inserted' a
   = InsertedNil  !(IntMap a)
   | InsertedCons !(IntMap a) !Key a ![(Key,a)]
```

Btw, we should also consider that the resulting speedup is minuscule and the code becomes more difficult to read as a result.


int-e commented Jul 8, 2019

Probably got inlined away.... Did you check the CSE between link and branchMask calls? If I'm not mistaken, link makes essentially the same branchMask call you do, but with the arguments flipped.

No I had not checked that... and no, it did not do the CSE, regardless of the order of arguments to branchMask. I've now added a variant of link that takes a precomputed mask (to avoid inlining link manually), and the result does seem to improve performance a bit for short lists without hurting long ones. But there's hardly any signal here... mostly noise.

```haskell
linkWithMask :: Mask -> Prefix -> IntMap a -> Prefix -> IntMap a -> IntMap a
linkWithMask m p1 t1 p2 t2
  | zero p1 m = Bin p m t1 t2
  | otherwise = Bin p m t2 t1
  where
    p = mask p1 m
{-# INLINE linkWithMask #-}
```

```
[1,1]/fromAscList1a                      mean 19.91 ns  ( +- 14.99 ps  )
[1,1]/fromAscList1a'                     mean 19.38 ns  ( +- 199.5 ps  )
[10,1]/fromAscList1a                     mean 145.6 ns  ( +- 1.948 ns  )
[10,1]/fromAscList1a'                    mean 140.1 ns  ( +- 3.528 ns  )
[-10,51791]/fromAscList1a                mean 160.2 ns  ( +- 2.175 ns  )
[-10,51791]/fromAscList1a'               mean 149.8 ns  ( +- 3.113 ns  )
[100,1]/fromAscList1a                    mean 1.549 μs  ( +- 28.80 ns  )
[100,1]/fromAscList1a'                   mean 1.444 μs  ( +- 23.56 ns  )
[-100,51791]/fromAscList1a               mean 1.614 μs  ( +- 48.31 ns  )
[-100,51791]/fromAscList1a'              mean 1.454 μs  ( +- 46.67 ns  )
[1000,1]/fromAscList1a                   mean 16.24 μs  ( +- 183.1 ns  )
[1000,1]/fromAscList1a'                  mean 15.57 μs  ( +- 167.7 ns  )
[-1000,51791]/fromAscList1a              mean 17.01 μs  ( +- 492.8 ns  )
[-1000,51791]/fromAscList1a'             mean 16.07 μs  ( +- 466.1 ns  )
[10000,1]/fromAscList1a                  mean 232.7 μs  ( +- 2.194 μs  )
[10000,1]/fromAscList1a'                 mean 232.3 μs  ( +- 2.882 μs  )
[-10000,51791]/fromAscList1a             mean 250.8 μs  ( +- 4.353 μs  )
[-10000,51791]/fromAscList1a'            mean 246.1 μs  ( +- 5.896 μs  )
[100000,1]/fromAscList1a                 mean 9.101 ms  ( +- 227.0 μs  )
[100000,1]/fromAscList1a'                mean 9.097 ms  ( +- 240.5 μs  )
[-100000,51791]/fromAscList1a            mean 9.188 ms  ( +- 339.0 μs  )
[-100000,51791]/fromAscList1a'           mean 9.189 ms  ( +- 412.6 μs  )
[1000000,1]/fromAscList1a                mean 94.98 ms  ( +- 12.07 ms  )
[1000000,1]/fromAscList1a'               mean 96.65 ms  ( +- 12.22 ms  )
[-1000000,51791]/fromAscList1a           mean 97.83 ms  ( +- 10.69 ms  )
[-1000000,51791]/fromAscList1a'          mean 97.62 ms  ( +- 10.81 ms  )
```


int-e commented Jul 8, 2019

Oops, I believe we got #653 and #658 mixed up in the last 3 comments!


sjakobi commented Jul 17, 2020

I'm currently doing a bit of triage on the open PRs.

Could someone please summarize the current state of this PR, and possibly also sketch out what needs to be done until it can be merged?


int-e commented Jul 18, 2020

A lot going on here. I'll attempt a summary.

  • the goal is to speed up fromList (borrowing ideas from fromAscList, which builds the whole map in a bottom-up fashion without deconstructing partial maps) and in particular try to make it no worse than repeated insertions (IntSet.fromList slower than repeated IntSet.insert #288)
  • we lack adequate benchmarks that test fromList under a variety of inputs with different characteristics
  • I have some improvements to the PR code in a gist that could be incorporated, and made some effort to benchmark against various types of lists
  • I believe this is stalled mainly because the code is still worse than before in some cases, and also because this development took place in parallel to improving fromAscList (Improve fromAscList and friends. #658) which was far more successful


sjakobi commented Jul 18, 2020

Thanks, @int-e! :)

Could the benchmarks be merged separately, so they are not lost or bit-rotted, and so they could potentially be used for other attempts at speeding up fromList?

Also, how about incorporating your code improvements into this PR? I guess the easiest way to do that would be if you'd make a PR against @treeowl's branch.

Of course it would also be nice to make progress on the original problem, but since that is currently stalled, let's try to ensure that the value created so far is saved.


int-e commented Aug 6, 2020

I'm not going to work on this. The benchmarks are in a very ad-hoc state. Ideally, the benchmarks would use realistic inputs for fromList and I don't really know what those are.

As for my improvements, after looking at the benchmark data again, there's either no speedup at all or it's so small that it looks like noise. While my changes were relatively minor tweaks and refactorings, they make the code more complicated too, so they're almost certainly not worthwhile. The real question is... is there a better approach? But that requires some serious meditation (and time).


sjakobi commented Aug 14, 2020

Thanks, @int-e! :)

Let's shelve this until we have better benchmarks then. (See #657)

@sjakobi sjakobi marked this pull request as draft August 14, 2020 12:57
Successfully merging this pull request may close these issues.

IntSet.fromList slower than repeated IntSet.insert