IntSet.fromList slower than repeated IntSet.insert #288

amigalemming · 2016-06-18T15:49:04Z

I ran the following benchmark:

module Main where

import Criterion.Main (defaultMain, bgroup, bench, nf, )

import qualified Data.IntSet as IntSet; import Data.IntSet (IntSet)
import qualified Data.List as List
import Data.Word (Word64)


{-# INLINE randomIndices #-}
randomIndices :: Int -> [Int]
randomIndices k =
   map (fromIntegral . flip mod (fromIntegral k)) $
   iterate (\x -> mod (x*40692) 2147483399) (1::Word64)

constructIntSet :: Int -> Int -> IntSet
constructIntSet n k =
   IntSet.fromList $ take n $ randomIndices k

insertsIntSet :: Int -> Int -> IntSet
insertsIntSet n k =
   List.foldl' (flip IntSet.insert) IntSet.empty $
   take n $ randomIndices k


main :: IO ()
main =
   defaultMain $
      (bgroup "construct" $
         bench "1000" (nf (constructIntSet 500) 1000) :
         bench "3000" (nf (constructIntSet 500) 3000) :
         bench "9000" (nf (constructIntSet 500) 9000) :
         []) :
      (bgroup "insert" $
         bench "1000" (nf (insertsIntSet 500) 1000) :
         bench "3000" (nf (insertsIntSet 500) 3000) :
         bench "9000" (nf (insertsIntSet 500) 9000) :
         []) :
      []

and got:

benchmarking construct/1000
time                 53.36 μs   (53.24 μs .. 53.55 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 53.46 μs   (53.32 μs .. 53.73 μs)
std dev              638.6 ns   (367.6 ns .. 1.215 μs)

benchmarking construct/3000
time                 66.19 μs   (65.96 μs .. 66.54 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 66.21 μs   (66.09 μs .. 66.35 μs)
std dev              422.9 ns   (350.3 ns .. 547.7 ns)

benchmarking construct/9000
time                 81.97 μs   (81.67 μs .. 82.39 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 83.00 μs   (82.68 μs .. 83.29 μs)
std dev              1.052 μs   (914.6 ns .. 1.202 μs)

benchmarking insert/1000
time                 38.65 μs   (38.58 μs .. 38.79 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 38.90 μs   (38.80 μs .. 39.14 μs)
std dev              505.4 ns   (292.0 ns .. 1.066 μs)

benchmarking insert/3000
time                 47.91 μs   (47.53 μs .. 48.25 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 47.82 μs   (47.65 μs .. 48.02 μs)
std dev              626.2 ns   (518.8 ns .. 766.2 ns)

benchmarking insert/9000
time                 63.68 μs   (63.32 μs .. 64.21 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 64.61 μs   (64.27 μs .. 65.01 μs)
std dev              1.230 μs   (959.5 ns .. 1.847 μs)
variance introduced by outliers: 14% (moderately inflated)

However, I am not sure whether I got the strictness right.

The text was updated successfully, but these errors were encountered:

treeowl · 2016-06-18T16:02:07Z

This seems likely to be caused by one or both of the major foldl' changes
(only one of them intentional) in base 4.8. foldl' now participates in list
fusion (intentional) and it's now strict in the initial value of the
accumulator (unintentional). I think we probably want to copy the current
(4.8/4.9) foldl' implementation into Data.Utils.StrictFold for use with GHC
7.10 and later, leaving it alone for older versions.
On Jun 18, 2016 11:49 AM, "amigalemming" [email protected] wrote:

I ran the following benchmark:

module Main where

import Criterion.Main (defaultMain, bgroup, bench, nf, )

import qualified Data.IntSet as IntSet; import Data.IntSet (IntSet)
import qualified Data.List as List
import Data.Word (Word64)

{-# INLINE randomIndices #-}
randomIndices :: Int -> [Int]
randomIndices k =
map (fromIntegral . flip mod (fromIntegral k)) $
iterate (\x -> mod (x*40692) 2147483399) (1::Word64)

constructIntSet :: Int -> Int -> IntSet
constructIntSet n k =
IntSet.fromList $ take n $ randomIndices k

insertsIntSet :: Int -> Int -> IntSet
insertsIntSet n k =
List.foldl' (flip IntSet.insert) IntSet.empty $
take n $ randomIndices k

main :: IO ()
main =
defaultMain $
(bgroup "construct" $
bench "1000" (nf (constructIntSet 500) 1000) :
bench "3000" (nf (constructIntSet 500) 3000) :
bench "9000" (nf (constructIntSet 500) 9000) :
[]) :
(bgroup "insert" $
bench "1000" (nf (insertsIntSet 500) 1000) :
bench "3000" (nf (insertsIntSet 500) 3000) :
bench "9000" (nf (insertsIntSet 500) 9000) :
[]) :
[]

and got:

benchmarking construct/1000
time 53.36 μs (53.24 μs .. 53.55 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 53.46 μs (53.32 μs .. 53.73 μs)
std dev 638.6 ns (367.6 ns .. 1.215 μs)

benchmarking construct/3000
time 66.19 μs (65.96 μs .. 66.54 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 66.21 μs (66.09 μs .. 66.35 μs)
std dev 422.9 ns (350.3 ns .. 547.7 ns)

benchmarking construct/9000
time 81.97 μs (81.67 μs .. 82.39 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 83.00 μs (82.68 μs .. 83.29 μs)
std dev 1.052 μs (914.6 ns .. 1.202 μs)

benchmarking insert/1000
time 38.65 μs (38.58 μs .. 38.79 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 38.90 μs (38.80 μs .. 39.14 μs)
std dev 505.4 ns (292.0 ns .. 1.066 μs)

benchmarking insert/3000
time 47.91 μs (47.53 μs .. 48.25 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 47.82 μs (47.65 μs .. 48.02 μs)
std dev 626.2 ns (518.8 ns .. 766.2 ns)

benchmarking insert/9000
time 63.68 μs (63.32 μs .. 64.21 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 64.61 μs (64.27 μs .. 65.01 μs)
std dev 1.230 μs (959.5 ns .. 1.847 μs)
variance introduced by outliers: 14% (moderately inflated)

However, I am not sure whether I got the strictness right.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#288, or mute the thread
https://github.com/notifications/unsubscribe/ABzi_cLUEmpgsRKwsxPtnNkfuSYaSfkdks5qNBNwgaJpZM4I4-UP
.

amigalemming · 2016-06-18T16:06:48Z

For comparison the results of GHC-7.8.4:

benchmarking construct/1000
time                 59.70 μs   (59.61 μs .. 59.79 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 60.15 μs   (59.89 μs .. 60.64 μs)
std dev              1.175 μs   (564.6 ns .. 2.070 μs)
variance introduced by outliers: 15% (moderately inflated)

benchmarking construct/3000
time                 74.42 μs   (74.33 μs .. 74.55 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 74.83 μs   (74.67 μs .. 75.07 μs)
std dev              624.6 ns   (456.9 ns .. 980.6 ns)

benchmarking construct/9000
time                 89.63 μs   (89.52 μs .. 89.74 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 89.68 μs   (89.56 μs .. 89.86 μs)
std dev              478.1 ns   (371.0 ns .. 640.5 ns)

benchmarking insert/1000
time                 61.10 μs   (61.04 μs .. 61.18 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 61.34 μs   (61.25 μs .. 61.47 μs)
std dev              369.8 ns   (241.8 ns .. 657.3 ns)

benchmarking insert/3000
time                 71.49 μs   (71.39 μs .. 71.58 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 71.54 μs   (71.43 μs .. 71.72 μs)
std dev              450.0 ns   (306.2 ns .. 672.4 ns)

benchmarking insert/9000
time                 86.61 μs   (86.53 μs .. 86.72 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 86.90 μs   (86.77 μs .. 87.08 μs)
std dev              529.1 ns   (397.3 ns .. 782.1 ns)

This supports your hypothesis.

treeowl · 2016-07-07T18:17:23Z

@amigalemming, would you like to put together a pull request?

jwaldmann · 2019-07-17T22:16:01Z

Is this superseded by #653 now?

treeowl · 2019-07-17T22:20:09Z

@jwaldmann I really don't remember what I was thinking when I proposed a fix, and I don't really understand what I wrote, but I do think this will be fixed by #653.

int-e · 2020-07-18T11:51:39Z

For the record, I tried the stupid fromAscList . sort and it was terrible, more than two times slower than fromList.

m-renaud added the performance label Dec 26, 2017

jwaldmann mentioned this issue Feb 2, 2018

generalize type of Data.Set.unions (from List to Foldable)? #520

Closed

sjakobi linked a pull request Jul 15, 2020 that will close this issue

Speed up fromList for IntMap #653

Draft

sjakobi added the IntSet label Jul 15, 2020

int-e linked a pull request Jul 18, 2020 that will close this issue

Speed up fromList for IntMap #653

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IntSet.fromList slower than repeated IntSet.insert #288

IntSet.fromList slower than repeated IntSet.insert #288

amigalemming commented Jun 18, 2016

treeowl commented Jun 18, 2016

amigalemming commented Jun 18, 2016

treeowl commented Jul 7, 2016

jwaldmann commented Jul 17, 2019

treeowl commented Jul 17, 2019

int-e commented Jul 18, 2020

IntSet.fromList slower than repeated IntSet.insert #288

IntSet.fromList slower than repeated IntSet.insert #288

Comments

amigalemming commented Jun 18, 2016

treeowl commented Jun 18, 2016

amigalemming commented Jun 18, 2016

treeowl commented Jul 7, 2016

jwaldmann commented Jul 17, 2019

treeowl commented Jul 17, 2019

int-e commented Jul 18, 2020