-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster disjoint/isSubsetOf for Set via unbalanced splitting. #865
base: master
Are you sure you want to change the base?
Conversation
The same should be applicable to |
I'm going to need to see a proof that the stated time bounds still hold. Or if they don't, some reasonably tight new bound. We need to be sure that this doesn't introduce some nasty cases that our benchmarks don't happen to catch. |
-- Same as 'splitMember' but skips re-balancing by using 'bin' instead of 'link'. | ||
-- Attempting to build new trees out of these will error when re-balancing but | ||
-- this can improve performance when the resulting trees are disposable. | ||
splitMemberUnbalanced :: Ord a => a -> Set a -> (Set a,Bool,Set a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to factor out bin
/link
to avoid code duplication but that made the involved functions 2-3x slower.
Did you see my question? The bounds we have for these operations lean on the highly nontrivial published proofs for bounds on intersection and difference. My concern is that by allowing one set of trees to become unbalanced (hence potentially deep for their size), you could break those proofs (and bounds). |
My bad, hadn't noticed you had already replied by the time I finished commenting.
Right. The idea here is that the act of balancing that That said, I don't currently have a proof, only a gut feeling backed by no more than a specific benchmark. I'll peek at the published proofs. |
I think this is interesting, and can be applied to union/intersection/difference/others too. I implemented it and can see some improvements in set-operations-set, Resultsunion-block_nn: OK (0.39s) 419 μs ± 25 μs, 32% less than baseline union-block_nn_swap: OK (0.39s) 410 μs ± 32 μs, 35% less than baseline union-block_ns: OK (0.42s) 38.5 μs ± 3.0 μs, 44% less than baseline union-block_sn_swap: OK (0.47s) 48.0 μs ± 3.5 μs, 35% less than baseline union-common_nn: OK (0.77s) 512 μs ± 34 μs, same as baseline union-common_nn_swap: OK (0.84s) 1.23 ms ± 113 μs, 13% more than baseline union-common_ns: OK (0.42s) 404 μs ± 28 μs, 43% less than baseline union-common_nt: OK (0.55s) 34.7 μs ± 1.8 μs, 27% less than baseline union-common_sn_swap: OK (0.32s) 1.49 ms ± 146 μs, same as baseline union-common_tn_swap: OK (0.42s) 95.5 μs ± 5.5 μs, 14% more than baseline union-disj_nn: OK (0.53s) 2.97 μs ± 174 ns, 36% less than baseline union-disj_nn_swap: OK (0.54s) 2.82 μs ± 197 ns, 43% less than baseline union-disj_ns: OK (0.49s) 2.11 μs ± 177 ns, 40% less than baseline union-disj_nt: OK (0.54s) 1.23 μs ± 89 ns, 44% less than baseline union-disj_sn_swap: OK (0.51s) 2.41 μs ± 195 ns, 38% less than baseline union-disj_tn_swap: OK (0.47s) 1.72 μs ± 166 ns, 32% less than baseline union-mix_nn: OK (1.16s) 16.7 ms ± 624 μs, 6% less than baseline union-mix_nn_swap: OK (0.58s) 16.6 ms ± 565 μs, same as baseline union-mix_ns: OK (0.48s) 1.18 ms ± 42 μs, 31% less than baseline union-mix_nt: OK (0.36s) 66.1 μs ± 5.6 μs, 16% less than baseline union-mix_sn_swap: OK (0.46s) 2.25 ms ± 111 μs, 16% more than baseline union-mix_tn_swap: OK (0.46s) 97.9 μs ± 7.5 μs, 14% more than baseline difference-block_nn: OK (0.40s) 191 μs ± 16 μs, 56% less than baseline difference-block_nn_swap: OK (0.42s) 188 μs ± 11 μs, 57% less than baseline difference-block_ns: OK (0.45s) 18.4 μs ± 1.5 μs, 64% less than baseline difference-block_sn_swap: OK (0.42s) 17.7 μs ± 1.5 μs, 65% less than baseline difference-common_nn: OK (0.53s) 3.03 ms ± 189 μs, 14% less than baseline difference-common_nn_swap: OK (0.35s) 577 μs ± 44 μs, 17% less than baseline difference-common_ns: OK (0.31s) 1.33 ms ± 104 μs, 46% less than baseline difference-common_nt: OK (0.44s) 92.2 μs ± 7.3 μs, 29% less than baseline difference-common_sn_swap: OK (0.42s) 453 μs ± 25 μs, 55% less than baseline difference-common_tn_swap: OK (0.45s) 43.4 μs ± 3.4 μs, 49% less than baseline difference-disj_nn: OK (0.56s) 1.53 μs ± 90 ns, 56% less than baseline difference-disj_nn_swap: OK (0.56s) 1.55 μs ± 85 ns, 47% less than baseline difference-disj_ns: OK (0.51s) 1.19 μs ± 84 ns, 55% less than baseline difference-disj_nt: OK (0.58s) 772 ns ± 49 ns, 54% less than baseline difference-disj_sn_swap: OK (0.51s) 1.19 μs ± 87 ns, 50% less than baseline difference-disj_tn_swap: OK (0.56s) 736 ns ± 42 ns, 55% less than baseline difference-mix_nn: OK (0.31s) 3.11 ms ± 213 μs, 49% less than baseline difference-mix_nn_swap: OK (0.58s) 3.23 ms ± 147 μs, 47% less than baseline difference-mix_ns: OK (0.37s) 833 μs ± 55 μs, 40% less than baseline difference-mix_nt: OK (0.36s) 68.5 μs ± 6.1 μs, 28% less than baseline difference-mix_sn_swap: OK (0.33s) 562 μs ± 47 μs, 61% less than baseline difference-mix_tn_swap: OK (0.45s) 50.1 μs ± 3.1 μs, 44% less than baseline intersection-block_nn: OK (0.40s) 191 μs ± 16 μs, 66% less than baseline intersection-block_nn_swap: OK (0.42s) 189 μs ± 13 μs, 66% less than baseline intersection-block_ns: OK (0.44s) 18.4 μs ± 1.5 μs, 73% less than baseline intersection-block_sn_swap: OK (0.41s) 17.8 μs ± 1.5 μs, 74% less than baseline intersection-common_nn: OK (0.27s) 1.06 ms ± 90 μs, 32% less than baseline intersection-common_nn_swap: OK (0.20s) 545 μs ± 42 μs, 33% less than baseline intersection-common_ns: OK (0.26s) 975 μs ± 87 μs, 46% less than baseline intersection-common_nt: OK (0.38s) 77.0 μs ± 6.2 μs, 40% less than baseline intersection-common_sn_swap: OK (0.43s) 430 μs ± 22 μs, 67% less than baseline intersection-common_tn_swap: OK (0.45s) 43.3 μs ± 3.6 μs, 61% less than baseline intersection-disj_nn: OK (0.58s) 1.54 μs ± 85 ns, 62% less than baseline intersection-disj_nn_swap: OK (0.55s) 1.55 μs ± 92 ns, 65% less than baseline intersection-disj_ns: OK (0.51s) 1.20 μs ± 85 ns, 64% less than baseline intersection-disj_nt: OK (0.58s) 780 ns ± 47 ns, 66% less than baseline intersection-disj_sn_swap: OK (0.51s) 1.19 μs ± 84 ns, 65% less than baseline intersection-disj_tn_swap: OK (0.57s) 750 ns ± 47 ns, 65% less than baseline intersection-mix_nn: OK (0.54s) 3.16 ms ± 183 μs, 60% less than baseline intersection-mix_nn_swap: OK (0.31s) 3.06 ms ± 300 μs, 62% less than baseline intersection-mix_ns: OK (0.40s) 839 μs ± 54 μs, 58% less than baseline intersection-mix_nt: OK (0.51s) 65.5 μs ± 3.1 μs, 47% less than baseline intersection-mix_sn_swap: OK (0.31s) 585 μs ± 48 μs, 65% less than baseline intersection-mix_tn_swap: OK (0.48s) 56.1 μs ± 3.0 μs, 53% less than baseline union and intersection are only changed in terms of the unbalanced split, for difference I had to make a larger change so it is not a good direct comparison. There are also a handful of increases in union, not sure why. Anyway, this seems useful, so I'll also try to understand the proofs and see if they still apply with this change. |
One option to consider is to switch to an unbalanced split (or the "hedge" algorithms we used to use) when the sets/maps get small enough (below some fixed size). That will avoid breaking big O while getting a lot of the performance benefits in the cases where it's good. |
I'm surprised it's not always worse for operations that return sets since they must return balanced sets in the end to preserve invariants no? Are you doing a single call to balance at the very end? I haven't been in the headspace to look at this in a while, but one thing I'd been meaning to do is try and make this allocation free. It sounds plausible to me since without re-balancing the triple that's returned is immediately consumed. I had tried to do this via CPS but having functions as arguments seemed to kill performance and I'm always a bit lost when trying to reason about the Core that comes out. |
Not all the reconstructed pieces end up getting incorporated. For |
I had to only for -- A possibly unbalanced set.
-- Invariant: A Bin with non-zero size is balanced.
-- To construct an unbalanced set: Unbalanced (Bin 0 x l r)
newtype Unbalanced a = Unbalanced (Set a)
fromUnbalanced :: Unbalanced a -> Set a
fromUnbalanced (Unbalanced s0) = go s0
where
go (Bin 0 x l r) = link x (go l) (go r)
go s = s
splitSUnbalanced :: Ord a => a -> Unbalanced a -> StrictPair (Unbalanced a) (Unbalanced a)
splitMemberUnbalanced :: Ord a => a -> Unbalanced a -> (Unbalanced a,Bool,Unbalanced a)
union :: Ord a => Set a -> Set a -> Set a
union t10 t20 = go t10 (Unbalanced t20)
where
go t1 (Unbalanced Tip) = t1
go t1 (Unbalanced (Bin _ x Tip Tip)) = insertR x t1
go (Bin 1 x _ _) t2 = insert x (fromUnbalanced t2)
go Tip t2 = fromUnbalanced t2
go t1@(Bin _ x l1 r1) t2 = case splitSUnbalanced x t2 of
(l2 :*: r2)
| l1l2 `ptrEq` l1 && r1r2 `ptrEq` r1 -> t1
| otherwise -> link x l1l2 r1r2
where !l1l2 = go l1 l2
!r1r2 = go r1 r2
difference :: Ord a => Set a -> Set a -> Set a
difference t10 t20 = go t10 (Unbalanced t20)
where
go Tip _ = Tip
go t1 (Unbalanced Tip) = t1
go t1@(Bin _ x l1 r1) t2 = case splitMemberUnbalanced x t2 of
(l2,b,r2)
| b -> merge l1l2 r1r2
| l1l2 `ptrEq` l1 && r1r2 `ptrEq` r1 -> t1
| otherwise -> link x l1l2 r1r2
where !l1l2 = go l1 l2
!r1r2 = go r1 r2
intersection :: Ord a => Set a -> Set a -> Set a
intersection t10 t20 = go t10 (Unbalanced t20)
where
go Tip _ = Tip
go _ (Unbalanced Tip) = Tip
go t1@(Bin _ x l1 r1) t2
| b = if l1l2 `ptrEq` l1 && r1r2 `ptrEq` r1
then t1
else link x l1l2 r1r2
| otherwise = merge l1l2 r1r2
where
!(l2, b, r2) = splitMemberUnbalanced x t2
!l1l2 = go l1 l2
!r1r2 = go r1 r2 |
Unlike, say,
union
/intersection
,disjoint
doesn't return a new structure. It can avoid the re-balancing work because it immediately inspects and forgets the produced tree. This allows significant constant factor speedups.