Add NonEmpty containers, without imbalancing or massive code duplication #608

Ericson2314 · 2019-02-27T19:03:11Z

There's a number of packages on Hackage providing "non-empty" versions of containers, modeled after https://hackage.haskell.org/package/base/docs/Data-List-NonEmpty.html . There's a variety of use-cases, but perhaps my favorite is that they're needed to "curry" containers: for example, Map (k0, k1) v is isomorphic not to Map k0 (Map k1 v) but to Map k0 (NonEmptyMap k1 v). I like this use-case because it comes from the containers themselves.

Anyways, the problem with the existing implementations is that they tend to just prepend a minimal element to the container. This creates a less efficient, ill-balanced tree. But the alternative of copying the implementation creates a maintain burden. In https://github.com/mokus0/dependent-map/pull/31/files, I think I found an alternative which is the best of both worlds: mostly reused code and no degrade in balancing, namely making the types mutually recur.

For example, here are the types for DMap and NonEmptyDMap:

data DMap k f
  = Tip
  | Bin' {-# UNPACK #-} !(NonEmptyDMap k f)

pattern Bin s k v l r = Bin' (NonEmptyDMap s k v l r)

data NonEmptyDMap k f where
    NonEmptyDMap
         :: {- sz    -} !Int
         -> {- key   -} !(k v)
         -> {- value -} f v
         -> {- left  -} !(DMap k f)
         -> {- right -} !(DMap k f)
         -> NonEmptyDMap k f

DMap is like Maybe NonEmptyDMap but with the unboxed strict field.

Functions are then also written mutually recursively. (Take the exact weird worker wrapper style here with a grain of salt. This me trying to strike a balance between brevity and cargo culting the way the functions were written before.)

makeAdjustWithKey
  :: forall k f v
  .  GCompare k
  => (k v -> f v -> f v)
  -> k v
  -> ( DMap k f -> DMap k f
     , NonEmptyDMap k f -> NonEmptyDMap k f
     )
makeAdjustWithKey f k = (k `seq` go, k `seq` go')
  where
    go :: DMap k f -> DMap k f
    go Tip = Tip
    go (Bin' ne) = Bin' $! go' ne

    go' :: NonEmptyDMap k f -> NonEmptyDMap k f
    go' (NonEmptyDMap sx kx x l r) =
      case gcompare k kx of
        GLT -> NonEmptyDMap sx kx x (go l) r
        GGT -> NonEmptyDMap sx kx x l (go r)
        GEQ -> NonEmptyDMap sx kx (f kx x) l r

-- | /O(log n)/. Adjust a value at a specific key. When the key is not
-- a member of the map, the original map is returned.
adjustWithKey :: GCompare k => (k v -> f v -> f v) -> k v -> DMap k f -> DMap k f
adjustWithKey f k = fst $ makeAdjustWithKey f k

-- | /O(log n)/. Adjust a value at a specific key. When the key is not
-- a member of the map, the original map is returned.
adjustWithKey :: GCompare k => (k v -> f v -> f v) -> k v -> NonEmptyDMap k f -> NonEmptyDMap k f
adjustWithKey f k = snd $ makeAdjustWithKey f k

One interesting side benefit is that rotations and other internal operations can be less partial. Take rotateL, for example:

rotateL :: k v -> f v -> DMap k f -> NonEmptyDMap k f -> NonEmptyDMap k f
rotateL k x l r@(NonEmptyDMap _ _ _ ly ry)
  | sizeE ly < ratio*sizeE ry = singleL k x l r
  | otherwise                 = doubleL k x l r

The non-empty argument proves that at least one left rotation is possible. (The double rotate is still partial.)

As my parenthetical above hints, I haven't yet investigated performance properly; rather my goal was to do just enough work to demonstrate that something like this might be possible. If this sort of refactor looks viable to you, maintainers, I'd be happy to take a stab at doing it for all of containers, along with hunting down any performance issues that arise.

The text was updated successfully, but these errors were encountered:

treeowl · 2019-02-27T21:49:27Z

It looks plausible, yes. It'll be a lot of work, though. Do you think you could implement some of the basic operations and get a sense of the performance?

Ericson2314 · 2019-02-28T02:39:17Z

Glad to hear it! I'll start with something like the first commit of the DMap PR, obsidiansystems/dependent-map@5985aad, to see just the performance impact of just the type change.

treeowl · 2019-02-28T04:10:34Z

It would also be very nice to get a sense of the change in code size in one or two applications. I won't push too hard for that, but it'd be nice.

Ericson2314 · 2019-03-03T14:49:41Z

Are pattern synonyms OK (not too new) to use? I couldn't immediately tell.

treeowl · 2019-03-03T14:59:53Z

We aim for more portability than that. But feel free to use them to get some quick benchmarks!

Ericson2314 · 2019-03-09T02:08:46Z

Sorry for taking a bit; busy at work lately. Here's a branch with just the type change https://github.com/obsidiansystems/containers/tree/non-empty . Hope I'm not jinxing the optimizations being bullish, but know pattern synonyms so far.

treeowl · 2019-03-09T02:17:09Z

So what do your benchmarks show so far?

Ericson2314 · 2019-03-18T05:21:01Z

Sorry for the delays. It turns out that just as I opened this, I was heading into an especially busy period at work, so I haven't had much time for such open source things. Relatedly, I was being a bit daft and didn't realize you wanted me to run the benchmarks (should have read CONTRIBUTING.md and used common sense...).

Anyways, I did that now on GHC 8.6.3 with -O2, and the type change alone looks alright https://gist.github.com/Ericson2314/58709d0d99e0c0e83ad266701cd71841 . If anything, my type changes seems to have made it slightly faster (did not expect that!). Though I wouldn't also be surprised if there's nothing significant here; I ran them a few hours apart (while doing other things), so CPU temp differences or something might be distorting the results.

If the type change alone looks like a real and worthwhile performance benefit to you, I'd be happy to make the same change for all the containers, and only then sort out whether/how writing and exposing functions for the non-empty types makes sense.

Ericson2314 · 2019-03-30T00:32:07Z

@treeowl light ping :) --- I feel like a hypocrite after I taking a bit to respond my self, but have you had a chance to glance at those benchmarks? Just waiting for the signal to go do the same transformation on the other types.

treeowl · 2019-03-30T02:16:09Z

I'm generally on board with this approach.

Ericson2314 · 2019-04-01T00:45:26Z

Sweet! I'll convert all the other types and make a PR.

treeowl · 2019-04-01T00:48:47Z

You probably shouldn't bother with sequences, though. If it's possible to work a trick like that with sequences, it will be much more trouble than it's worth. It would make much more sense to fake non-empty sequences using smart constructors and such.

Ericson2314 · 2019-04-01T04:20:51Z

Right, let me roll back "all the other types" to just sticking to the family of Map, IntMap, Set, and IntSet`.

treeowl · 2019-04-01T04:23:59Z

Er... Are you sure about those Int ones? Make sure they make sense, and make sure they're a separate PR, please. Also, did you ever see the WordMap stuff? It seems to have been abandoned, but it would be very interesting to see where it would go if picked up.

Ericson2314 · 2019-04-01T04:38:44Z

Oh oops completely forget that the Int ones are Patricia trees and also totally different. Nevermind then. The Nil constructor invariant begs for the more conventional newtype IntMap a = IntMap (UnpackingMaybe (NonEmptyIntMap a)), but until sums can be unpacked sadly that's not a pure win.

treeowl · 2019-04-08T18:22:18Z

When do you expect to submit a proposal for non-empty Set and Map to the libraries list? It really shouldn't take long. Just get a good list of applications together.

Ericson2314 · 2019-04-08T21:48:34Z

Sorry I've also been hacking on GHC's build system a bit, spreading myself too thin again. It is indeed not too hard, and I can take a bunch of text from the original issue I opened.

treeowl · 2019-04-15T01:12:02Z

Ping.

Ericson2314 · 2019-04-18T23:16:36Z

Sorry I took so long. Finally sent in https://mail.haskell.org/pipermail/libraries/2019-April/029537.html

sjakobi · 2020-07-19T12:57:31Z

Could someone please summarize how the libraries proposal went?

Also, is there an overview of the proposed API changes?

Ericson2314 · 2020-09-07T20:42:41Z

@sjakobi I recall it was mostly positive feedback? #616 was the implementation. I'm sorry I've never finished it off, I've been busy with many other things.

Ericson2314 mentioned this issue Mar 18, 2019

Suppress nil patch events in QueryT as an optimization reflex-frp/reflex#203

Merged

Ericson2314 linked a pull request Apr 2, 2019 that will close this issue

WIP: NonEmptySet and NonEmptyMap #616

Open

Ericson2314 mentioned this issue Oct 8, 2019

WordMap #683

Closed

gereeter mentioned this issue Jan 12, 2020

Rewrite Data.IntMap to be faster and use less memory #340

Open

sjakobi added the feature-request label Jul 15, 2020

sjakobi linked a pull request Jul 15, 2020 that will close this issue

WIP: NonEmptySet and NonEmptyMap #616

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add NonEmpty containers, without imbalancing or massive code duplication #608

Add NonEmpty containers, without imbalancing or massive code duplication #608

Ericson2314 commented Feb 27, 2019

treeowl commented Feb 27, 2019

Ericson2314 commented Feb 28, 2019

treeowl commented Feb 28, 2019

Ericson2314 commented Mar 3, 2019

treeowl commented Mar 3, 2019

Ericson2314 commented Mar 9, 2019

treeowl commented Mar 9, 2019

Ericson2314 commented Mar 18, 2019 •

edited

Loading

Ericson2314 commented Mar 30, 2019 •

edited

Loading

treeowl commented Mar 30, 2019

Ericson2314 commented Apr 1, 2019

treeowl commented Apr 1, 2019

Ericson2314 commented Apr 1, 2019

treeowl commented Apr 1, 2019

Ericson2314 commented Apr 1, 2019

treeowl commented Apr 8, 2019

Ericson2314 commented Apr 8, 2019 •

edited

Loading

treeowl commented Apr 15, 2019

Ericson2314 commented Apr 18, 2019

sjakobi commented Jul 19, 2020

Ericson2314 commented Sep 7, 2020

Add NonEmpty containers, without imbalancing or massive code duplication #608

Add NonEmpty containers, without imbalancing or massive code duplication #608

Comments

Ericson2314 commented Feb 27, 2019

treeowl commented Feb 27, 2019

Ericson2314 commented Feb 28, 2019

treeowl commented Feb 28, 2019

Ericson2314 commented Mar 3, 2019

treeowl commented Mar 3, 2019

Ericson2314 commented Mar 9, 2019

treeowl commented Mar 9, 2019

Ericson2314 commented Mar 18, 2019 • edited Loading

Ericson2314 commented Mar 30, 2019 • edited Loading

treeowl commented Mar 30, 2019

Ericson2314 commented Apr 1, 2019

treeowl commented Apr 1, 2019

Ericson2314 commented Apr 1, 2019

treeowl commented Apr 1, 2019

Ericson2314 commented Apr 1, 2019

treeowl commented Apr 8, 2019

Ericson2314 commented Apr 8, 2019 • edited Loading

treeowl commented Apr 15, 2019

Ericson2314 commented Apr 18, 2019

sjakobi commented Jul 19, 2020

Ericson2314 commented Sep 7, 2020

Ericson2314 commented Mar 18, 2019 •

edited

Loading

Ericson2314 commented Mar 30, 2019 •

edited

Loading

Ericson2314 commented Apr 8, 2019 •

edited

Loading