Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: NonEmptySet and NonEmptyMap #616

Open
wants to merge 53 commits into
base: master
Choose a base branch
from

Conversation

Ericson2314
Copy link
Contributor

@Ericson2314 Ericson2314 commented Apr 2, 2019

Do #608.

One downside to this PR as writing is a made a mess of various vertical alignments in the code. I'd be happy to fix that if this looks promising.

@treeowl
Copy link
Contributor

treeowl commented Apr 2, 2019

My personal alignment preferences are not aligned with the historical standards of the package. If there are to be major alignment shifts, I would prefer to see the style change as well. In particular, I like fairly minimal alignment when readability doesn't demand more, and I like things to stay pretty far to the left. The classic "we can make it look like traditional mathematical notation, so we should" argument doesn't carry much weight with me. That said, try not to change much more than you have to. A readable diff is a high priority when you're doing something complicated like this.

@Ericson2314
Copy link
Contributor Author

@treeowl Great! yes I think the current style is mainly a recipe for tons of merge conflicts. The "sed style" I did does keep the diff smaller, and when I changed it more it was to wrap lines and move to the left.

@treeowl
Copy link
Contributor

treeowl commented Apr 2, 2019

There's a build failure.

@Ericson2314
Copy link
Contributor Author

Oops. Took out the pattern synonyms but left the extension.

@Ericson2314
Copy link
Contributor Author

Weird CI failures. Maybe restart it?

@treeowl
Copy link
Contributor

treeowl commented Apr 3, 2019

Weird build error fixed. Real build errors remain.

@treeowl
Copy link
Contributor

treeowl commented Apr 3, 2019

Looks like you still have some pattern synonyms going on.

@Ericson2314
Copy link
Contributor Author

Ericson2314 commented Apr 3, 2019

Sorry! I removed that one import----didn't see anything else and built it locally again---hopefully it works now.

@treeowl
Copy link
Contributor

treeowl commented Apr 3, 2019

It's a good start, but there are some significant issues.

  1. The constructor names are now really awful. Maybe make Bin the constructor of NonEmptyMap and make NE (short!) the first constructor of Map? I'm not really sure what's best, but it's not this.

  2. Since you want to expose the nonempty map type, it will need a bunch of functions. A great many of those are best expressed through mutual recursion with the Map equivalents. The current intermediate state is very awkward and will all have to change again, so let's skip that. There will be massive naming conflict issues (ugh!). Let's yank off the band-aid and deal with it. One likely option: give the nonempty functions horrible names in Data.Map.Internal and rename them all elsewhere for public consumption.

@treeowl
Copy link
Contributor

treeowl commented Apr 3, 2019

Er ... My second point is only valid if we end up exporting that stuff. Is there enough demand for nonempty sets and maps to justify it? That at least needs a libraries list discussion.

@Ericson2314
Copy link
Contributor Author

Oh I made this intermediate state PR just because of the benchmarking of the type change alone. If you want to go all or nothing re exporting the stuff that's fine; I can email the list.

I'll do the constructor renames in any event.

@Ericson2314
Copy link
Contributor Author

Ericson2314 commented Apr 19, 2019

OK fixed conflicts and did

  1. Bin -> NE
  2. NonEmpty* (constructor) -> Bin

as requested.

@treeowl
Copy link
Contributor

treeowl commented Apr 19, 2019

I think it's pretty safe to assume we're going to expose the nonempty types (by some names or other). So if you have time, could you start working on breaking up the functions into mutually recursive pairs? Don't squash into the current commit yet, in case we change our minds.

@treeowl
Copy link
Contributor

treeowl commented Apr 19, 2019

Use NE suffixes for the nonempty functions; we'll rename them in the nonempty modules.

@Ericson2314 Ericson2314 changed the title Split Map and Set types with mutually recursive NonEmpty* WIP: NonEmptySet and NonEmptyMap Apr 19, 2019
@Ericson2314
Copy link
Contributor Author

@treeowl OK sounds good. I renamed the PR accordingly.

@Ericson2314
Copy link
Contributor Author

@treeowl OK pushed a select few Set functions just to double check the style looks good with you.

Relatedly, the wording of Note: Local 'go' functions and capturing seems to warn against capturing arguments in go, but the commits that touched it, 9be5ec2 and 78c1e52, say in the commit message that it's OK, and in the code always seem to capture more, not less. I took the liberty of capturing more to be less noisy to compensate for the noise of mutual recursion.

@Ericson2314
Copy link
Contributor Author

@treeowl hold up, github is loosing comments, ugh. (It normally just marks outdated comments as such and force pushes are fine.)

@Ericson2314
Copy link
Contributor Author

(9555fb4 some missing comments here)

@Ericson2314
Copy link
Contributor Author

I'll just not force-push for now; that seems easiest.

@treeowl
Copy link
Contributor

treeowl commented Apr 19, 2019

Yeah, GitHub is rough. I meant member, not lookup. Sorry. Things are about to get crazy here.... Passover is starting.

@Ericson2314
Copy link
Contributor Author

Oh no worries at all. I'll keep poking at it, taking those into account.

Sync non-empty branch with upstream
@alexfmpe
Copy link
Contributor

@treeowl I'm helping with some legwork on this PR.
Merged with master and dusting it off a bit before having a go at benchmarking for regressions.
I have some questions about the exported NE variants:

  1. There are a good deal of functions that differ only by calling nonEmpty or fmap nonEmpty at the very end. These don't seem to add much benefit. I could see a case for those if they were very common, but not in general.

    intersectionNE :: Ord k => NonEmptyMap k a -> NonEmptyMap k b -> Maybe (NonEmptyMap k a)
    intersectionNE t1 t2 = nonEmpty $ intersection (NE t1) (NE t2)
    
    traverseMaybeWithKeyNE
      :: Applicative f
      => (k -> a -> f (Maybe b)) -> NonEmptyMap k a -> f (Maybe (NonEmptyMap k b))
    traverseMaybeWithKeyNE f = fmap nonEmpty . traverseMaybeWithKeyFromNE' f
  2. Some functions are asymmetrical. For instance, insert produces a map with at least one element with possibly empty maps as children, so it's 'fundamental' type is Map k a -> NonEmptyMap k a. Currently we have

    insert :: Ord k => k -> a -> Map k a -> Map k a
    insertNE :: Ord k => k -> a -> NonEmptyMap k a -> NonEmptyMap k a

    but I would instead expect some

    insert' :: Ord k => k -> a -> Map k a -> NonEmptyMap k a

    since the other two variants can be obtained by forgetting non-emptiness

    insert k a = forget . insert' k a
    insertNE = insert' k a . forget 

    with forget being a placeholder for some exposed version of NE.
    The advantage of having more functions is that some pattern-matching can be skipped by avoiding client code tacking on a NE only to have the library immediately doing a case on it. But maybe we can avoid exporting a bunch of variants by adding rewrite rules for these cases?

    Now, replacing the type of insert would be one big breaking change, so one option is to provide the Map -> NonEmptyMap variant instead of current insertNE, which would be a non-breaking change and yet keep open the possibility of a future breaking change.

@phadej
Copy link
Contributor

phadej commented Sep 29, 2022

My personal wish is that Data.Map module doesn't mention NEMap.

E.g. there is no consNE :: a -> [a] -> NonEmpty a in Data.List and having such would be weird. There are both
(:|) :: a -> [a] -> NonEmpty a and cons :: a -> NonEmpty a -> NonEmpty a in Data.List.NonEmpty though (the naming problem is cleverly avoided)

But I agree that nonEmpty and fmap nonEmpty are better left off for NEMap functions, e.g. there is

filter :: (a -> Bool) -> NonEmpty a -> [a]

in Data.List.NonEmpty. https://hackage.haskell.org/package/base-4.17.0.0/docs/Data-List-NonEmpty.html#v:filter, and no -> Maybe (NonEmpty a) variant.

@phadej
Copy link
Contributor

phadej commented Sep 29, 2022

To be clear:

I expect this change to cause minimal or even no changes to the Data.Map and Data.Set module interfaces.

@treeowl
Copy link
Contributor

treeowl commented Sep 29, 2022

@phadej Why do you think that's the right thing to expect?

@konsumlamm
Copy link
Contributor

Has it already been decided that we really want non-empty maps/sets in containers? They don't seem particularly useful to me (I'm generally not a fan of NonEmpty* types, because now you have to duplicate every function that should work on both variants or constantly convert between the variants), especially for a core package like containers. nonempty-containers already exists, if you really want nonempty containers.

As for the names, adding NE isn't very ergonomic, or would that be renamed for exporting (if so, that would add quite a bit of bloat, wouldn't it)?

@phadej
Copy link
Contributor

phadej commented Sep 29, 2022

@treeowl I expect no breaking changes, which implies there won't be massive changes in Data.Map or Data.Set. If you change the type of Data.Map.insert, I'm sure people will be very angry. I will be for sure.

@konsumlamm if you look at the implementation, it's almost free:

+data Map k a  = Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
-data Map k a  = NE {-# UNPACK #-} !(NonEmptyMap k a)
              | Tip

+data NonEmptyMap k a = Bin' {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)

It's silly to duplicate the code in nonempty-contrainers, especially as the implementation there is bad:

data NEMap k a =
    NEMap { nemK0  :: !k   -- ^ invariant: must be smaller than smallest key in map
          , nemV0  :: a
          , nemMap :: !(Map k a)
          }
  deriving (Typeable)

OTOH. It might be a good idea to first only do the changes in Internal modules, and then do a companion package which would try to figure out the interfaces for non-empty modules, without forcing the major version bumps of containers version. These are very disruptive. I don't believe that any person or committee can figure out the good interface first time, so experimentation/stabilisation is better done in a separate package.

@phadej
Copy link
Contributor

phadej commented Sep 29, 2022

The idea of doing only internal changes first also solves the name bikeshedding problem. As soon as internals are done, you can make a release.

@konsumlamm
Copy link
Contributor

if you look at the implementation, it's almost free:

+data Map k a  = Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
-data Map k a  = NE {-# UNPACK #-} !(NonEmptyMap k a)
              | Tip

+data NonEmptyMap k a = Bin' {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)

This PR does a lot more than just that though, it adds a bunch of new functions (more than 1000 additional LOC).

@phadej
Copy link
Contributor

phadej commented Sep 29, 2022

@konsumlamm that is why I suggest to keep changes to the Internal modules, to reduce the size of the change.

@jwaldmann
Copy link
Contributor

Has it already been decided that we really want non-empty maps/sets in containers? They don't seem particularly useful to me

I think the same.

I can see that it's nice to know statically whether some collection (sequence, list) is non-empty (although it feels a bit arbitrary - perhaps I want the size to be greater than 42, or prime)

Specifically for Map and Set, we will (mostly?) ultimately look up some key, and we can't know statically whether it's there, even if we have static info on emptiness. Well, unless we only do bulk operations - but then intersection has a similar problem: the result can be empty.

On the other hand, I do follow the "curried map" motivation (see beginning of #608 (comment)). But then, what will we do with these curried maps? Probably some look-up, so .. see above.

In all, I probably won't use guaranteed-non-empty map/set much (in my code, in my teaching), so I prefer that the well-known API (Data.Map, Data.Set) does not change - as @phadej was already saying in #616 (comment)

@treeowl
Copy link
Contributor

treeowl commented Sep 29, 2022

The existing public API can't change, but the implementation can, as long as performance doesn't suffer.

@konsumlamm
Copy link
Contributor

The existing public API can't change, but the implementation can, as long as performance doesn't suffer.

Has anyone already done benchmarks to see how this change affects performance?

@alexfmpe
Copy link
Contributor

alexfmpe commented Oct 2, 2022

Has anyone already done benchmarks to see how this change affects performance?

I've ran set-benchmarks and map-benchmarks which seem to indicate, respectively, a geometric mean of 10% and 35% slowdown, but it also seems like the slowdown is mostly concentrated on insert/delete related methods, so I suspect it pretty much boils down to slowdown creeping into primitives like link/glue. Oddly enough some things seems straight up faster, like Set.member or Set.filter.

Here's the result for set-benchmarks following https://github.com/Bodigrim/tasty-bench#comparison-against-baseline

Name,Old,New,Ratio
All.member,221875789,209289636,0.943274
All.insert,1034989168,1407236575,1.35966
All.map,200213961,224134561,1.11948
All.filter,79314771,71061466,0.895942
All.partition,166230030,182848472,1.09997
All.fold,37304,36632,0.981986
All.delete,510582421,960032359,1.88027
All.findMin,16609,17358,1.0451
All.findMax,18064,17922,0.992139
All.deleteMin,100079,99664,0.995853
All.deleteMax,98491,99977,1.01509
All.unions,171477906,195428890,1.13967
All.union,172355501,196027891,1.13735
All.difference,114564277,133476880,1.16508
All.intersection,72926835,56555242,0.775507
All.fromList,78321178,99341937,1.26839
All.fromList-desc,958907987,1326825750,1.38368
All.fromAscList,176746806,178201611,1.00823
All.fromDistinctAscList,47820229,59936910,1.25338
All.disjoint:false,30067,30159,1.00306
All.disjoint:true,165839725,160507622,0.967848
All.null.intersection:false,72714958,55893992,0.768673
All.null.intersection:true,218427448,132499310,0.606606
All.alterF:member,867611900,867927140,1.00036
All.alterF:insert,1221301962,1186877518,0.971813
All.alterF:delete,489001276,469350754,0.959815
All.alterF:four,852520328,851147737,0.99839
All.alterF:four:strings,1756882975,1694781500,0.964652
All.alterF_naive:four,1490742162,4849000625,3.25274
All.alterF_naive:four:strings,4308662200,7881103600,1.82913
Geometric mean,,,1.09951

Specifically for Map and Set, we will (mostly?) ultimately look up some key, and we can't know statically whether it's there, even if we have static info on emptiness. Well, unless we only do bulk operations - but then intersection has a similar problem: the result can be empty.

I would say it's not just lookups. A recurring use-case is to perform a bunch of Map/Set operations and then either spit out a NonEmpty list or iterate the Map/Set somehow. For instance, it is extremely common in UI code to represent an empty container with a "no results" state, rather than merely an empty list. While this is at times achievable by converting to a list and calling nonEmpty, sometimes we want to preserve the underlying structure (e.g. using Traversable NonEmptyMap) or simply avoid superfluous empty states, as per the curried map example.

My plan going forward is roughly to:

  1. PR to master benchmarks for some pre-existing exported utils that don't have them to get a more comprehensive picture of performance impact.
  2. Investigate the existing speed-ups and try to standalone PR them into master if applicable
  3. Remove all external facing exports from this PR.
  4. Remove all the utils in this PR that merely add a nonEmpty on top of other functionality
  5. Reduce new variants as much as possible, in the spirit of (2) in WIP: NonEmptySet and NonEmptyMap #616 (comment)
  6. Try to fix remaining regressions in the PR
  7. Get the PR into a state where no external API changes whatsoever exist, but the internals are restructured to use NonEmpty where applicable.

Once that's all done and merged we can start thinking about external facing API.

Thoughts?

@phadej
Copy link
Contributor

phadej commented Oct 2, 2022

@alexfmpe that sounds great!

@alexfmpe
Copy link
Contributor

alexfmpe commented Oct 12, 2022

The proliferation of split functions and combinations has been gnawing at me, so I did a little experiment on 'emptiness polymorphism'.

Looks like we can have both Set and NonEmptySet exist without having to immediately change implementations. This simple change to master compiles:
alexfmpe@83f14b3

A bunch of functions that do not produce sets or preserve emptyness can be 'generalized' by simply changing the types and adding a dummy to argument to Tip:
alexfmpe@460932f

This approach would make the diff massively smaller, and less room for regressions since the implementations are essentially the same. Benchmark for the above commit in 9.4.2:

All
  member:                    OK (0.14s)
    132  μs ±  12 μs,   0 B  allocated,   0 B  copied, 8.0 MB peak memory,       same as baseline
  insert:                    OK (0.32s)
    626  μs ±  52 μs, 2.6 MB allocated,  56 KB copied, 8.0 MB peak memory,       same as baseline
  map:                       OK (0.21s)
    110  μs ± 7.6 μs, 557 KB allocated,  16 KB copied, 8.0 MB peak memory,       same as baseline
  filter:                    OK (0.17s)
    80.6 μs ± 6.1 μs,  80 KB allocated, 1.2 KB copied, 8.0 MB peak memory,       same as baseline
  partition:                 OK (0.17s)
    158  μs ±  13 μs, 239 KB allocated, 3.7 KB copied, 8.0 MB peak memory,       same as baseline
  fold:                      OK (0.17s)
    35.4 ns ± 3.2 ns, 695 B  allocated,   0 B  copied, 8.0 MB peak memory,       same as baseline
  delete:                    OK (0.22s)
    421  μs ±  37 μs, 1.3 MB allocated, 349 B  copied, 8.0 MB peak memory,       same as baseline
  findMin:                   OK (0.58s)
    16.7 ns ± 1.5 ns,  31 B  allocated,   0 B  copied, 8.0 MB peak memory,       same as baseline
  findMax:                   OK (0.29s)
    16.7 ns ± 1.6 ns,  31 B  allocated,   0 B  copied, 8.0 MB peak memory,       same as baseline
  deleteMin:                 OK (0.23s)
    103  ns ± 9.7 ns, 471 B  allocated,   0 B  copied, 8.0 MB peak memory, 10% less than baseline
  deleteMax:                 OK (0.26s)
    121  ns ± 8.4 ns, 511 B  allocated,   0 B  copied, 8.0 MB peak memory, 13% more than baseline
  unions:                    OK (0.35s)
    167  μs ± 7.1 μs, 472 KB allocated,  10 KB copied, 8.0 MB peak memory, 20% more than baseline
  union:                     OK (0.35s)
    169  μs ± 6.9 μs, 472 KB allocated,  10 KB copied, 8.0 MB peak memory, 20% more than baseline
  difference:                OK (0.22s)
    107  μs ± 5.4 μs, 239 KB allocated, 2.4 KB copied, 8.0 MB peak memory,       same as baseline
  intersection:              OK (0.21s)
    53.3 μs ± 3.4 μs,  80 KB allocated, 834 B  copied, 8.0 MB peak memory, 16% more than baseline
  fromList:                  OK (0.29s)
    69.0 μs ± 3.4 μs, 271 KB allocated, 6.9 KB copied, 8.0 MB peak memory,       same as baseline
  fromList-desc:             OK (0.15s)
    618  μs ±  47 μs, 2.6 MB allocated,  55 KB copied, 8.0 MB peak memory,       same as baseline
  fromAscList:               OK (0.13s)
    123  μs ±  12 μs, 414 KB allocated, 8.3 KB copied, 8.0 MB peak memory,       same as baseline
  fromDistinctAscList:       OK (0.20s)
    50.5 μs ± 3.6 μs, 159 KB allocated, 3.2 KB copied, 8.0 MB peak memory,       same as baseline
  disjoint:false:            OK (0.24s)
    26.7 ns ± 1.5 ns,  31 B  allocated,   0 B  copied, 8.0 MB peak memory,       same as baseline
  disjoint:true:             OK (0.16s)
    149  μs ±  11 μs, 214 KB allocated, 130 B  copied, 8.0 MB peak memory,       same as baseline
  null.intersection:false:   OK (0.23s)
    54.3 μs ± 4.2 μs,  80 KB allocated, 834 B  copied, 8.0 MB peak memory, 16% more than baseline
  null.intersection:true:    OK (0.16s)
    163  μs ±  16 μs, 293 KB allocated, 173 B  copied, 8.0 MB peak memory,       same as baseline
  alterF:member:             OK (0.21s)
    816  μs ±  42 μs, 2.4 MB allocated, 393 B  copied, 8.0 MB peak memory,       same as baseline
  alterF:insert:             OK (0.19s)
    747  μs ±  50 μs, 3.6 MB allocated, 106 KB copied, 8.0 MB peak memory,       same as baseline
  alterF:delete:             OK (0.26s)
    493  μs ±  27 μs, 1.9 MB allocated, 416 B  copied, 8.0 MB peak memory,       same as baseline
  alterF:four:               OK (0.19s)
    787  μs ±  66 μs, 2.4 MB allocated, 378 B  copied, 8.0 MB peak memory,       same as baseline
  alterF:four:strings:       OK (0.22s)
    1.69 ms ±  99 μs, 2.5 MB allocated, 423 B  copied, 8.0 MB peak memory,       same as baseline
  alterF_naive:four:         OK (0.18s)
    1.41 ms ±  89 μs, 2.0 MB allocated, 315 B  copied, 8.0 MB peak memory,       same as baseline
  alterF_naive:four:strings: OK (0.13s)
    4.15 ms ± 391 μs, 2.0 MB allocated, 469 B  copied, 8.0 MB peak memory,       same as baseline

Notice the changed methods member/map/fold have the same time performance. Haven't yet looked into what's up with delete/union/intersection but since the only thing that actually changed for those was adding () on Tip, and given what I've seen while looking at improvements/regressions in the current PR, I expect a good deal of noise in whether optimizations kick in, and probably GHC needs some hand holding here. The unexpected one is deleteMin, but I've also seen filter get a speedup on this PR by virtue of what seems to be grotesque inlining (https://pastebin.com/AKAwCkbX vs https://pastebin.com/2GBaEbN8), so it's not too surprising.

This lets a bunch of the current functions consume nonemptys, so the variants we need to add should significantly go down in size. To be able to actually create NonEmptySet, we'd need to add singleton/insert/etc variants
alexfmpe@5debc86.
The original motivation, curried-map, also introduces nonempty-ness.

Another advantage is that since we change from "a set is either empty or an element and two sets" to "a non empty set is a set that can not be empty", that is, changed the nullary constructor instead of the recursive one, it now also works when there's structurally no element in the recursive case, as is the case for IntSet/IntMap:
alexfmpe@7ee91fd

There's a drawback though - a breaking change.
Changing Set to a type synonym means instances requires consumers that add instances of Set to use TypeSynonymInstances/FlexibleInstances (maybe generalize Set' themselves). For instance, trying to test/benchmark the above fails with

src/Test/QuickCheck/Function.hs:328:33: error:
    • Illegal instance declaration for ‘Function (Set.Set a)’
        (All instance types must be of the form (T t1 ... tn)
         where T is not a synonym.
         Use TypeSynonymInstances if you want to disable this.)
    • In the instance declaration for ‘Function (Set.Set a)’
    |
328 | instance (Ord a, Function a) => Function (Set.Set a) where

If folks think this approach makes sense and that whether it should be pursued depends on the magnitude of breakage, I'd be up for doing impact assessment.

@alexfmpe
Copy link
Contributor

Another advantage is that we can in theory expose Set' itself so downstream consumers are also able to write emptiness-polymorphic functions/instances. However, we don't want to allow parameterizing with nonsense, (e.g. Set' Int) so we'd need to whitelist with DataKinds. Examples:

  • with TypeFamilies

    data Restriction = Unrestricted | NonEmpty
    
    type Foo :: Restriction -> *
    type family Foo r where
      Foo Unrestricted = ()
      Foo NonEmpty = Void
    
    data Set' (r :: Restriction) a
      = Bin {-# UNPACK #-} !Int !a !(Set a) !(Set a)
      | Tip (Foo r)
  • with GADTs

    data Restriction e where
      Unrestricted :: Restriction ()
      NonEmpty :: Restriction Void
    
    type Set' :: forall e. Restriction e -> * -> *
    data Set' (r :: Restriction e) a
      = Bin {-# UNPACK #-} !Size !a !(Set a) !(Set a)
      | Tip e

Since this uses GHC-specific extensions, it'd require the more general definitions (type/instances/code) to be behind CPP to preserve compatibility. AFAICT it should still result in a much smaller diff than the current approach, since It would only apply to the types/signatures, rather than split every function in half.

Impact assessment

An attempt to build 2660 packages via clc-stackage with a ghc using containers 0.6.5.1 plus the Set type synonym change results in the build stopping roughly half-way through, blocked on 17 packages that won't build. So that's roughly 50% of packages indirectly affected, and (extrapolating) around 1% directly.
16 of those failures are due to requiring TypeSynonymInstances/FlexibleInstances while the last one, doclayout, uses Tip directly from the internal modules: https://github.com/jgm/doclayout/blob/ec0cdddeb828a71f8a78aedc8b2fff530534ef54/src/Text/DocLayout.hs#L1111

> cat log | grep 'requires download & build' | wc -l
2660
> cat log | grep 'Completed' | wc -l
1350
> cat log | grep 'Failed to build' | wc -l
17

Additionally, binary / Cabal / haddock also needed FlexibleInstances to build ghc itself.

FlexibleInstances can be added in advance to prevent breakage. After a release, if Set', etc is exported, downstream functions/instances can be generalized on a per-package basis where applicable, possibly behind CPP to maintain backwards compatibility. The GADTs approach requires StandaloneKindSignatures (ghc 8.10.1+) which might make for more CPP.

This encoding is technically breaking as the assessment shows, but it seems much more useful than completely separate types which is likely to also lead to function proliferation downstream, and in principle can also be applied to other types: say, to de-duplicate Data.List and Data.List.NonEmpty (though that's a trickier, possibly prohibitive, compatibility story).

@treeowl @Ericson2314

@treeowl
Copy link
Contributor

treeowl commented Oct 27, 2022

It's been a while since I looked in on the progress here. Why do you want to make Set a type synonym? There would have to be a very, very good reason.

@alexfmpe
Copy link
Contributor

See #616 (comment)

@treeowl
Copy link
Contributor

treeowl commented Oct 27, 2022

Why not use a newtype instead of a type synonym? There aren't that many wrapper functions to write.

@alexfmpe
Copy link
Contributor

alexfmpe commented Nov 6, 2022

Why not use a newtype instead of a type synonym? There aren't that many wrapper functions to write.

Don't think I understand what's being proposed. The attempt to parameterize emptiness had as goal unifying Foo / NonEmptyFoo whenever there's an empty constructor, so AFAICT Foo / NonEmptyFoo need to be specializations of the same type. Does "use a newtype" refer to the current approach?

@treeowl
Copy link
Contributor

treeowl commented Nov 6, 2022

I mean instead of type Set a = Set' a a, why not newtype Set a = Set (Set' a a)?

@alexfmpe
Copy link
Contributor

alexfmpe commented Nov 6, 2022

I mean instead of type Set a = Set' a a, why not newtype Set a = Set (Set' a a)?

That'd be newtype Set a = Set (Set' () a) right?

So that'd mean

  1. Unify Foo/NonEmptyFoo types and implementations internally - e.g. in Data.Set.Internal
  2. Add coerce wrappers for the externally exported bindings - e.g. in Data.Set

I see. That would prevent breakage, allow also adding non-empty-ness for the Int-trees, and keep the implementations the same, up to coerce, so I do think it seems an improvement to the current approach.
Downside compared to the type-synonym is that it wouldn't allow downstream to use emptyness-agnostic code (unless they imported Internal modules I guess?), so consumers might end up writing variants both for Foo and NonEmptyFoo. I didn't want to lock out that option right out the gate, and it seems the newtype wrapper would allow that discussion in the future, so I'm happy to go with that

@alexfmpe
Copy link
Contributor

alexfmpe commented Nov 6, 2022

Oh that technically still breaks at least one package - as mentioned above, there's at least one use of Tip in the wild
https://github.com/jgm/doclayout/blob/ec0cdddeb828a71f8a78aedc8b2fff530534ef54/src/Text/DocLayout.hs#L1111

Once this PR is in shape, I should run another impact analysis

@treeowl
Copy link
Contributor

treeowl commented Nov 6, 2022

The newtype constructor would be available from the Internal module, so code using internals can be adjusted pretty easily.

@alexfmpe
Copy link
Contributor

Trying to write the wrappers with coerce gives me

src/Data/Set.hs:183:1-27: error:
    Data.Coerce: Can't be safely imported!
    The module itself isn't safe.
    |
183 | import Data.Coerce (coerce)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^

Do we un-mark the module as safe, or write the wrappers the old way?
I assume the newtype wrapping/unwrapping would still be optimized away.

import Data.Set.Internal as S hiding (empty, insert, member)
import qualified Data.Set.Internal as S

empty :: Set a
empty = Set S.empty

insert :: Ord a => a -> Set a -> Set a
insert a (Set s) = Set $ S.insert a s

member :: Ord a => a -> Set a -> Bool
member a (Set s) = S.member a s

@treeowl
Copy link
Contributor

treeowl commented Jul 10, 2023

Switching to Trustworthy is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add NonEmpty containers, without imbalancing or massive code duplication
8 participants