Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add (Int, +) finger trees #766

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

treeowl
Copy link
Contributor

@treeowl treeowl commented Feb 3, 2021

  • Export a type for (Int,+) finger trees.
  • Export more Data.Sequence internals.
  • Offer a module of Data.Sequence internals intended
    for external use, that should obey the PVP.
  • Remove Functor and Traversable instances for FingerTree and Node.
  • Remove the Generic1 instance for FingerTree.

@treeowl
Copy link
Contributor Author

treeowl commented Feb 3, 2021

@emilypi and @Lysxia, I'd especially appreciate if one of you would check my documentation for the splitting functions in Data.FingerTree.IntPlus.

@sjakobi
Copy link
Member

sjakobi commented Feb 3, 2021

@treeowl Could you say something about the motivation for this PR? Ideally create an issue, so we can discuss the "problem" separately from this implementation.

uncheckedSplit :: Sized a => Int -> FingerTree a -> UncheckedSplit a
uncheckedSplit i ft
| S.Split l m r <- S.splitTree i ft
= UncheckedSplit l m r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this partial function, why not only export split and let users deal with partial patterns if they want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is whether that extra stuff will inline away. Certain functions are extremely sensitive to the performance of splitting tiny little sequences (e.g., 2–5 elements) where any little extra time/allocation can matter a lot. I'm not sure how hard it would be to rejigger Data.Sequence.Internal.splitTree to make sure that works out okay.

@treeowl
Copy link
Contributor Author

treeowl commented Feb 3, 2021

@treeowl Could you say something about the motivation for this PR? Ideally create an issue, so we can discuss the "problem" separately from this implementation.

What really triggered it for me was wanting to play with incremental quicksort for sequences. Doing that (reasonably) efficiently requires use of FingerTree (Seq a) (which is a lot like Seq (Seq a), but with different annotations and therefore different splitting behavior). And the simplest (not simple) approach requires something like the splitMap function, which we already use to implement zipWith and chunksOf.

In principle, it would be nice to expand from (Int, +), to any Int-based monoid. But I'm scared of the efficiency considerations of doing that, and I don't know of any reasonably nice interface that doesn't lean on the extremely non-portable Coercible machinery. That is, one could imagine

data FingerTree s a
class Sized s a | a -> s where size :: a -> s
(<|) :: (Coercible s Int, Monoid s, Sized s a) => a -> FingerTree s a -> FingerTree s a

but then we're relying on an additional specialization along with all that non-portable stuff. Youch. So I figured I'd start with something conservative.

* Export a type for `(Int,+)` finger trees.
* Export more `Data.Sequence` internals.
* Offer a module of `Data.Sequence` internals intended
  for external use, that should obey the PVP.
@Lysxia
Copy link
Contributor

Lysxia commented Feb 3, 2021

I was about to mention https://hackage.haskell.org/package/fingertree but if the goal is to implement "incremental quicksort" for Seq it does seem necessary to do this in containers and sufficient to keep it specialized to (Int, +).

@treeowl
Copy link
Contributor Author

treeowl commented Feb 3, 2021

I was about to mention https://hackage.haskell.org/package/fingertree but if the goal is to implement "incremental quicksort" for Seq it does seem necessary to do this in containers and sufficient to keep it specialized to (Int, +).

Well... not strictly necessary, I don't think, but the Ints are unpacked here (but not there). That seems especially important for the very short sequences. Also, the fingertree package has fallen behind in some optimizations.

@sjakobi
Copy link
Member

sjakobi commented Feb 3, 2021

So, do you intend to offer this incremental quicksort from a different package? Otherwise I don't understand why we need to enhance the public / stable API for this.

Or are you possibly talking about different sequence types than Seq here?

@treeowl
Copy link
Contributor Author

treeowl commented Feb 3, 2021 via email

Copy link
Member

@emilypi emilypi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good on first pass!

Unless i'm missing something (i'm not super familiar with this library's internals), there are a few improvements we can make to ergnomics here. For example, I cannot fromList [1..10]. I must fromList $ Elem <$> [1..10], since Sized has only two instances. The ability to debug splits would also be welcome.

Thoughts?


data Split a
= Split !(FingerTree a) a !(FingerTree a)
| EmptySplit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a debugging function that shows the split lying around? Would be useful to have.

@treeowl
Copy link
Contributor Author

treeowl commented Feb 9, 2021

This looks good on first pass!

Unless i'm missing something (i'm not super familiar with this library's internals), there are a few improvements we can make to ergnomics here. For example, I cannot fromList [1..10]. I must fromList $ Elem <$> [1..10], since Sized has only two instances. The ability to debug splits would also be welcome.

Thoughts?

That fromList "limitation" is kind of essential to this structure. But we can add more Sized instances. I believe this PR adds one for Seq, we can get one for Array (and UArray?), and I guess inefficient ones for lists and non-empty lists too. Then there are potential derived instances for various base newtypes. Did you check my logic/arithmetic for splitting? And do you have a better name for split?

Copy link
Contributor

@phadej phadej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot comment on this (not sure why you asked me to review). finger trees are unfamiliar structures to me.

anything internal that is not exported, please file a GitHub issue.
-}

module Data.Sequence.StableInternal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would Data.Sequence.Unsafe be better name?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants