Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several bugs around the 'byteString' family of Builders #671

Merged

Conversation

clyring
Copy link
Member

@clyring clyring commented Apr 11, 2024

No description provided.

-- Ensure that the common case is not recursive and therefore yields
-- better code.
| op' <= ope = do copyBytes op ip isize
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

op' might overflow

op' = op `plusPtr` isize
ip = unsafeForeignPtrToPtr ifp
ipe = ip `plusPtr` isize
k br = do touchForeignPtr ifp -- input consumed: OK to release here
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not safe to put the touchForeignPtr in the continuation like this because the call site may consume one copied chunk and then hang without ever invoking that continuation.

@@ -850,19 +850,17 @@ byteStringCopy = \bs -> builder $ byteStringCopyStep bs

{-# INLINE byteStringCopyStep #-}
byteStringCopyStep :: S.StrictByteString -> BuildStep a -> BuildStep a
byteStringCopyStep (S.BS ifp isize) !k0 br0@(BufferRange op ope)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builders are not allowed to do anything with their continuation arguments until they are done writing, including forcing them. (This one now has some test cases.)

touchForeignPtr ifp
k0 (BufferRange op' ope)
| otherwise = wrappedBytesCopyStep (BufferRange ip ipe) k br0
-- What's the reasoning here, more concretely?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to elaborate on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard for me to elaborate. I thought about it for a while but still have no idea in what sense a non-recursive hot path was supposed to be "better" at the time this was written.

  • For small statically-known lengths, nowadays, we might benefit from specialized copyBytes code generation.
    • But copyBytes just lowered to an opaque foreign call when this comment was written.
    • This special code generation is not great until ghc#24519 is fixed.
    • Anyway, it'd be nice to somehow incur this extra branch only for literals if this is our reasoning.
  • Recursive stuff is much less likely to inline automatically.
    • The fact that we need inlining of b1 to happen to make a Builder concatenation b1 <> b2 fast is pretty ugly.
      • I think on my machine the reboxing of args and the unknown call incur around 15ns of overhead per builder. There can also be a bit more overhead if the continuation has to allocate a PAP.
      • At the time the Builder core stuff was written, it was found that unboxing the Builder args actually made things slower, for obscure RTS fast-call-pattern reasons. (I think it was about equal when I tried it a year or two ago.)
    • If this is really that important, inlining of the whole loop could be forced with an inner-go and an INLINE pragma. And that entire loop isn't bigger than this one unrolled iteration!

I suppose in light of the lack of clear reasoning I'm fine with deleting this comment and unifying byteStringCopyStep with wrappedBytesCopyStep.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also have to look at benchmarks, but we don't have any that really test this stuff right now. That changes in #569, but actually a change I wanted to make in that PR is blocked on this patch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out I was wrong. There is a relevant benchmark already, foldMap byteStringCopy, and getting rid of the byteStringCopyStep/wrappedBytesCopyStep distinction slows that benchmark down by about 30% for some very obscure reasons. I'll push a comment describing these reasons in a few minutes.

This whole business with the continuation-threading causing practically unpredictable performance issues with Builders has gotten very tiresome. Perhaps I will revisit #194 relatively soon.

This makes explicit the reasoning for in what sense
"ensur[ing] that the common case is not recursive" is expected to
possibly "yield[] better code."
@Bodigrim Bodigrim merged commit 4a41f7f into haskell:master Jul 18, 2024
24 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants