-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix several bugs around the 'byteString' family of Builders #671
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -133,6 +133,7 @@ import Data.Semigroup (Semigroup(..)) | |
import Data.List.NonEmpty (NonEmpty(..)) | ||
|
||
import qualified Data.ByteString as S | ||
import qualified Data.ByteString.Unsafe as S | ||
import qualified Data.ByteString.Internal.Type as S | ||
import qualified Data.ByteString.Lazy.Internal as L | ||
import qualified Data.ByteString.Short.Internal as Sh | ||
|
@@ -796,24 +797,24 @@ ensureFree minFree = | |
| ope `minusPtr` op < minFree = return $ bufferFull minFree op k | ||
| otherwise = k br | ||
|
||
-- | Copy the bytes from a 'BufferRange' into the output stream. | ||
wrappedBytesCopyStep :: BufferRange -- ^ Input 'BufferRange'. | ||
-- | Copy the bytes from a 'S.StrictByteString' into the output stream. | ||
wrappedBytesCopyStep :: S.StrictByteString -- ^ Input 'S.StrictByteString'. | ||
-> BuildStep a -> BuildStep a | ||
wrappedBytesCopyStep (BufferRange ip0 ipe) k = | ||
go ip0 | ||
-- See Note [byteStringCopyStep and wrappedBytesCopyStep] | ||
wrappedBytesCopyStep bs0 k = | ||
go bs0 | ||
where | ||
go !ip (BufferRange op ope) | ||
go !bs@(S.BS ifp inpRemaining) (BufferRange op ope) | ||
| inpRemaining <= outRemaining = do | ||
copyBytes op ip inpRemaining | ||
S.unsafeWithForeignPtr ifp $ \ip -> copyBytes op ip inpRemaining | ||
let !br' = BufferRange (op `plusPtr` inpRemaining) ope | ||
k br' | ||
| otherwise = do | ||
copyBytes op ip outRemaining | ||
let !ip' = ip `plusPtr` outRemaining | ||
return $ bufferFull 1 ope (go ip') | ||
S.unsafeWithForeignPtr ifp $ \ip -> copyBytes op ip outRemaining | ||
let !bs' = S.unsafeDrop outRemaining bs | ||
return $ bufferFull 1 ope (go bs') | ||
where | ||
outRemaining = ope `minusPtr` op | ||
inpRemaining = ipe `minusPtr` ip | ||
|
||
|
||
-- Strict ByteStrings | ||
|
@@ -834,7 +835,7 @@ byteStringThreshold :: Int -> S.StrictByteString -> Builder | |
byteStringThreshold maxCopySize = | ||
\bs -> builder $ step bs | ||
where | ||
step bs@(S.BS _ len) !k br@(BufferRange !op _) | ||
step bs@(S.BS _ len) k br@(BufferRange !op _) | ||
| len <= maxCopySize = byteStringCopyStep bs k br | ||
| otherwise = return $ insertChunk op bs k | ||
|
||
|
@@ -848,21 +849,69 @@ byteStringThreshold maxCopySize = | |
byteStringCopy :: S.StrictByteString -> Builder | ||
byteStringCopy = \bs -> builder $ byteStringCopyStep bs | ||
|
||
{- | ||
Note [byteStringCopyStep and wrappedBytesCopyStep] | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
A Builder that copies the contents of an arbitrary ByteString needs a | ||
recursive loop, since the bytes to be copied might not fit into the | ||
first few chunk buffers provided by the driver. That loop is | ||
implemented in 'wrappedBytesCopyStep'. But we also have a | ||
non-recursive wrapper, 'byteStringCopyStep', which performs exactly | ||
the first iteration of that loop, falling back to 'wrappedBytesCopyStep' | ||
if a chunk boundary is reached before the entire ByteString is copied. | ||
|
||
This is very strange! Why do we do this? Perhaps mostly for | ||
historical reasons. But sadly, changing this to use a single | ||
recursive loop regresses the benchmark 'foldMap byteStringCopy' by | ||
about 30% as of 2024, in one of two ways: | ||
|
||
1. If the continuation 'k' is taken as an argument of the | ||
inner copying loop, it remains an unknown function call. | ||
So for each bytestring copied, that continuation must be | ||
entered later via a gen-apply function, which incurs dozens | ||
of cycles of extra overhead. | ||
2. If the continuation 'k' is lifted out of the inner copying | ||
loop, it becomes a free variable. And after a bit of | ||
inlining, there will be no unknown function call. But, if | ||
the continuation function has any free variables, these | ||
become free variables of the inner copying loop, which | ||
prevent the loop from floating out. (In the actual | ||
benchmark, the tail of the list of bytestrings to copy is | ||
such a free variable of the continuation.) As a result, | ||
the inner copying loop becomes a function closure object | ||
rather than a top-level function. And that means a new | ||
inner-copying-loop function-closure-object must be | ||
allocated on the heap for every bytestring copied, which | ||
is expensive. | ||
|
||
In theory, GHC's late-lambda-lifting pass can clean this up by | ||
abstracting over the problematic free variables. But for some | ||
unknown reason (perhaps a bug in ghc-9.10.1) this optimization | ||
does not fire on the relevant benchmark code, even with a | ||
sufficiently high value of -fstg-lift-lams-rec-args. | ||
|
||
|
||
|
||
Alternatively, it is possible to avoid recursion altogether by | ||
requesting that the next chunk be large enough to accommodate the | ||
entire remainder of the input when a chunk boundary is reached. | ||
But: | ||
* For very large ByteStrings, this may incur unwanted latency. | ||
* Large next-chunk-size requests have caused breakage downstream | ||
in the past. See also https://github.com/yesodweb/wai/issues/894 | ||
-} | ||
|
||
{-# INLINE byteStringCopyStep #-} | ||
byteStringCopyStep :: S.StrictByteString -> BuildStep a -> BuildStep a | ||
byteStringCopyStep (S.BS ifp isize) !k0 br0@(BufferRange op ope) | ||
-- Ensure that the common case is not recursive and therefore yields | ||
-- better code. | ||
| op' <= ope = do copyBytes op ip isize | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
touchForeignPtr ifp | ||
k0 (BufferRange op' ope) | ||
| otherwise = wrappedBytesCopyStep (BufferRange ip ipe) k br0 | ||
-- See Note [byteStringCopyStep and wrappedBytesCopyStep] | ||
byteStringCopyStep bs@(S.BS ifp isize) k br@(BufferRange op ope) | ||
| isize <= osize = do | ||
S.unsafeWithForeignPtr ifp $ \ip -> copyBytes op ip isize | ||
k (BufferRange op' ope) | ||
| otherwise = wrappedBytesCopyStep bs k br | ||
where | ||
osize = ope `minusPtr` op | ||
op' = op `plusPtr` isize | ||
ip = unsafeForeignPtrToPtr ifp | ||
ipe = ip `plusPtr` isize | ||
k br = do touchForeignPtr ifp -- input consumed: OK to release here | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not safe to put the |
||
k0 br | ||
|
||
-- | Construct a 'Builder' that always inserts the 'S.StrictByteString' | ||
-- directly as a chunk. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Builders are not allowed to do anything with their continuation arguments until they are done writing, including forcing them. (This one now has some test cases.)