Description
This could be considered a duplicate of #24916, but whereas that issue might be closable via a workaround fix, this one will likely need a breaking solution.
`flate.Decompress` works as an example for this use case. It always needs to keep at least `flate.history_len` bytes buffered in order to decompress data correctly, because the compressed data can refer back to previous bytes as a method of deduplication. Here's a simplified example:
`a123<look 4 bytes back in the history and copy 4 bytes>`

(`a123` is the history of previously decompressed data, and the `<...>` is what reading the next part of the compressed stream tells us to do)

That would get decompressed into `a123a123`.

If those 4 bytes of history don't exist or are incorrect when we read the `<look 4 bytes back in the history and copy 4 bytes>` part, then there's no way to decompress the data correctly.
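As illustration, here's a minimal, self-contained sketch of resolving such a back-reference against the history; the function name and layout are hypothetical, not flate's actual implementation:

```zig
const std = @import("std");

/// Appends `len` bytes to `out[0..pos]` by copying from `dist` bytes back,
/// returning the new length. Copying byte-by-byte means an overlapping copy
/// (`dist < len`) repeats the history, matching LZ77-style semantics.
fn copyFromHistory(out: []u8, pos: usize, dist: usize, len: usize) usize {
    std.debug.assert(dist <= pos); // the referenced history must exist
    for (0..len) |i| {
        out[pos + i] = out[pos + i - dist];
    }
    return pos + len;
}

test "back-reference repeats history" {
    var out: [16]u8 = undefined;
    @memcpy(out[0..4], "a123");
    // <look 4 bytes back in the history and copy 4 bytes>
    const end = copyFromHistory(&out, 4, 4, 4);
    try std.testing.expectEqualStrings("a123a123", out[0..end]);
}
```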
The `Reader` interface mostly supports this use case, with the sole exception of `rebase`, which fundamentally cannot be implemented safely for it. A concrete example can be found in #24916, where a call to `try rebase(r, r.buffer.len);` in `Reader.peekDelimiterInclusive` instructs the `Reader` to throw away its entire buffer, which is incompatible with the "maintain some amount of history" use case.

Even if that particular call reduces its requirements, though, the problem can still always be triggered, as there's nothing that prevents a caller from requesting a rebase of more bytes than a window-maintaining implementation can actually free.
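To make the conflict concrete, here's a sketch of what a window-maintaining `rebase` runs into. It uses a simplified stand-in struct rather than the real `std.Io.Reader`, and the `CannotRebase` error is hypothetical; it represents exactly the condition the current interface has no way to express:

```zig
const std = @import("std");

/// Simplified stand-in for a window-maintaining reader; the `buffer`/`seek`/
/// `end` fields mirror std.Io.Reader, everything else is illustrative.
const WindowReader = struct {
    buffer: []u8,
    seek: usize, // start of unread data; everything before it is consumed
    end: usize, // end of buffered data
    history_len: usize, // bytes before `end` that must never be discarded

    /// Slide buffered data down so `capacity` bytes are free past `end`.
    fn rebase(r: *WindowReader, capacity: usize) error{CannotRebase}!void {
        // Only bytes that are both already consumed (before `seek`) and
        // older than the history window may be discarded.
        const discard = @min(r.seek, r.end -| r.history_len);
        if (r.buffer.len - (r.end - discard) < capacity) {
            // The request can't be satisfied without corrupting the window,
            // but std.Io.Reader's rebase has no way to report that today.
            return error.CannotRebase;
        }
        std.mem.copyForwards(u8, r.buffer[0 .. r.end - discard], r.buffer[discard..r.end]);
        r.seek -= discard;
        r.end -= discard;
    }
};
```

A call like `rebase(r, r.buffer.len)` fails here whenever any window bytes are buffered, which is precisely the `peekDelimiterInclusive` situation above.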
Possible solutions
I'll mention a few, but there are likely more:
1. Change `rebase` to return the number of bytes that are now available, which could be less than the number requested (see the signature sketch after this list).
   - This would require changes at every call site of `rebase`, and would push detection/handling of any "not enough buffer capacity to free up that many bytes" error to the caller instead of the implementer. For example, in the `peekDelimiterInclusive` case, it would just discard the return value of `rebase`, since its goal is to rebase as much as possible.
2. Introduce some sort of special value that means "rebase as much as possible".
   - This is what `peekDelimiterInclusive` is actually trying to do, so giving some way to properly encode that intention would (partially) address the problem. The downside is that it doesn't make the problem impossible, as it's still possible for something to `rebase` with an impossible-to-satisfy length that's `<= buffer.len`.
3. Encode this use case directly in the interface, e.g. add some sort of `maintained_buffered_bytes` field to `std.Io.Reader`.
   - This might be too heavy-handed, since only specific implementations would have any reason to make this non-zero.
4. Add a new possible error to `Reader.RebaseError` that encodes "requested capacity cannot be satisfied".
   - This would be similar to (1), but less precise, as callers that actually want to "rebase as much as possible" would then need to employ some sort of backoff strategy, rebasing with smaller and smaller capacities until the error no longer occurs.
5. Remove the `capacity` parameter from `rebase` entirely and always make it rebase as much as possible.
   - This is what all implementers of `rebase` within `std` do anyway (`capacity` is only used in `assert`s). Callers would then need to check whether there's enough space after the `rebase` call (`r.buffer.len - r.end`) to do whatever they intend to do.
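To make options (1) and (5) concrete, here are hypothetical vtable signatures; these are sketches of the proposed shapes, not the actual `std.Io.Reader` API:

```zig
const std = @import("std");
const Reader = std.Io.Reader;

// Option (1): rebase reports how many bytes are now available, which may be
// fewer than `capacity`. Callers that need an exact amount compare against
// the return value; callers that just want "as much as possible" ignore it.
const RebaseOption1 = *const fn (r: *Reader, capacity: usize) Reader.RebaseError!usize;

// Option (5): no capacity parameter at all. rebase frees as much as it can,
// and callers check `r.buffer.len - r.end` afterwards.
const RebaseOption5 = *const fn (r: *Reader) Reader.RebaseError!void;
```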
Personally, I lean towards (1) or (5), but I don't feel like I've fully thought through the implications yet.
What about `Writer`?
`Writer.VTable.rebase` includes a `preserve` parameter in its signature, but that doesn't prevent the same problem from existing: there's nothing stopping something from calling `rebase` with lengths that can't be satisfied. In practice, this might not be quite as pressing, though, since e.g. `flate` compression doesn't rely on the window for compression correctness, only for compression quality; that is, if the full buffer is thrown away during compression, the compression ratio will just temporarily decrease while the history window is built back up.
In any case, whatever happens to `Reader.VTable.rebase` should probably be applied to `Writer.VTable.rebase` as well, as there might be use cases where a `Writer` also needs to keep some minimum amount of data buffered at all times.
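Assuming `Writer.VTable.rebase` keeps its existing `preserve` parameter, the option (1) variant might look like this (again hypothetical, including the error set):

```zig
const std = @import("std");
const Writer = std.Io.Writer;

// Hypothetical: rebase reports how many bytes it actually freed while
// keeping at least `preserve` bytes of buffered data intact.
const WriterRebaseOption1 = *const fn (w: *Writer, preserve: usize, capacity: usize) Writer.Error!usize;
```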