
Reader does not fully support the use case of keeping some amount of data buffered #25103

@squeek502

Description


This could be considered a duplicate of #24916, but whereas that issue might be closeable with a workaround fix, this one will likely need a breaking solution.


flate.Decompress serves as an example of this use case. It always needs to keep at least flate.history_len bytes buffered in order to decompress data correctly, because the compressed data can refer back to previous bytes as a form of deduplication. Here's a simplified example:

a123<look 4 bytes back in the history and copy 4 bytes>

(a123 is the history of previously decompressed data, and the <...> is what reading the next part of the compressed stream tells us to do)

That would get decompressed into

a123a123

If those 4 bytes of history don't exist/are incorrect when we read the <look 4 bytes back in the history and copy 4 bytes> part, then there's no way to decompress the data correctly.
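
To make that concrete, here's a minimal self-contained sketch of resolving such a back-reference (illustrative only, not the actual flate.Decompress code; `copyFromHistory` is a made-up helper):

```zig
const std = @import("std");

// Illustrative only (not the real flate.Decompress code): resolve a
// back-reference by copying `len` bytes starting `dist` bytes behind the
// current end of the already-decompressed history.
fn copyFromHistory(history: []u8, end: usize, dist: usize, len: usize) usize {
    var i: usize = 0;
    while (i < len) : (i += 1) {
        // Byte-by-byte so that overlapping copies (dist < len) work too.
        history[end + i] = history[end + i - dist];
    }
    return end + len;
}

test "back-reference requires the history bytes to still be buffered" {
    var buf: [8]u8 = undefined;
    @memcpy(buf[0..4], "a123");
    // <look 4 bytes back in the history and copy 4 bytes>
    const new_end = copyFromHistory(&buf, 4, 4, 4);
    try std.testing.expectEqualStrings("a123a123", buf[0..new_end]);
}
```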


The Reader interface mostly supports this use case, with the sole exception of rebase, which fundamentally cannot be implemented safely by a window-maintaining reader. A concrete example can be found in #24916, where there's a call to

try rebase(r, r.buffer.len);

in Reader.peekDelimiterInclusive, instructing the Reader implementation to throw away its entire buffer, which is incompatible with the "maintain some amount of history" use case.

Even if that particular call reduced its requirements, the problem could still occur, since nothing prevents a caller from requesting a larger rebase than a window-maintaining implementation can satisfy.
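
For illustration, a small sketch of the constraint (reusing the buffer/end naming from this issue, with `history_len` standing in for flate.history_len): the most capacity a window-maintaining reader can ever free up is bounded by what it has to keep, so a request for the full buffer.len can never be satisfied.

```zig
const std = @import("std");

// Rough sketch: the capacity a window-maintaining reader can free up is
// bounded by the data it must keep, so `rebase(r, r.buffer.len)` is
// unsatisfiable by construction whenever history must be preserved.
fn maxSatisfiableRebase(buffer_len: usize, unread_len: usize, history_len: usize) usize {
    // Bytes that must stay in the buffer: the not-yet-consumed data plus the window.
    const must_keep = unread_len + history_len;
    return buffer_len -| must_keep; // saturating: 0 if nothing can be freed
}

test "a full-buffer rebase request can never be satisfied" {
    const buffer_len: usize = 64 * 1024;
    const history_len: usize = 32 * 1024; // DEFLATE's 32 KiB window
    try std.testing.expect(maxSatisfiableRebase(buffer_len, 0, history_len) < buffer_len);
}
```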

Possible solutions

I'll mention a few, but there are likely more:

  1. Change rebase to return the number of bytes that are now available, which could be less than the number requested
    • This would require changes at every call site of rebase, and would push detection/handling of any "not enough buffer capacity to free up that many bytes" condition to the caller instead of the implementer. For example, peekDelimiterInclusive would simply discard the return value of rebase, since its goal is just to rebase as much as possible.
  2. Introduce some sort of special value that means "rebase as much as possible"
    • This is what peekDelimiterInclusive is actually trying to do, so giving it a way to properly encode that intention would (partially) address the problem. The downside is that it doesn't make the problem impossible, as it's still possible for a caller to request a rebase with an impossible-to-satisfy length that's <= buffer.len.
  3. Encode this use case directly in the interface, e.g. add some sort of maintained_buffered_bytes field to std.Io.Reader
    • This might be too heavy-handed, since only specific implementations would have any reason to make this non-zero.
  4. Add a new possible error to Reader.RebaseError that encodes "requested capacity cannot be satisfied"
    • This would be similar to (1), but less precise, as callers that actually want to "rebase as much as possible" would then need to employ some sort of backoff strategy, rebasing with smaller and smaller capacities until the error no longer occurs.
  5. Remove the capacity parameter from rebase entirely, always make it rebase as much as possible
    • This is what all implementers of rebase within std do anyway (capacity is only used in asserts). Callers would then need to check if there's enough space after the rebase call (r.buffer.len - r.end) to do whatever it is they intend to do.

Personally, I lean towards (1) or (5), but I don't feel like I've fully thought through the implications yet.
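
For illustration, here's roughly what the rebase vtable entry could look like under (1) and (5); these declarations are sketches of the ideas above, not code that exists in std and not a concrete patch.

```zig
const std = @import("std");
const Reader = std.Io.Reader;

// Sketches only; names and exact error set are illustrative.

// (1) rebase reports the capacity actually made available, which may be
//     less than requested; callers that just want "as much as possible"
//     can ignore the return value.
const RebaseReportingAvailable = *const fn (r: *Reader, capacity: usize) Reader.RebaseError!usize;

// (5) no capacity parameter: rebase always frees as much as it can, and
//     callers check `r.buffer.len - r.end` afterwards.
const RebaseAsMuchAsPossible = *const fn (r: *Reader) Reader.RebaseError!void;
```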

What about Writer?

Writer.VTable.rebase includes a preserve parameter in its signature, but that doesn't prevent the same problem from existing: there's nothing stopping a caller from requesting a rebase with lengths that can't be satisfied. In practice, this might not be quite as pressing, since e.g. flate compression doesn't rely on the window for compression correctness, only for compression quality: if the full buffer is thrown away during compression, the compression ratio will just temporarily decrease while the history window is built back up.

In any case, whatever happens to Reader.VTable.rebase should probably be applied to Writer.VTable.rebase as well, as there might be use cases where Writer also needs to be able to keep some minimum amount of data buffered at all times.
