
Reader does not fully support the use case of keeping some amount of data buffered #25103

@squeek502

Description


This could be considered a duplicate of #24916, but whereas that issue might be closeable with a workaround fix, this one will likely need a breaking solution.


flate.Decompress serves as an example of this use case. It always needs to keep at least flate.history_len bytes buffered in order to decompress data correctly, because the compressed data can refer back to previous bytes as a form of deduplication. Here's a simplified example:

a123<look 4 bytes back in the history and copy 4 bytes>

(a123 is the history of previously decompressed data, and the <...> is what reading the next part of the compressed stream tells us to do)

That would get decompressed into

a123a123

If those 4 bytes of history don't exist/are incorrect when we read the <look 4 bytes back in the history and copy 4 bytes> part, then there's no way to decompress the data correctly.
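
To make that concrete, here's a minimal self-contained sketch of resolving such a back-reference (illustrative only, not the actual flate.Decompress code; `copyFromHistory` is a made-up helper):

```zig
const std = @import("std");

// Illustrative only (not the real flate.Decompress code): resolve a
// back-reference by copying `len` bytes starting `dist` bytes behind the
// current end of the already-decompressed history.
fn copyFromHistory(history: []u8, end: usize, dist: usize, len: usize) usize {
    var i: usize = 0;
    while (i < len) : (i += 1) {
        // Byte-by-byte so that overlapping copies (dist < len) work too.
        history[end + i] = history[end + i - dist];
    }
    return end + len;
}

test "back-reference requires the history bytes to still be buffered" {
    var buf: [8]u8 = undefined;
    @memcpy(buf[0..4], "a123");
    // <look 4 bytes back in the history and copy 4 bytes>
    const new_end = copyFromHistory(&buf, 4, 4, 4);
    try std.testing.expectEqualStrings("a123a123", buf[0..new_end]);
}
```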


The Reader interface mostly supports this use case, with the sole exception of rebase, which fundamentally cannot be implemented safely by a window-maintaining reader. A concrete example can be found in #24916, where there's a call to

try rebase(r, r.buffer.len);

in Reader.peekDelimiterInclusive, instructing the Reader implementation to throw away its entire buffer, which is incompatible with the "maintain some amount of history" use case.

Even if that particular call reduced its requirements, the problem could still occur, since nothing prevents a caller from requesting a larger rebase than a window-maintaining implementation can satisfy.
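
For illustration, a small sketch of the constraint (reusing the buffer/end naming from this issue, with `history_len` standing in for flate.history_len): the most capacity a window-maintaining reader can ever free up is bounded by what it has to keep, so a request for the full buffer.len can never be satisfied.

```zig
const std = @import("std");

// Rough sketch: the capacity a window-maintaining reader can free up is
// bounded by the data it must keep, so `rebase(r, r.buffer.len)` is
// unsatisfiable by construction whenever history must be preserved.
fn maxSatisfiableRebase(buffer_len: usize, unread_len: usize, history_len: usize) usize {
    // Bytes that must stay in the buffer: the not-yet-consumed data plus the window.
    const must_keep = unread_len + history_len;
    return buffer_len -| must_keep; // saturating: 0 if nothing can be freed
}

test "a full-buffer rebase request can never be satisfied" {
    const buffer_len: usize = 64 * 1024;
    const history_len: usize = 32 * 1024; // DEFLATE's 32 KiB window
    try std.testing.expect(maxSatisfiableRebase(buffer_len, 0, history_len) < buffer_len);
}
```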

Possible solutions

I'll mention a few, but there are likely more:

  1. Change rebase to return the number of bytes that are now available, which could be less than the number requested
    • This would require changes at every call site of rebase, and would push detection/handling of any "not enough buffer capacity to free up that many bytes" condition to the caller instead of the implementer. For example, peekDelimiterInclusive would simply discard the return value of rebase, since its goal is just to rebase as much as possible.
  2. Introduce some sort of special value that means "rebase as much as possible"
    • This is what peekDelimiterInclusive is actually trying to do, so giving it a way to properly encode that intention would (partially) address the problem. The downside is that it doesn't make the problem impossible, as it's still possible for a caller to request a rebase with an impossible-to-satisfy length that's <= buffer.len.
  3. Encode this use case directly in the interface, e.g. add some sort of maintained_buffered_bytes field to std.Io.Reader
    • This might be too heavy-handed, since only specific implementations would have any reason to make this non-zero.
  4. Add a new possible error to Reader.RebaseError that encodes "requested capacity cannot be satisfied"
    • This would be similar to (1), but less precise, as callers that actually want to "rebase as much as possible" would then need to employ some sort of backoff strategy, rebasing with smaller and smaller capacities until the error no longer occurs.
  5. Remove the capacity parameter from rebase entirely, always make it rebase as much as possible
    • This is what all implementers of rebase within std do anyway (capacity is only used in asserts). Callers would then need to check if there's enough space after the rebase call (r.buffer.len - r.end) to do whatever it is they intend to do.

Personally, I lean towards (1) or (5), but I don't feel like I've fully thought through the implications yet.
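
For illustration, here's roughly what the rebase vtable entry could look like under (1) and (5); these declarations are sketches of the ideas above, not code that exists in std and not a concrete patch.

```zig
const std = @import("std");
const Reader = std.Io.Reader;

// Sketches only; names and exact error set are illustrative.

// (1) rebase reports the capacity actually made available, which may be
//     less than requested; callers that just want "as much as possible"
//     can ignore the return value.
const RebaseReportingAvailable = *const fn (r: *Reader, capacity: usize) Reader.RebaseError!usize;

// (5) no capacity parameter: rebase always frees as much as it can, and
//     callers check `r.buffer.len - r.end` afterwards.
const RebaseAsMuchAsPossible = *const fn (r: *Reader) Reader.RebaseError!void;
```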

What about Writer?

Writer.VTable.rebase includes a preserve parameter in its signature, but that doesn't prevent the same problem from existing: there's nothing stopping a caller from requesting a rebase with lengths that can't be satisfied. In practice, this might not be quite as pressing, since e.g. flate compression doesn't rely on the window for compression correctness, only for compression quality: if the full buffer is thrown away during compression, the compression ratio will just temporarily decrease while the history window is built back up.

In any case, whatever happens to Reader.VTable.rebase should probably be applied to Writer.VTable.rebase as well, as there might be use cases where Writer also needs to be able to keep some minimum amount of data buffered at all times.
