reporting: Serialize batches as they're constructed #1227
Conversation
No changes to the coverage.
Needs to be updated after rebasing #1226. The final commit is still roughly ok to review in the meantime, but it's also fine to leave this until I'm done.
Part of #1220.

For scaling events, the size of a gzip-compressed batch is *much* smaller than the uncompressed version, so it's actually worth it to compress the JSON-encoded contents of the batch as we go, instead of buffering the events as a `[]ScalingEvent` in memory.

**Implementation**

This change primarily adds the `reporting.BatchBuilder[E]` type (and some implementations), which abstracts over the process of gradually building the serialized bytes from a batch of events.

Two implementations of `BatchBuilder` are provided in `pkg/reporting`: one for large JSON arrays, and one for the JSON lines format.

To allow continuous gzip compression, both of these JSON batch builders are constructed with a new `IOBuffer` type, which is just an `io.Writer` with a method to collect the bytes at the end. There are implementations of `IOBuffer` for normal bytes and for a gzip-compressed buffer.

All that together means that batches of billing and scaling events should now be gzip-compressed (where possible) as events are added to the batches, dramatically reducing the memory footprint.
(non-blocking) I don't really like panics when handling errors from `io.Writer`, but it should be fine as long as we know that the byte buffer and gzip implementations shouldn't return errors.
That's fair — my thinking is that there isn't really anything we can do if we get an error there: it's a buffer in memory, so retrying won't help. Should we just drop the events? IMO it's better to panic so that we bubble the error up into something noisier, hoping that restarting the autoscaler-agent will resolve the issue. Probably worthwhile to at least clarify that in the code — WDYT?
There are always two options:
In terms of handling, it's true that we can't do much here. The only thing we can do is drop the event and collect context about the error (what the event was, which write failed, etc.). In this particular case, I'd probably have changed the definition of
Then, I'd have propagated the error up, probably up to But for now we know that there is only
Signed-off-by: Oleg Vasilev <oleg@neon.tech>
Note: This PR builds on #1226, and must not be merged before it.