
Support writing to a specific pebble cache partition #8899


Open · wants to merge 3 commits into master
Conversation

maggie-lou (Contributor)

No description provided.

@maggie-lou (Contributor, Author)

@bduffany @jdhollen @tylerwilliams What do you guys think about this approach? I want to support writing snapshots to a special partition in the cacheproxy.

The cache config might look something like: https://github.com/buildbuddy-io/buildbuddy-internal/compare/ss_partition_ex?expand=1
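Since the linked compare is internal, here is a rough, hypothetical sketch of what a dedicated snapshot partition might look like in the pebble cache config. The partition names and sizes are made up for illustration, and the exact field layout is an assumption rather than the actual config from the linked branch:

```yaml
cache:
  pebble:
    partitions:
      # Existing default partition (size is illustrative).
      - name: default
        max_size_bytes: 100000000000
      # Hypothetical dedicated partition for workflow snapshots.
      - name: snapshots
        max_size_bytes: 50000000000
```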

@tylerwilliams (Member)

First reaction: disgust. Second reaction: yeah ok, that seems pretty reasonable? I suppose an alternative is to convey this information in a side channel somehow (metadata header or something) but I feel like being explicit about it is good.

@bduffany (Member) commented Apr 9, 2025

> First reaction: disgust. Second reaction: yeah ok, that seems pretty reasonable? I suppose an alternative is to convey this information in a side channel somehow (metadata header or something) but I feel like being explicit about it is good.

I kind of had this same reaction, but am not so opposed to using gRPC headers for this, since the partition is already driven by gRPC header-derived info (group ID). I think it would be nice to keep the ResourceName proto 1-1 mappable to Bazel's concept of resource name defined in the REAPI, which is a property that I think would be broken by this change and might have some weird implications?

In terms of explicit vs. implicit, I would also add that optional proto fields are almost as explicit as optional gRPC headers in terms of how they are enforced, but maybe not as obvious from a documentation perspective.
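As a rough sketch of the header-based approach being discussed, partition resolution might look something like this. The header name, function, and group mapping are all hypothetical (not BuildBuddy's actual API), and gRPC metadata is modeled here as a plain map to keep the example self-contained:

```go
package main

import "fmt"

// Hypothetical header name; the real implementation may differ.
const partitionOverrideHeader = "x-buildbuddy-partition-override"

// resolvePartition picks the cache partition for a request. An override
// header (if present) wins; otherwise the group-ID-based mapping applies,
// falling back to the default partition.
func resolvePartition(md map[string][]string, groupPartitions map[string]string, groupID string) string {
	if vals := md[partitionOverrideHeader]; len(vals) > 0 && vals[0] != "" {
		return vals[0]
	}
	if p, ok := groupPartitions[groupID]; ok {
		return p
	}
	return "default"
}

func main() {
	groupPartitions := map[string]string{"GR123": "gr123-partition"}

	// No override header: fall back to the group mapping.
	fmt.Println(resolvePartition(map[string][]string{}, groupPartitions, "GR123")) // gr123-partition

	// Override header present: it takes precedence.
	md := map[string][]string{partitionOverrideHeader: {"snapshots"}}
	fmt.Println(resolvePartition(md, groupPartitions, "GR123")) // snapshots
}
```

This keeps the ResourceName proto untouched, matching the concern above about staying 1-1 mappable to the REAPI resource name.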

@maggie-lou (Contributor, Author)

> I kind of had this same reaction, but am not so opposed to using gRPC headers for this, since the partition is already driven by gRPC header-derived info (group ID). I think it would be nice to keep the ResourceName proto 1-1 mappable to Bazel's concept of resource name defined in the REAPI, which is a property that I think would be broken by this change and might have some weird implications?
>
> In terms of explicit vs. implicit, I would also add that optional proto fields are almost as explicit as optional gRPC headers in terms of how they are enforced, but maybe not as obvious from a documentation perspective.

Okay, you've convinced me. If a non-internal client wanted to write to a specific partition, using a header would be a lot easier than changing the bytestream API. I'll add the partition override to the logging context so we can at least detect it in logs.

@maggie-lou marked this pull request as ready for review April 9, 2025 22:04
@iain-macdonald (Contributor)

Another option instead of this PR and #8868 would be to just run an internal-only app statefulset in the baremetal cluster and point the workflow executors at that. Did we already consider that / would it work?

@maggie-lou (Contributor, Author)

> Another option instead of this PR and #8868 would be to just run an internal-only app statefulset in the baremetal cluster and point the workflow executors at that. Did we already consider that / would it work?

There are some pros and cons to that approach. Pros: it simplifies the code, and we could scale workflow-specific infrastructure separately. Cons: we'd have to manage more microservices, which makes releases longer, creates more orchestration overhead, makes setup harder for on-prem customers, and is more expensive. I know @tylerwilliams would prefer not to spin up a new statefulset for those reasons. My thinking was that there isn't a strong enough reason to have a workflow-specific statefulset yet.

@bduffany (Member) left a comment


Adding a partition would mean that the other partitions have less space - do we need to run more proxies to compensate? Also, the effective space available for workflow snapshots would be reduced, I think; I wonder if there's any way we could estimate how it would affect snapshot hit rates, especially for orgs that run workflows less frequently?

@iain-macdonald (Contributor)

> Adding a partition would mean that the other partitions have less space - do we need to run more proxies to compensate? Also, the effective space available for workflow snapshots would be reduced, I think; I wonder if there's any way we could estimate how it would affect snapshot hit rates, especially for orgs that run workflows less frequently?

We have plenty of proxy disk capacity in US-SJC right now. The proxies have been running for a few months and the caches are only about 35% full, so we should keep an eye on this, but I'm not too concerned about disk capacity issues for now.

4 participants