
Support writing to a specific pebble cache partition #8899


Open · wants to merge 3 commits into master
Conversation

maggie-lou (Contributor)

No description provided.

@maggie-lou (Contributor, Author)

@bduffany @jdhollen @tylerwilliams What do you guys think about this approach? I want to support writing snapshots to a special partition in the cacheproxy.

The cache config might look something like: https://github.com/buildbuddy-io/buildbuddy-internal/compare/ss_partition_ex?expand=1
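Since the linked compare is internal, here is a rough, hypothetical sketch of what a dedicated snapshot partition might look like in the pebble cache config. The partition names and sizes are made up for illustration, and the exact field layout is an assumption rather than the actual config from the linked branch:

```yaml
cache:
  pebble:
    partitions:
      # Existing default partition (size is illustrative).
      - name: default
        max_size_bytes: 100000000000
      # Hypothetical dedicated partition for workflow snapshots.
      - name: snapshots
        max_size_bytes: 50000000000
```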

@tylerwilliams (Member)

First reaction: disgust. Second reaction: yeah ok, that seems pretty reasonable? I suppose an alternative is to convey this information in a side channel somehow (metadata header or something) but I feel like being explicit about it is good.

@bduffany (Member) commented Apr 9, 2025

> First reaction: disgust. Second reaction: yeah ok, that seems pretty reasonable? I suppose an alternative is to convey this information in a side channel somehow (metadata header or something) but I feel like being explicit about it is good.

I kind of had this same reaction, but am not so opposed to using gRPC headers for this, since the partition is already driven by gRPC header-derived info (group ID). I think it would be nice to keep the ResourceName proto 1-1 mappable to Bazel's concept of resource name defined in the REAPI, which is a property that I think would be broken by this change and might have some weird implications?

In terms of explicit vs. implicit, I would also add that optional proto fields are almost as explicit as optional gRPC headers in terms of how they are enforced, but maybe not as obvious from a documentation perspective.
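As a rough sketch of the header-based approach being discussed, partition resolution might look something like this. The header name, function, and group mapping are all hypothetical (not BuildBuddy's actual API), and gRPC metadata is modeled here as a plain map to keep the example self-contained:

```go
package main

import "fmt"

// Hypothetical header name; the real implementation may differ.
const partitionOverrideHeader = "x-buildbuddy-partition-override"

// resolvePartition picks the cache partition for a request. An override
// header (if present) wins; otherwise the group-ID-based mapping applies,
// falling back to the default partition.
func resolvePartition(md map[string][]string, groupPartitions map[string]string, groupID string) string {
	if vals := md[partitionOverrideHeader]; len(vals) > 0 && vals[0] != "" {
		return vals[0]
	}
	if p, ok := groupPartitions[groupID]; ok {
		return p
	}
	return "default"
}

func main() {
	groupPartitions := map[string]string{"GR123": "gr123-partition"}

	// No override header: fall back to the group mapping.
	fmt.Println(resolvePartition(map[string][]string{}, groupPartitions, "GR123")) // gr123-partition

	// Override header present: it takes precedence.
	md := map[string][]string{partitionOverrideHeader: {"snapshots"}}
	fmt.Println(resolvePartition(md, groupPartitions, "GR123")) // snapshots
}
```

This keeps the ResourceName proto untouched, matching the concern above about staying 1-1 mappable to the REAPI resource name.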

@maggie-lou (Contributor, Author)

> I kind of had this same reaction, but am not so opposed to using gRPC headers for this, since the partition is already driven by gRPC header-derived info (group ID). I think it would be nice to keep the ResourceName proto 1-1 mappable to Bazel's concept of resource name defined in the REAPI, which is a property that I think would be broken by this change and might have some weird implications?
>
> In terms of explicit vs. implicit, I would also add that optional proto fields are almost as explicit as optional gRPC headers in terms of how they are enforced, but maybe not as obvious from a documentation perspective.

Okay, you've convinced me. If a non-internal client wanted to write to a specific partition, using a header would be a lot easier than changing the bytestream API. I'll add the partition override to the logging context so we can at least detect it in logs.

@maggie-lou marked this pull request as ready for review April 9, 2025 22:04
@iain-macdonald (Contributor)

Another option instead of this PR and #8868 would be to just run an internal-only app statefulset in the baremetal cluster and point the workflow executors at that. Did we already consider that / would it work?

@maggie-lou (Contributor, Author)

> Another option instead of this PR and #8868 would be to just run an internal-only app statefulset in the baremetal cluster and point the workflow executors at that. Did we already consider that / would it work?

There are some pros and cons to that approach. Pros: it simplifies the code, and we could scale workflow-specific infrastructure separately. Cons: we'd have to manage more microservices, which makes releases longer, creates more orchestration overhead, makes setup harder for on-prem customers, and is more expensive. I know @tylerwilliams would prefer not to spin up a new statefulset for those reasons. My thinking was that there isn't a strong enough reason to have a workflow-specific statefulset yet.

@bduffany (Member) left a comment


Adding a partition would mean that the other partitions have less space - do we need to run more proxies to compensate? Also, the effective space available for workflow snapshots would be reduced, I think; I wonder if there's any way we could estimate how it would affect snapshot hit rates, especially for orgs that run workflows less frequently?

@iain-macdonald (Contributor)

> Adding a partition would mean that the other partitions have less space - do we need to run more proxies to compensate? Also, the effective space available for workflow snapshots would be reduced, I think; I wonder if there's any way we could estimate how it would affect snapshot hit rates, especially for orgs that run workflows less frequently?

We have plenty of proxy disk capacity in US-SJC right now. The proxies have been running for a few months and the caches are only about 35% full, so we should keep an eye on this, but I'm not too concerned about disk capacity issues for now.

4 participants