Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compactor: Upload Sparse Index Headers to Object Storage #10684

Open
wants to merge 49 commits into
base: main
Choose a base branch
from

Conversation

dmwilson-grafana
Copy link
Contributor

@dmwilson-grafana dmwilson-grafana commented Feb 18, 2025

What this PR does

  • Adds an option -compactor.upload-sparse-index-headers which allows the compactor to upload sparse index headers to object storage. When this option is set to true, the compactor will build and upload sparse-index-headers after all other files in the block have been uploaded to object storage. When this option is set, it's a "best effort" process, the compactor won't exit an otherwise successful compaction job early if there's an error affecting the sparse-index-header upload.

  • To make this change, this PR moves the contents of pkg/storegateway/indexheader to pkg/storage/indexheader. Both StoreGateway and Compactor import pkg/storage/indexheader.

  • As implemented, sparse headers uploaded from the compactor use DefaultPostingOffsetInMemorySampling (1/32), not blocks-storage.bucket-store.posting-offsets-in-mem-sampling.

    • This PR adds the sampling rate to the sparse protos and sparse-index-header file, this allows stream readers to downsample from higher to lower sampling rates (e.g. 1/16 -> 1/32).
  • Adding an extra file to upload for each block will slow the comapctor a bit. In testing, enabling upload of sparse-index-headers added <100ms/block.

Which issue(s) this PR fixes or relates to

  • Compactor: store sparse index headers in object store #8166 - New store-gateway replicas don't have any sparse index headers right after starting up and must load the index header and create the sparse header from the data on disk before being able to serve requests for a given block. This leads to severe increases in latency after scaling out store-gateways.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@56quarters
Copy link
Contributor

Before this is marked as ready for review, pkg/storage or somewhere under there is the directory that the index header code should live in instead of pkg/util

Copy link
Contributor

github-actions bot commented Feb 20, 2025

@dmwilson-grafana dmwilson-grafana marked this pull request as ready for review February 25, 2025 14:34
@dmwilson-grafana dmwilson-grafana requested review from tacole02 and a team as code owners February 25, 2025 14:34
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this. The logic looks sound, but i think we can move around where it sits

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this. I left small suggestions and pushed a change to the changelog because i think it got lost in rebasing

@dimitarvdimitrov
Copy link
Contributor

i realized we aren't monitoring the process of constructing and parsing the sparse headers. So we wouldn't know if the compactors suddenly start failing to construct or upload these sparse headers. We also wouldn't know if the store-gateways suddenly find them corrupted. If this happens the query path will silently degrade.

We should add some metrics to track failures to construct and uploading and to download and parse and alerts on those in case they keep failing for a long time. Do you want to give this a try in a follow-up PR?

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working through the comments

LGTM

Copy link
Contributor

@tacole02 tacole02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @dmwilson-grafana !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants