Releases: grafana/mimir
mimir-2.3.0-rc.2
Changes since 2.3.0-rc.0
This release contains 33 contributions from 9 authors. Thank you!
Note: We tagged 2.3.0-rc.1 but found a panic in the Alertmanager before publishing that pre-release. 2.3.0-rc.2 includes the Alertmanager fix under a new tag and release candidate.
2.3.0-rc.2
Grafana Mimir
- [BUGFIX] Alertmanager: revert upstream alertmanager to v0.24.0 to fix panic when unmarshalling email headers #2924 #2925
2.3.0-rc.1
Grafana Mimir
- [CHANGE] Distributor: if forwarding rules are used to forward samples, exemplars are now removed from the request #2725
- [CHANGE] Ingester: experimental `-blocks-storage.tsdb.new-chunk-disk-mapper` has been removed; the new chunk disk mapper is now always used and is no longer marked experimental. The default value of `-blocks-storage.tsdb.head-chunks-write-queue-size` has changed to 1000000; this enables the async chunk queue by default, which leads to improved latency on the write path when new chunks are created in ingesters. #2762
- [CHANGE] Ingester: removed deprecated `-blocks-storage.tsdb.isolation-enabled` option. TSDB-level isolation is now always disabled in Mimir. #2782
- [CHANGE] Compactor: `-compactor.partial-block-deletion-delay` must either be set to 0 (to disable partial blocks deletion) or to a value higher than `4h`. #2787
- [CHANGE] Query-frontend: CLI flag `-query-frontend.align-querier-with-step` has been deprecated. Please use `-query-frontend.align-queries-with-step` instead. #2840
- [CHANGE] Distributor: changed the default value of `-distributor.remote-timeout` to `2s` from `20s` and `-distributor.forwarding.request-timeout` to `2s` from `10s` to improve distributor resource usage when ingesters crash. #2728
- [FEATURE] Introduced experimental anonymous usage statistics tracking (disabled by default) to help Mimir maintainers make better decisions to support the open source community. The tracking system anonymously collects non-sensitive, non-personally identifiable information about the running Mimir cluster. #2643 #2662 #2685 #2732 #2733 #2735
- [FEATURE] Introduced an experimental deployment mode called read-write, which runs a fully featured Mimir cluster with three components: write, read, and backend. The read-write deployment mode is a trade-off between the monolithic mode (only one component, no isolation) and the microservices mode (many components, high isolation). #2754 #2838
- [ENHANCEMENT] Distributor: Add `cortex_distributor_query_ingester_chunks_deduped_total` and `cortex_distributor_query_ingester_chunks_total` metrics for determining how effective ingester chunk deduplication at query time is. #2713
- [ENHANCEMENT] Upgrade Docker base images to `alpine:3.16.2`. #2729
- [ENHANCEMENT] Ruler: Add `<prometheus-http-prefix>/api/v1/status/buildinfo` endpoint. #2724
- [ENHANCEMENT] Querier: Ensure all queries pulled from the query-frontend or query-scheduler are immediately executed. The maximum worker concurrency in each querier is configured by `-querier.max-concurrent`. #2598
- [ENHANCEMENT] Distributor: Add `cortex_distributor_received_requests_total` and `cortex_distributor_requests_in_total` metrics to provide visibility into appropriate per-tenant request limits. #2770
- [ENHANCEMENT] Distributor: Add a single forwarding remote-write endpoint for a tenant (`forwarding_endpoint`), instead of using per-rule endpoints. This takes precedence over per-rule endpoints. #2801
- [ENHANCEMENT] Added `err-mimir-distributor-max-write-message-size` to the errors catalog. #2470
- [ENHANCEMENT] Add sanity check at startup to ensure the configured filesystem directories don't overlap for different components. #2828
- [ENHANCEMENT] Go: updated to go 1.19.1. #2637
- [BUGFIX] Ruler: fix not restoring alerts' state at startup. #2648
- [BUGFIX] Ingester: Fix disk filling up after restarting ingesters with out-of-order support disabled while it was enabled before. #2799
- [BUGFIX] Memberlist: retry joining memberlist cluster on startup when no nodes are resolved. #2837
- [BUGFIX] Query-frontend: fix incorrect mapping of http status codes 413 to 500 when request is too large. #2819
- [BUGFIX] Ruler: fix panic when `ruler.external_url` is explicitly set to an empty string (`""`) in YAML. #2915
Mixin
- [CHANGE] Dashboards: remove the "Cache - Latency (old)" panel from the "Mimir / Queries" dashboard. #2796
- [FEATURE] Dashboards: added support to experimental read-write deployment mode. #2780
- [ENHANCEMENT] Dashboards: Updated the `Writes` dashboard to account for samples ingested via the new OTLP ingestion endpoint. #2919
- [ENHANCEMENT] Dashboards: added support for query-tee in front of ruler-query-frontend in the "Remote ruler reads" dashboard. #2761
- [ENHANCEMENT] Dashboards: Introduce support for baremetal deployment, setting `deployment_type: 'baremetal'` in the `mixin_config`. #2657
- [ENHANCEMENT] Dashboards: use timeseries panel to show exemplars. #2800
- [ENHANCEMENT] Dashboards: Include per-tenant request rate in "Tenants" dashboard. #2874
- [ENHANCEMENT] Dashboards: Include inflight object store requests in "Reads" dashboard. #2914
- [BUGFIX] Dashboards: stop setting 'interval' in dashboards; it should be set on your datasource. #2802
Jsonnet
- [ENHANCEMENT] Upgrade memcached image tag to `memcached:1.6.16-alpine`. #2740
- [ENHANCEMENT] Added `$._config.configmaps` and `$._config.runtime_config_files` to make it easy to add new configmaps or runtime config files to all components. #2748
Mimirtool
- [BUGFIX] Version checking no longer prompts for updating when already on latest version. #2723
Query-tee
- [CHANGE] Renamed CLI flag `-server.service-port` to `-server.http-service-port`. #2683
- [CHANGE] Renamed metric `cortex_querytee_request_duration_seconds` to `cortex_querytee_backend_request_duration_seconds`. Metric `cortex_querytee_request_duration_seconds` is now reported without label `backend`. #2683
- [ENHANCEMENT] Added HTTP over gRPC support to `query-tee` to allow testing gRPC requests to Mimir instances. #2683
Mimir Continuous Test
- [ENHANCEMENT] Added basic authentication and bearer token support for when Mimir is behind a gateway authenticating the calls. #2717
Documentation
Full Changelog: mimir-2.3.0-rc0...mimir-2.3.0-rc.2
mimir-2.3.0-rc0
This release contains 333 PRs from 39 authors. Thank you!
Grafana Mimir version 2.3 release notes
Grafana Labs is excited to announce version 2.3 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
The highlights that follow include the top features, enhancements, and bugfixes in this release. If you are upgrading from Grafana Mimir 2.2, there is upgrade-related information as well.
For the complete list of changes, see the Changelog.
Features and enhancements
- Ingest metrics in OpenTelemetry format: This release of Grafana Mimir introduces experimental support for ingesting metrics from the OpenTelemetry Collector's `otlphttp` exporter. This adds a second ingestion option for users of the OTel Collector; Mimir was already compatible with the `prometheusremotewrite` exporter. For more information, see Configure OTel Collector.
- Increased instant query performance: Grafana Mimir now supports splitting instant queries by time. This allows it to better parallelize execution of instant queries and therefore return results faster. At present, splitting is only supported for a subset of instant queries, which means not all instant queries will see a speedup. This feature is experimental and disabled by default; it can be enabled by setting `-query-frontend.split-instant-queries-by-interval`.
- Tenant federation for metadata queries: Users with tenant federation enabled could previously issue instant queries, range queries, and exemplar queries to multiple tenants at once and receive a single aggregated result. With Grafana Mimir 2.3, we've added tenant federation support to the `/api/v1/metadata` endpoint as well.
- Simpler object storage configuration: Users can now configure block, alertmanager, and ruler storage all at once with the `common` YAML config option key (or `-common.storage.*` CLI flags). By centralizing your object storage configuration in one place, this enhancement makes configuration faster and less error prone. Users can still individually configure storage for each of these components if they desire. For more information, see Common configurations.
- DEB and RPM packages for Mimir: Starting with version 2.3, we're publishing deb and rpm files for Grafana Mimir, which will make installing and running it on Debian- or RedHat-based Linux systems much easier. Thank you to community contributor wilfriedroset for your work to implement this!
- Import historic data to Grafana Mimir: Users can now backfill time series data from their existing Prometheus or Cortex installation into Mimir using `mimirtool`, making it possible to migrate to Grafana Mimir without losing your existing metrics data. This support is still considered experimental and does not yet work for data stored in Thanos. To learn more about this feature, see mimirtool backfill and Configure TSDB block upload.
New Helm chart minor release: The Mimir Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.3 release, we’re also releasing version 3.1 of the Mimir Helm chart. Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
- We've upgraded the MinIO subchart dependency from a deprecated chart to the supported one. This creates a breaking change in how the administrator password is set. However, as the built-in MinIO is not a recommended object store for production use cases, this change did not warrant a new major version of the Mimir Helm chart.
- The backfill API endpoints for importing historic time series data are now exposed on the Nginx gateway.
- Nginx now sets the value of the `X-Scope-OrgID` header equal to the value of Mimir's `no_auth_tenant` parameter by default. The previous release had set the value of `X-Scope-OrgID` to `anonymous` by default, which complicated the process of migrating to Mimir.
- Memberlist now uses DNS service-discovery by default, which should decrease startup time for large Mimir clusters.
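The new OTLP ingestion path described above can be exercised from an OpenTelemetry Collector. The following is a minimal sketch only: the hostname, port, tenant ID, and the exact OTLP URL path are placeholder assumptions that depend on how your Mimir gateway is exposed; verify them against the Mimir documentation.

```yaml
# OpenTelemetry Collector sketch (hypothetical endpoint and tenant).
receivers:
  otlp:
    protocols:
      http:

exporters:
  otlphttp:
    endpoint: http://mimir-nginx.mimir.svc:80/otlp
    headers:
      X-Scope-OrgID: tenant-a   # required when multi-tenancy is enabled

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
```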
Upgrade considerations
In Grafana Mimir 2.3 we have removed the following previously deprecated configuration options:
- The `extend_writes` parameter in the distributor YAML configuration and the `-distributor.extend-writes` CLI flag have been removed.
- The `active_series_custom_trackers` parameter has been removed from the YAML configuration. It had already been moved to the runtime configuration. See #1188 for details.
With Grafana Mimir 2.3 we have also updated the default value of `-distributor.ha-tracker.max-clusters` to `100` to provide Denial-of-Service protection. Previously `-distributor.ha-tracker.max-clusters` was unlimited by default, which could allow a tenant with HA dedupe enabled to overload the HA tracker with `__cluster__` label values and cause the HA dedupe database to fail.
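The common object storage configuration highlighted in the features above centralizes the bucket settings. A minimal sketch follows, assuming an S3-compatible backend; the bucket name and endpoint are placeholders, and per-component storage blocks still override these values.

```yaml
# Sketch of the `common` storage block (values are illustrative).
common:
  storage:
    backend: s3
    s3:
      endpoint: s3.us-east-1.amazonaws.com
      bucket_name: mimir
```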
Bug fixes
- PR 2447: Fix incorrect mapping of HTTP status code `429` to `500` when the request queue is full in the query-frontend. This corrects behavior in the query-frontend where a `429 "Too Many Outstanding Requests"` error (a retriable error) from a querier was incorrectly returned as a `500` system error (an unretriable error).
- PR 2505: The memberlist key-value (KV) store now tries to "fast-join" the cluster to avoid serving an empty KV store. This fix addresses the confusing "empty ring" error response and the error log message "ring doesn't exist in KV store yet" emitted by services that start while other members are already present in the ring. Those using other key-value store options (e.g., Consul, etcd) are not impacted by this bug.
- PR 2289: The "List Prometheus rules" API endpoint of the Mimir Ruler component is no longer blocked while rules are being synced. This means users can now list rules while syncing larger rule sets.
Changelog since 2.2
2.3.0-rc.0
Grafana Mimir
- [CHANGE] Ingester: Added user label to ingester metric `cortex_ingester_tsdb_out_of_order_samples_appended_total`. On multitenant clusters this helps us find the rate of appended out-of-order samples for a specific tenant. #2493
- [CHANGE] Compactor: delete source and output blocks from local disk on compaction failure, to reduce the likelihood that subsequent compactions fail because of no space left on disk. #2261
- [CHANGE] Ruler: Remove unused CLI flags `-ruler.search-pending-for` and `-ruler.flush-period` (and their respective YAML config options). #2288
- [CHANGE] Successful gRPC requests are no longer logged (only affects internal API calls). #2309
- [CHANGE] Add new `-*.consul.cas-retry-delay` flags. They have a default value of `1s`, while previously there was no delay between retries. #2309
- [CHANGE] Store-gateway: Remove the experimental ability to run requests in a dedicated OS thread pool and the associated CLI flag `-store-gateway.thread-pool-size`. #2423
- [CHANGE] Memberlist: disabled TCP-based ping fallback, because Mimir already uses a custom transport based on TCP. #2456
- [CHANGE] Changed default value for `-distributor.ha-tracker.max-clusters` to `100` to provide DoS protection. #2465
- [CHANGE] Experimental block upload API exposed by the compactor has changed: the previous `/api/v1/upload/block/{block}` endpoint for starting block upload is now `/api/v1/upload/block/{block}/start`, and the previous endpoint `/api/v1/upload/block/{block}?uploadComplete=true` for finishing block upload is now `/api/v1/upload/block/{block}/finish`. A new API endpoint has been added: `/api/v1/upload/block/{block}/check`. #2486 #2548
- [CHANGE] Compactor: changed `-compactor.max-compaction-time` default from `0s` (disabled) to `1h`. When compacting blocks for a tenant, the compactor will move on to compact blocks of another tenant or re-plan blocks to compact at least every 1h. #2514
- [CHANGE] Distributor: removed previously deprecated `extend_writes` (see #1856) YAML key and `-distributor.extend-writes` CLI flag from the distributor config. #2551
- [CHANGE] Ingester: removed previously deprecated `active_series_custom_trackers` (see #1188) YAML key from the ingester config. #2552
- [CHANGE] The tenant ID `__mimir_cluster` is reserved by Mimir and not allowed to store metrics. #2643
- [CHANGE] Purger: removed the purger component and moved its API endpoints `/purger/delete_tenant` and `/purger/delete_tenant_status` to the compactor at `/compactor/delete_tenant` and `/compactor/delete_tenant_status`. The new endpoints on the compactor are stable. #2644
- [CHANGE] Memberlist: Change the leave timeout duration (`-memberlist.leave-timeout`) from 5s to 20s and connection timeout (`-memberlist.pac...
2.2.0
Grafana Labs is excited to announce version 2.2 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
The highlights that follow include the top features, enhancements, and bugfixes in this release. If you are upgrading from Grafana Mimir 2.1, there is upgrade-related information as well.
For the complete list of changes, see the Changelog.
This release contains 214 contributions from 32 authors. Thank you!
Features and enhancements
- Support for ingesting out-of-order samples: Grafana Mimir includes new, experimental support for ingesting out-of-order samples. This support is configurable, and it allows you to set how far out-of-order Mimir accepts samples on a per-tenant basis. This feature still needs additional testing; we do not recommend using it in a production environment. For more information, see Configuring out-of-order samples ingestion.
- Improved error messages: The error messages that Mimir reports are more human readable, and the messages include error codes that are easily searchable. For error descriptions, see the Grafana Mimir runbooks' Errors catalog.
- Configurable prefix for object storage: Mimir can now store block data, rules, and alerts in one bucket, with each under its own user-defined prefix, rather than requiring one bucket for each. You can configure the storage prefix by using the `-<storage>.storage-prefix` option for the corresponding storage: `ruler-storage`, `alertmanager-storage`, or `blocks-storage`.
- Store-gateway performance optimization: The store-gateway can now pre-populate the file system cache when memory-mapping index-header files. This prevents the store-gateway from appearing to be stuck while loading index-headers. This feature is experimental and disabled by default; enable it using the flag `-blocks-storage.bucket-store.index-header.map-populate-enabled`.
- Faster ingester startup: Ingesters now replay their WALs (write ahead logs) about 50% faster, and they also re-join the ring sooner under some conditions.
- Helm Chart improvements: The Mimir Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.2 release, we're also releasing version 3.0 of the Helm chart. Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
- The Helm chart now supports OpenShift.
- The Helm chart can now easily deploy Grafana Agent in order to scrape metrics and logs from all Mimir pods, and ship them to a remote store, which makes it easier to monitor the health of your Mimir installation. For more information, see Collecting metrics and logs from Grafana Mimir.
- The Helm chart now enables multi-tenancy by default. This makes it easy for you to add tenants as you grow your cluster. You can take advantage of Mimir's per-tenant quality-of-service features, which improves stability and resilience at high scale. To learn more about how multi-tenancy in Mimir works, see Grafana Mimir authorization and authentication. This change is backwards-compatible. To read about how we implemented this, see #2117.
- We have significantly improved the configuration experience for the Helm chart, and here are a few of the most salient changes:
  - We've added an `extraEnvFrom` capability to all Mimir services to enable you to inject secrets via environment variables.
  - We've made it possible to globally set environment variables and inject secrets across all pods in the chart using `global.extraEnv` and `global.extraEnvFrom`. Note that the memcached and minio pods are not included.
  - We've switched the default storage of the Mimir configuration from a `Secret` to a `ConfigMap`, which makes it easier to quickly see the differences between your Mimir configurations between upgrades. We especially like the Helm diff plugin for this purpose.
  - We've added a `structuredConfig` option, which allows you to overwrite specific key-value pairs in the `mimir.config` template, which saves you from having to maintain the entire `mimir.config` in your own `values.yaml` file.
  - We've added the ability to create global pod annotations. This unlocks the ability to trigger a restart of all services in response to a single event, such as the update of the secret containing Mimir's storage credentials.
- We've set the chart to disable `-ingester.ring.unregister-on-shutdown` and `-distributor.extend-writes`, for a smoother upgrade experience. Rolling restarts of ingesters are now less likely to cause spikes in resource usage.
- We've improved the documentation for the Helm chart by adding a Getting started with Mimir using the Helm chart guide.
- We've added a smoke test for your Mimir cluster to help catch errors immediately after you install or upgrade Mimir via the Helm chart.
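The out-of-order ingestion support described in the features above is configured per tenant. The following sketch assumes the limit is exposed as `out_of_order_time_window` in the per-tenant overrides; that key name is an assumption, so verify it against the Mimir configuration reference before use.

```yaml
# Hypothetical per-tenant overrides fragment; the key name
# out_of_order_time_window is assumed, not confirmed by this changelog.
overrides:
  tenant-a:
    out_of_order_time_window: 10m   # accept samples up to 10 minutes late
```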
Upgrade considerations
All deprecated API endpoints that are under `/api/v1/rules*` and `/prometheus/rules*` have now been removed from the ruler component in favor of identical endpoints that use the prefix `/prometheus/config/v1/rules*`.
In Grafana Mimir 2.2, we have updated default values and some parameters to give you a better out-of-the-box experience:
- Message size limits for gRPC messages that are exchanged between internal Mimir components have increased to 100 MiB from 4 MiB. This helps to avoid internal server errors when pushing or querying large data.
- The `-blocks-storage.bucket-store.ignore-blocks-within` parameter changed from `0` to `10h`. The default value of `-querier.query-store-after` changed from `0` to `12h`. For most-recent data, both changes improve query performance by querying only the ingesters, rather than object storage.
- The option `-querier.shuffle-sharding-ingesters-lookback-period` has been deprecated. If you previously changed this option from its default of `0s`, set `-querier.shuffle-sharding-ingesters-enabled` to `true` and specify the lookback period by setting the `-querier.query-ingesters-within` option.
- The `-memberlist.abort-if-join-fails` parameter now defaults to `false`. When Mimir is using memberlist as the backend store for its hash ring, and it fails to join the memberlist cluster, Mimir no longer aborts startup by default.
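The shuffle-sharding deprecation above amounts to swapping one flag for two. A before/after sketch, using the flags named in this release (the `12h` lookback value is only an example):

```shell
# Before (deprecated):
#   -querier.shuffle-sharding-ingesters-lookback-period=12h
#
# After: keep shuffle sharding enabled and move the lookback value
# to the query-ingesters-within option:
#   -querier.shuffle-sharding-ingesters-enabled=true
#   -querier.query-ingesters-within=12h
```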
If you have used a previous version of the Mimir Helm chart, you must address some of the chart's breaking changes before upgrading to helm chart version 3.0. For a detailed information about how to do this, see Upgrade the Grafana Mimir Helm chart from version 2.1 to 3.0.
Bug fixes
- PR 1883: Fixed a bug that caused the query-frontend and querier to crash when they received a user query with a special regular expression label matcher.
- PR 1933: Fixed a bug in the ingester ring page, which showed incorrect status of entries in the ring.
- PR 2090: Ruler in remote rule evaluation mode now applies the timeout correctly. Previously the ruler could get stuck forever, which halted rule evaluation.
- PR 2036: Fixed panic at startup when Mimir is running in monolithic mode and query sharding is enabled.
Changelog
2.2.0
Grafana Mimir
- [CHANGE] Increased default configuration for `-server.grpc-max-recv-msg-size-bytes` and `-server.grpc-max-send-msg-size-bytes` from 4MB to 100MB. #1884
- [CHANGE] Default values have changed for the following settings. This improves query performance for recent data (within 12h) by only reading from ingesters: #1909 #1921
  - `-blocks-storage.bucket-store.ignore-blocks-within` now defaults to `10h` (previously `0`)
  - `-querier.query-store-after` now defaults to `12h` (previously `0`)
- [CHANGE] Alertmanager: removed support for migrating local files from Cortex 1.8 or earlier. Related to original Cortex PR cortexproject/cortex#3910. #2253
- [CHANGE] The following settings are now classified as advanced because the defaults should work for most users and tuning them requires in-depth knowledge of how the read path works: #1929
  - `-querier.query-ingesters-within`
  - `-querier.query-store-after`
- [CHANGE] Config flag category overrides can be set dynamically at runtime. #1934
- [CHANGE] Ingester: deprecated `-ingester.ring.join-after`. Mimir now behaves as if this setting is always set to 0s. This configuration option will be rem...
Mimir 2.2.0-rc.1
This release contains 26 contributions from 6 authors. Thank you!
Changes since 2.2.0-rc.0
Grafana Mimir
- [BUGFIX] Query-frontend: `vector` and `time` functions were sharded, which made expressions like `vector(1) > 0 and vector(1)` fail. #2355
Mimirtool
- [BUGFIX] Make mimirtool build for Windows work again. #2273
Full Changelog: mimir-2.2.0-rc.0...mimir-2.2.0-rc.1
Mimir 2.2.0-rc.0
2.2.0-rc.0
This release contains 214 contributions from 32 authors. Thank you!
Grafana Labs is excited to announce version 2.2 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
Highlights include the top features, enhancements, and bugfixes in this release. If you are upgrading from Grafana Mimir 2.1, there is migration-related information as well.
For the complete list of changes, see the Changelog.
Features and enhancements
- Support for ingesting out-of-order samples: Grafana Mimir includes new, experimental support for ingesting out-of-order samples. This support is configurable, with users able to set how far out-of-order Mimir will accept samples on a per-tenant basis. Note that this feature still needs heavy testing and is not production-ready yet.
- Error messages: The error messages that Mimir reports are more human readable, and the messages include error codes that are easily searchable.
- Configurable prefix for object storage: Mimir can now store block data, rules, and alerts in one bucket, each under its own user-defined prefix, rather than requiring one bucket for each. You can configure the storage prefix by using the `-<storage>.storage-prefix` option for the corresponding storage: `ruler-storage`, `alertmanager-storage`, or `blocks-storage`.
- Helm Chart update: TBD
- Store-gateway can now optionally prepopulate the file system cache when memory-mapping index-header files. This can help the store-gateway avoid looking stuck while loading index-headers. The feature can be enabled with the new experimental flag `-blocks-storage.bucket-store.index-header.map-populate-enabled`.
- Faster ingester startup: Ingesters now replay their write-ahead log about 50% faster, and they also re-join the ring sooner under some conditions.
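The storage-prefix feature listed above maps the `-<storage>.storage-prefix` flags onto the YAML configuration as `*_storage.storage_prefix` (per the changelog entry below). A sketch of sharing one bucket across the three components; the prefix values are arbitrary examples:

```yaml
# Each component reads and writes under its own prefix in a shared bucket.
blocks_storage:
  storage_prefix: blocks
ruler_storage:
  storage_prefix: rules
alertmanager_storage:
  storage_prefix: alerts
```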
Upgrade considerations
We have updated default values and some parameters in Grafana Mimir 2.2 to give you a better out-of-the-box experience:
- Message size limits for gRPC messages exchanged between internal Mimir components increased to 100 MiB from the previous 4 MiB. This helps to avoid internal server errors when pushing or querying large data.
- The `-blocks-storage.bucket-store.ignore-blocks-within` parameter changed from `0` to `10h`. The default value of `-querier.query-store-after` changed from `0` to `12h`. Both changes improve query performance for most-recent data by querying only the ingesters, rather than object storage.
- The option `-querier.shuffle-sharding-ingesters-lookback-period` has been deprecated. If you previously changed this option from its default of `0s`, set `-querier.shuffle-sharding-ingesters-enabled` to `true` and specify the lookback period by setting the `-querier.query-ingesters-within` option.
- The `-memberlist.abort-if-join-fails` parameter now defaults to `false`. When Mimir is using memberlist as a backend store for the hash ring, and it fails to join the memberlist cluster, Mimir no longer aborts startup by default.
Bug fixes
- PR 1883: Fixed a bug that caused the query-frontend and querier to crash when they received a user query with a special regular expression label matcher.
- PR 1933: Fixed a bug in the ingester ring page, which showed incorrect status of entries in the ring.
- PR 2090: Ruler in remote rule evaluation mode now applies the timeout correctly. Previously the ruler could get stuck forever, which halted rule evaluation.
- PR 2036: Fixed panic at startup when Mimir is running in monolithic mode and query sharding is enabled.
CHANGELOG
Grafana Mimir
- [CHANGE] Increased default configuration for `-server.grpc-max-recv-msg-size-bytes` and `-server.grpc-max-send-msg-size-bytes` from 4MB to 100MB. #1884
- [CHANGE] Default values have changed for the following settings. This improves query performance for recent data (within 12h) by only reading from ingesters: #1909 #1921
  - `-blocks-storage.bucket-store.ignore-blocks-within` now defaults to `10h` (previously `0`)
  - `-querier.query-store-after` now defaults to `12h` (previously `0`)
- [CHANGE] Alertmanager: removed support for migrating local files from Cortex 1.8 or earlier. Related to original Cortex PR cortexproject/cortex#3910. #2253
- [CHANGE] The following settings are now classified as advanced because the defaults should work for most users and tuning them requires in-depth knowledge of how the read path works: #1929
  - `-querier.query-ingesters-within`
  - `-querier.query-store-after`
- [CHANGE] Config flag category overrides can be set dynamically at runtime. #1934
- [CHANGE] Ingester: deprecated `-ingester.ring.join-after`. Mimir now behaves as if this setting is always set to 0s. This configuration option will be removed in Mimir 2.4.0. #1965
- [CHANGE] Blocks uploaded by the ingester no longer contain the `__org_id__` label. The compactor now ignores this label and will compact blocks with and without this label together. The `mimirconvert` tool will remove the label from blocks as an "unknown" label. #1972
- [CHANGE] Querier: deprecated `-querier.shuffle-sharding-ingesters-lookback-period`, instead adding `-querier.shuffle-sharding-ingesters-enabled` to enable or disable shuffle sharding on the read path. The value of `-querier.query-ingesters-within` is now used internally for the shuffle-sharding lookback. #2110
- [CHANGE] Memberlist: `-memberlist.abort-if-join-fails` now defaults to false. Previously it defaulted to true. #2168
- [CHANGE] Ruler: `/api/v1/rules*` and `/prometheus/rules*` configuration endpoints are removed. Use `/prometheus/config/v1/rules*`. #2182
- [CHANGE] Ingester: `-ingester.exemplars-update-period` has been renamed to `-ingester.tsdb-config-update-period`. You can use it to update multiple, per-tenant TSDB configurations. #2187
- [FEATURE] Ingester: (Experimental) Add the ability to ingest out-of-order samples up to an allowed limit. If you enable this feature, it requires additional memory and disk space. This feature also enables a write-behind log, which might lead to longer ingester-start replays. When this feature is disabled, there is no overhead on memory, disk space, or startup times. #2187
  - `-ingester.out-of-order-time-window`, as a duration string, allows you to set how far back in time a sample can be. The default is `0s`, where `s` is seconds.
  - The `cortex_ingester_tsdb_out_of_order_samples_appended_total` metric tracks the total number of out-of-order samples ingested by the ingester.
  - `cortex_discarded_samples_total` has a new label `reason="sample-too-old"`, used when the `-ingester.out-of-order-time-window` flag is greater than zero. The label tracks the number of samples that were discarded for being too old; they were out of order, but beyond the time window allowed.
- [ENHANCEMENT] Distributor: Added limit to prevent tenants from sending an excessive number of requests: #1843
  - The following CLI flags (and their respective YAML config options) have been added: `-distributor.request-rate-limit`, `-distributor.request-burst-limit`
  - The following metric is exposed to tell how many requests have been rejected: `cortex_discarded_requests_total`
- [ENHANCEMENT] Store-gateway: Add the experimental ability to run requests in a dedicated OS thread pool. This feature can be configured using `-store-gateway.thread-pool-size` and is disabled by default. Replaces the ability to run index-header operations in a dedicated thread pool. #1660 #1812
- [ENHANCEMENT] Improved error messages to make them easier to understand; each now has a unique, global identifier that you can look up in the runbooks for more information. #1907 #1919 #1888 #1939 #1984 #2009 #2056 #2066 #2104 #2150 #2234
- [ENHANCEMENT] Memberlist KV: incoming messages are now processed on a per-key goroutine. This may reduce loss of "maintenance" packets in busy memberlist installations, but uses more CPU. The new `memberlist_client_received_broadcasts_dropped_total` counter tracks the number of dropped per-key messages. #1912
- [ENHANCEMENT] Blocks storage, Alertmanager, Ruler: add support for a prefix to the bucket store (`*_storage.storage_prefix`). This enables using the same bucket for the three components. #1686 #1951
- [ENHANCEMENT] Upgrade Docker base images to `alpine:3.16.0`. #2028
- [ENHANCEMENT] Store-gateway: Add experimental configuration option for the store-gateway to attempt to pre-populate the file system cache when memory-mapping index-header files. Enabled with `-blocks-storage.bucket-store.index-header.map-populate-enabled=true`. Note this flag only has an effect when running on Linux. #2019 #2054
- [ENHANCEMENT] Chunk Mapper: reduce memory usage of async chunk mapper. #2043
- [ENHANCEMENT] Ingester: reduce sleep time when reading WAL. #2098
- [ENHANCEMENT] Compactor: Run sanity check on blocks storage configuration at startup. #2144
- [ENHANCEMENT] Compactor: Add HTTP API for uploading TSDB blocks. Enabled with `-compactor.block-upload-enabled`. #1694 #2126
- [ENHANCEMENT] Ingester: Enable querying overlapping blocks by default. #2187
- [ENHANCEMENT] Distributor: Auto-forget unhealthy distributors after ten failed ring heartbeats. #2154
- [ENHANCEMENT] Distributor: Add new metric `cortex_distributor_forward_errors_total` for error codes resulting from forwarding requests. #2077
- [ENHANCEMENT] `/ready` endpoint now returns and logs detailed services information. #2055
- [ENHANCEMENT] Memcached client: Reduce number of connections required to fetch cached keys from memcached. #1920
- [ENHANCEMENT] Improved error message returned when `-querier.query-store-after` validation fails. #...
2.1.0
Grafana Labs is excited to announce version 2.1 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
Below we highlight the top features, enhancements and bugfixes in this release, as well as relevant callouts for those upgrading from Grafana Mimir 2.0. The complete list of changes is recorded in the Changelog.
Features and enhancements
- Mimir on ARM: We now publish Docker images for both `amd64` and `arm64`, making it easier for those on ARM-based machines to develop and run Mimir. Multiplatform images are available from the Mimir docker registry. Note that our existing integration test suite only uses the `amd64` images, which means we cannot make any functional or performance guarantees about the `arm64` images.
- Remote ruler mode for improved rule evaluation performance: We've added a `remote` mode for the Grafana Mimir ruler, in which the ruler delegates rule evaluation to the query-frontend rather than evaluating rules directly within the ruler process itself. This allows recording and alerting rules to benefit from the query parallelization techniques implemented in the query-frontend (like query sharding). `Remote` mode is considered experimental and is off by default. To enable it, see remote ruler.
- Per-tenant custom trackers for monitoring cardinality: In Grafana Mimir 2.0, we introduced a custom tracker feature that allows you to track the count of active series over time that match a specific label matcher. In Grafana Mimir 2.1, we've made it possible to configure custom trackers via the runtime configuration file. This means you can now define different trackers for each tenant in your cluster and modify those trackers without an ingester restart.
- Reduce cardinality of Grafana Mimir's `/metrics` endpoint: While Grafana Mimir does a good job of exposing a relatively small number of series about its own state, this number can tick up when running Grafana Mimir clusters with high tenant counts or high active series counts. To reduce this number (and the accompanying cost of scraping and storing these time series), we made several optimizations that decreased the series count on the `/metrics` endpoint by more than 10%.
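The per-tenant custom trackers mentioned above live in the runtime configuration file. As an illustrative sketch only (the tenant IDs, tracker names, and matchers below are hypothetical; check the Mimir configuration reference for the exact schema), an override could look like this:

```yaml
# Illustrative runtime configuration: per-tenant active-series custom
# trackers. Tenant IDs and matchers here are examples, not defaults.
overrides:
  tenant-a:
    active_series_custom_trackers:
      payments: '{service="payments"}'
  tenant-b:
    active_series_custom_trackers:
      batch_jobs: '{job=~"batch-.*"}'
```

Because Mimir periodically reloads the runtime configuration file, trackers defined this way can be added or changed without restarting ingesters.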
Upgrade considerations
We've updated the default values for two parameters in Grafana Mimir to give users better out-of-the-box performance:
- We've changed the default for `-blocks-storage.tsdb.isolation-enabled` from `true` to `false`. We've marked this flag as deprecated and will remove it completely in two releases. TSDB isolation is a feature inherited from Prometheus that didn't provide any benefit given Grafana Mimir's distributed architecture, and in our 1-billion-series load test we found it actually hurt performance. Disabling it reduced our ingester 99th percentile latency by 90%.
- The store-gateway attributes cache is now enabled by default (achieved by updating the default for `-blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items` from `0` to `50000`). This in-memory cache makes it faster to look up object attributes for chunk data. We've been running this optional cache internally for a while and, after a recent configuration audit, realized it made sense to do the same for all users. The increase in store-gateway memory utilization from enabling this cache is negligible and easily justified given the performance gains.
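Expressed in YAML (a sketch only; the key paths mirror the CLI flag names and should be verified against the Mimir configuration reference), the two new defaults are equivalent to:

```yaml
# The 2.1 defaults written out explicitly; verify key paths against the
# Mimir configuration reference before relying on this sketch.
blocks_storage:
  tsdb:
    isolation_enabled: false        # new default; flag is deprecated
  bucket_store:
    chunks_cache:
      attributes_in_memory_max_items: 50000  # attributes cache on by default
```

Operators who need the previous behavior can set these keys back explicitly, though note that the isolation flag is scheduled for removal.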
Bug fixes
2.1.0 bug fixes
- PR 1704: Fixed a bug that previously caused Grafana Mimir to crash on startup when trying to run in monolithic mode with the results cache enabled due to duplicate metric names.
- PR 1835: Fixed a bug that caused Grafana Mimir to crash when an invalid Alertmanager configuration was set even though the Alertmanager component was disabled. After this fix, the Alertmanager configuration is only validated if the Alertmanager component is loaded.
- PR 1836: The ability to run Alertmanager with `local` storage broke in Grafana Mimir 2.0 when we removed the ability to run the Alertmanager without sharding. With this bugfix, we've made it possible to again run Alertmanager with `local` storage. However, for production use, we still recommend an external store, since it is needed to persist Alertmanager state (e.g. silences) between replicas.
- PR 1715: Restored Grafana Mimir's ability to use CNAME DNS records to reach memcached servers. The bug was inherited from an upstream change to Thanos; we contributed a fix to Thanos and subsequently updated our Thanos version.
CHANGELOG
Grafana Mimir
- [CHANGE] Compactor: No longer upload debug meta files to object storage. #1257
- [CHANGE] Default values have changed for the following settings: #1547
  - `-alertmanager.alertmanager-client.grpc-max-recv-msg-size` now defaults to 100 MiB (previously was not configurable and set to 16 MiB)
  - `-alertmanager.alertmanager-client.grpc-max-send-msg-size` now defaults to 100 MiB (previously was not configurable and set to 4 MiB)
  - `-alertmanager.max-recv-msg-size` now defaults to 100 MiB (previously was 16 MiB)
- [CHANGE] Ingester: Add `user` label to metrics `cortex_ingester_ingested_samples_total` and `cortex_ingester_ingested_samples_failures_total`. #1533
- [CHANGE] Ingester: Changed `-blocks-storage.tsdb.isolation-enabled` default from `true` to `false`. The config option has also been deprecated and will be removed in 2 minor versions. #1655
- [CHANGE] Query-frontend: results cache keys are now versioned; this will cause the cache to be re-filled when rolling out this version. #1631
- [CHANGE] Store-gateway: enabled the attributes in-memory cache by default. The new default configuration is `-blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items=50000`. #1727
- [CHANGE] Compactor: Removed the metric `cortex_compactor_garbage_collected_blocks_total` since it duplicates `cortex_compactor_blocks_marked_for_deletion_total`. #1728
- [CHANGE] All: Logs that used the `org_id` label now use the `user` label. #1634 #1758
- [CHANGE] Alertmanager: the following metrics are not exported for a given `user` and `integration` when the metric value is zero: #1783
  - `cortex_alertmanager_notifications_total`
  - `cortex_alertmanager_notifications_failed_total`
  - `cortex_alertmanager_notification_requests_total`
  - `cortex_alertmanager_notification_requests_failed_total`
  - `cortex_alertmanager_notification_rate_limited_total`
- [CHANGE] Removed the following metrics exposed by the Mimir hash rings: #1791
  - `cortex_member_ring_tokens_owned`
  - `cortex_member_ring_tokens_to_own`
  - `cortex_ring_tokens_owned`
  - `cortex_ring_member_ownership_percent`
- [CHANGE] Querier / Ruler: removed the following metrics tracking the number of query requests sent to each ingester. You can use `cortex_request_duration_seconds_count{route=~"/cortex.Ingester/(QueryStream|QueryExemplars)"}` instead. #1797
  - `cortex_distributor_ingester_queries_total`
  - `cortex_distributor_ingester_query_failures_total`
- [CHANGE] Distributor: removed the following metrics tracking the number of requests from a distributor to ingesters: #1799
  - `cortex_distributor_ingester_appends_total`
  - `cortex_distributor_ingester_append_failures_total`
- [CHANGE] Distributor / Ruler: deprecated `-distributor.extend-writes`. Now Mimir always behaves as if this setting were set to `false`, which we expect to be safe for every Mimir cluster setup. #1856
- [FEATURE] Querier: Added support for streaming remote read. Note that the benefits of chunking the response are partial here, since in a typical `query-frontend` setup responses will be buffered until they've been completed. #1735
- [FEATURE] Ruler: Allow setting `evaluation_delay` for each rule group via the rule group configuration file. #1474
- [FEATURE] Ruler: Added support for expression remote evaluation. #1536 #1818
- The following CLI flags (and their respective YAML config options) have been added:
  - `-ruler.query-frontend.address`
  - `-ruler.query-frontend.grpc-client-config.grpc-max-recv-msg-size`
  - `-ruler.query-frontend.grpc-client-config.grpc-max-send-msg-size`
  - `-ruler.query-frontend.grpc-client-config.grpc-compression`
  - `-ruler.query-frontend.grpc-client-config.grpc-client-rate-limit`
  - `-ruler.query-frontend.grpc-client-config.grpc-client-rate-limit-burst`
  - `-ruler.query-frontend.grpc-client-config.backoff-on-ratelimits`
  - `-ruler.query-frontend.grpc-client-config.backoff-min-period`
  - `-ruler.query-frontend.grpc-client-config.backoff-max-period`
  - `-ruler.query-frontend.grpc-client-config.backoff-retries`
  - `-ruler.query-frontend.grpc-client-config.tls-enabled`
  - `-ruler.query-frontend.grpc-client-config.tls-ca-path`
  - `-ruler.query-frontend.grpc-client-config.tls-cert-path`
  - `-ruler.query-frontend.grpc-client-config.tls-key-path`
  - `-ruler.query-frontend.grpc-client-config.tls-server-name`
  - `-ruler.query-frontend.grpc-client-config.tls-insecure-skip-verify`
- The following CLI flags (and their respective YAML config options) have been added:
- [FEATURE] Distributor: Added the ability to forward specific metrics ...
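The replacement suggested for the removed per-ingester query metrics can be used directly in PromQL. For example, the per-second rate of query requests from queriers and rulers to ingesters (the `[5m]` window and `sum` aggregation are illustrative choices, not part of the changelog):

```promql
# Rate of query requests to ingesters, derived from the request-duration
# histogram count as suggested in the changelog entry for #1797.
sum(rate(cortex_request_duration_seconds_count{route=~"/cortex.Ingester/(QueryStream|QueryExemplars)"}[5m]))
```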
2.1.0-rc.1
CHANGELOG since mimir-2.1.0-rc.0
- [CHANGE] Distributor / Ruler: deprecated `-distributor.extend-writes`. Now Mimir always behaves as if this setting were set to `false`, which we expect to be safe for every Mimir cluster setup. #1856
2.1.0-rc.0
Grafana Mimir version 2.1 release notes
2.0.0
Grafana Labs is excited to announce the first release of Grafana Mimir, the most scalable, most performant open source time series database in the world. In customer tests, we’ve shown that a single cluster can support more than 1 billion active time series.
Besides massive scale, Grafana Mimir offers a host of other benefits, including easy deployment, native multi-tenancy, high availability, durable long-term storage, and exceptional query performance on even the highest cardinality queries.
We’re launching Grafana Mimir with a 2.0 version number to signal our respect for Cortex, the project from which Grafana Mimir was forked. The choice of 2.0 also represents our conviction that Grafana Mimir is real-world-tested, production-ready software. It has served as the backbone of our Grafana Cloud Metrics and Grafana Enterprise Metrics products since their inception.
Learn more:
- Grafana Mimir 2.0.0 release notes
- Announcing Grafana Mimir, the most scalable open source TSDB in the world
- Q&A with Grafana Labs CEO Raj Dutt about Grafana Mimir
- Intro to Grafana Mimir webinar on April 26
The complete list of changes is recorded in the Changelog.
2.0.0-rc.4
mimir-2.0.0-rc.4 v2.0.0-rc.4