Add Monitoring Metrics with aioprometheus for ResourcePools #103
Description:
This PR introduces monitoring capabilities for ResourcePools using the aioprometheus library. With these changes, we are able to measure and monitor the execution time of specific events as well as the current state of the ResourcePools, including available and utilized resources.
Context:
The motivation behind this PR is the need to have a clearer and real-time view of the performance and state of the ResourcePools. This has become essential to optimize resource allocation and identify performance bottlenecks more quickly.
Implementation:
The implementation focuses on integrating aioprometheus with our existing system and defining specific metrics for the ResourcePools. The main changes include:
Execution Time Measurement: We use the measure_execution_time or aioprometheus.timer() decorator to measure the execution time of the methods on the operator. This helps us identify and optimize slower operations.
Specific Metrics Added:
In addition to basic execution time monitoring metrics, this PR introduces a series of specific metrics for ResourcePools, allowing for detailed monitoring of their state and utilization. The added metrics are:
- resource_pool_min_available: This Gauge metric represents the minimum number of environments available in each ResourcePool. It is crucial for understanding reserve capacity and ensuring that ResourcePools are adequately sized for demands.
- resource_pool_available: Similar to the previous metric, this Gauge measures the current number of available environments, offering real-time insights into resource utilization.
- resource_pool_used_total: A Counter that accumulates the total number of environments used over time in each ResourcePool. This metric is essential for tracking overall demand and usage patterns.
- resource_pool_state: Complementing the above metrics, this Gauge captures the state of each ResourcePool, including information on available and utilized resources. The states are differentiated by labels such as name, namespace, and state, allowing for granular analyses of the condition and performance of the ResourcePools.
Each of these metrics is accompanied by detailed labels, such as name and namespace for ResourcePool, and an additional state label for the resource_pool_state metric. These labels provide the necessary context for precise and targeted analyses, facilitating the identification of areas that require attention or adjustments.
Example: