Local queues prometheus metrics #1833
Comments
@alculquicondor @tenzen-y Do you think that'd be a useful / possible enhancement? |
Yes, that's the primary concern. I wouldn't make it a feature flag, but a long-term configuration field. |
I understand that this feature would be very useful, but I share @alculquicondor's concern. Anyway, I think a small KEP would be better, since we may extend the existing Config API. |
That makes sense. There could be one or two options added to the Config API, similar to the existing I can work on a small KEP if you guys give the green light. |
It seems simple enough. |
SGTM |
Thanks for your quick feedback! I'll work on it asap. /assign |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
@astefanutti are you still looking into this? |
@alculquicondor I haven't, but hopefully we'll get back to it soon. |
/remove-lifecycle stale |
/unassign |
This PR introduces an enhancement to enable collection of prometheus metrics for local queues. Addresses issue: kubernetes-sigs#1833 Signed-off-by: Varsha Prasad Narsing <[email protected]>
@varshaprasad96, please write |
/assign |
* [Feature] Enable prometheus metrics for local queues

  This PR introduces an enhancement to enable collection of prometheus metrics for local queues.
  Addresses issue: #1833
  Signed-off-by: Varsha Prasad Narsing <[email protected]>

* Address reviews

  This commit addresses reviews by adding additional metrics for local queue.
  Signed-off-by: Varsha Prasad Narsing <[email protected]>

---------
Signed-off-by: Varsha Prasad Narsing <[email protected]>
/remove-lifecycle stale @varshaprasad96 Do you still work on this enhancement? |
@tenzen-y Yes. I'm planning to get the implementation PR up within the next few days. |
Awesome, thanks for your effort! |
/assign |
I've been working on this KEP for the last couple of days and overall it seemed pretty straightforward, until I got to adding the LocalQueueByStatus metric. Afaict, the status metric for CQs is a bubbling up of the internal cq.status. The only representation of LQ states exists inside the CQ struct, and adding all LQs to the cache struct felt wrong. My current thought is to add a new Type in metrics |
…#2516)

* [Feature] Enable prometheus metrics for local queues

  This PR introduces an enhancement to enable collection of prometheus metrics for local queues.
  Addresses issue: kubernetes-sigs#1833
  Signed-off-by: Varsha Prasad Narsing <[email protected]>

* Address reviews

  This commit addresses reviews by adding additional metrics for local queue.
  Signed-off-by: Varsha Prasad Narsing <[email protected]>

---------
Signed-off-by: Varsha Prasad Narsing <[email protected]>
@KPostOffice Could you elaborate on where exactly in
Also, the local_queue has reference to the respective cluster queue. Can't we just directly query CQ's cache status instead while reporting metrics? |
@varshaprasad96 I wasn't planning on adding the metrics to the
|
What would you like to be added:
Expose Prometheus metrics for local queues, equivalent to the existing cluster queue metrics, but filtered and labeled by local queues.
Similar to the visibility API, which serves information about pending workloads in local queues, it would be possible to get metrics such as pending workloads, admitted active workloads, and resource usage for local queues.
If cardinality is a concern, those metrics could be exposed behind a feature flag.
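To make the idea concrete, here is a minimal sketch of a per-local-queue gauge behind an opt-in switch. It uses only the Go standard library as a stand-in; a real implementation would presumably use a `GaugeVec` from `prometheus/client_golang` instead. The metric name `local_queue_pending_workloads`, the label names, and the types `lqKey` and `lqGauge` are all hypothetical and are not taken from the Kueue code base.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lqKey identifies a local queue. Local queues are namespaced, so both
// namespace and name are needed as labels (hypothetical label names).
type lqKey struct {
	namespace string
	name      string
}

// lqGauge is a hypothetical stand-in for a labeled Prometheus gauge.
// When enabled is false, no series are recorded, which keeps cardinality
// at zero for clusters that do not opt in.
type lqGauge struct {
	metric  string
	enabled bool
	values  map[lqKey]int
}

func newLQGauge(metric string, enabled bool) *lqGauge {
	return &lqGauge{metric: metric, enabled: enabled, values: map[lqKey]int{}}
}

// Set records the current value for one local queue; it is a no-op when
// local queue metrics are disabled.
func (g *lqGauge) Set(namespace, name string, v int) {
	if !g.enabled {
		return
	}
	g.values[lqKey{namespace: namespace, name: name}] = v
}

// Render emits one line per local queue in the Prometheus text exposition
// format, sorted by namespace and then name for stable output.
// %q quoting is adequate for plain label values such as the ones used here.
func (g *lqGauge) Render() string {
	keys := make([]lqKey, 0, len(g.values))
	for k := range g.values {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].namespace != keys[j].namespace {
			return keys[i].namespace < keys[j].namespace
		}
		return keys[i].name < keys[j].name
	})
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s{namespace=%q,name=%q} %d\n", g.metric, k.namespace, k.name, g.values[k])
	}
	return b.String()
}

func main() {
	// The boolean would come from configuration in a real controller.
	pending := newLQGauge("local_queue_pending_workloads", true)
	pending.Set("team-a", "default", 3)
	pending.Set("team-b", "gpu", 0)
	fmt.Print(pending.Render())
}
```

Each local queue adds one series per metric, so the number of series grows with the number of namespaces and queues, which is why the switch is checked before anything is recorded.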
Why is this needed:
Metrics about local queues can be useful for the batch user persona, giving them visibility into their workloads and their historical trends.
While some metrics are already available for cluster queues, exposing them to the batch users persona presents the following challenges / limitations:
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.