[YUNIKORN-2854]Add queue maxRunningApps metrics #1012

Closed
wants to merge 3 commits into from

@kaichiachen kaichiachen commented Feb 10, 2025

What is this PR for?

Add a new metric, maxRunningApps, for Prometheus pull-based monitoring. The maxapplications property is an integer value, larger than 1, which limits the number of running applications for the queue.
I used a fixed resource name, "Apps", because maxRunningApps is a standalone metric, independent of resources.Resource. See https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/queue.go#L86

type Queue struct {
	QueuePath string // Fully qualified path for the queue
	Name      string // Queue name as in the config etc.

	...
	pending             *resources.Resource       // pending resource for the apps in the queue
	allocatedResource   *resources.Resource       // allocated resource for the apps in the queue
	preemptingResource  *resources.Resource       // preempting resource for the apps in the queue
	...
	maxResource            *resources.Resource // When not set, max = nil
	guaranteedResource     *resources.Resource // When not set, Guaranteed == 0
	...
	maxRunningApps         uint64
	runningApps            uint64
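
To illustrate how a plain application limit can sit in the same gauge family as the resource quantities, here is a minimal sketch that uses only the Go standard library. The `queueSample` type and the `maxRunningAppsSample` and `formatSample` helpers are hypothetical names for illustration and are not the yunikorn-core implementation; only the metric name and the label values `apps` and `maxRunningApps` are taken from the output shown in this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// queueSample is a hypothetical, simplified view of one gauge sample.
// It is NOT a yunikorn-core type.
type queueSample struct {
	metric   string  // metric family name, e.g. "yunikorn_root_queue_resource"
	resource string  // value of the "resource" label
	state    string  // value of the "state" label
	value    float64 // gauge value
}

// maxRunningAppsSample maps the uint64 limit onto a sample with a fixed
// resource label, since the limit is not part of a resources.Resource.
func maxRunningAppsSample(metric string, maxRunningApps uint64) queueSample {
	return queueSample{
		metric:   metric,
		resource: "apps",
		state:    "maxRunningApps",
		value:    float64(maxRunningApps),
	}
}

// formatSample renders the sample in the Prometheus text exposition style.
func formatSample(s queueSample) string {
	return fmt.Sprintf("%s{resource=%q,state=%q} %s",
		s.metric, s.resource, s.state,
		strconv.FormatFloat(s.value, 'g', -1, 64))
}

func main() {
	fmt.Println(formatSample(maxRunningAppsSample("yunikorn_root_queue_resource", 10)))
}
```

With a limit of 10, the sketch prints a line in the same shape as the sample this PR expects for the root queue.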

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2854?filter=-1

How should this be tested?

  1. Deploy a Kubernetes cluster with kind
  2. Forward the metrics endpoint with kubectl port-forward svc/yunikorn-service -n yunikorn 9080:9080
  3. Verify that the metrics are exposed with the command below
#> curl -s http://localhost:9080/ws/v1/metrics  | grep yunikorn_root_queue_resource
# HELP yunikorn_root_queue_resource Queue resource metrics. State of the resource includes `guaranteed`, `max`, `allocated`, `pending`, `preempting`, 'maxRunningApps'.   <---- help message is updated
# TYPE yunikorn_root_queue_resource gauge
yunikorn_root_queue_resource{resource="apps",state="maxRunningApps"} 0        <------ newly added metric
yunikorn_root_queue_resource{resource="ephemeral-storage",state="max"} 6.19061035008e+11
yunikorn_root_queue_resource{resource="memory",state="allocated"} 0
yunikorn_root_queue_resource{resource="memory",state="max"} 4.9624363008e+10
yunikorn_root_queue_resource{resource="memory",state="pending"} 0
yunikorn_root_queue_resource{resource="pods",state="allocated"} 3
yunikorn_root_queue_resource{resource="pods",state="max"} 330
yunikorn_root_queue_resource{resource="pods",state="pending"} 0
yunikorn_root_queue_resource{resource="vcore",state="allocated"} 0
yunikorn_root_queue_resource{resource="vcore",state="max"} 36000
yunikorn_root_queue_resource{resource="vcore",state="pending"} 0
  4. Configure maxRunningApps through the maxapplications queue property
apiVersion: v1
data:
  queues.yaml: |2
    partitions:
      - name: default
        placementrules:
          - name: tag
            value: namespace
            create: true
        queues:
          - name: root
            submitacl: '*'
            maxapplications: 10
kind: ConfigMap
  5. Verify that the metrics are reflected in the endpoint
#> curl -s http://localhost:9080/ws/v1/metrics  | grep maxRunningApps           
# HELP yunikorn_queue_resource Queue resource metrics. State of the resource includes `guaranteed`, `max`, `allocated`, `pending`, `preempting`, 'maxRunningApps'.
yunikorn_queue_resource{queue="root",resource="apps",state="maxRunningApps"} 10
# HELP yunikorn_root_dev_6kdna_queue_resource Queue resource metrics. State of the resource includes `guaranteed`, `max`, `allocated`, `pending`, `preempting`, 'maxRunningApps'.
# HELP yunikorn_root_dev_ruf38_queue_resource Queue resource metrics. State of the resource includes `guaranteed`, `max`, `allocated`, `pending`, `preempting`, 'maxRunningApps'.
# HELP yunikorn_root_queue_resource Queue resource metrics. State of the resource includes `guaranteed`, `max`, `allocated`, `pending`, `preempting`, 'maxRunningApps'.
yunikorn_root_queue_resource{resource="apps",state="maxRunningApps"} 10
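
The manual check in the steps above can also be scripted. The following is a minimal sketch, assuming the metrics text has already been fetched (for example with the curl command from the steps). `findGauge` and `sample` are hypothetical names, the sample text is abridged from the output above, and the label matching is a deliberately naive substring test rather than a full Prometheus text-format parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is abridged from the endpoint output shown in this PR.
const sample = `# HELP yunikorn_queue_resource Queue resource metrics.
yunikorn_queue_resource{queue="root",resource="apps",state="maxRunningApps"} 10
yunikorn_root_queue_resource{resource="apps",state="maxRunningApps"} 10
yunikorn_root_queue_resource{resource="pods",state="max"} 330
`

// findGauge returns the value of the first sample named metric whose label
// set contains every entry of wantLabels verbatim, e.g. `state="max"`.
// Comment lines (# HELP, # TYPE) and unparsable lines are skipped.
func findGauge(body, metric string, wantLabels ...string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, metric+"{") {
			continue
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		labels := line[len(metric)+1 : end]
		matched := true
		for _, l := range wantLabels {
			if !strings.Contains(labels, l) {
				matched = false
				break
			}
		}
		if !matched {
			continue
		}
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		return v, true
	}
	return 0, false
}

func main() {
	v, ok := findGauge(sample, "yunikorn_root_queue_resource",
		`resource="apps"`, `state="maxRunningApps"`)
	fmt.Println(v, ok)
}
```

Comparing the returned value with the configured maxapplications (10 in the ConfigMap above) gives a pass/fail check for the last step.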

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There are breaking changes for older versions.
  • - It needs documentation.

@kaichiachen kaichiachen marked this pull request as draft February 10, 2025 09:58
@kaichiachen kaichiachen marked this pull request as ready for review February 10, 2025 12:25

@pbacsko pbacsko left a comment

Some remarks

pbacsko commented Feb 13, 2025

Please check the failed test case

@kaichiachen

Please check the failed test case

Sure @pbacsko, I fixed the unit test issue and updated my change.

codecov bot commented Feb 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.49%. Comparing base (7391aeb) to head (0854845).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1012   +/-   ##
=======================================
  Coverage   82.48%   82.49%           
=======================================
  Files          97       97           
  Lines       15627    15635    +8     
=======================================
+ Hits        12890    12898    +8     
  Misses       2457     2457           
  Partials      280      280           

@chenyulin0719 chenyulin0719 self-requested a review February 14, 2025 13:00

@pbacsko pbacsko left a comment

Final comments

@pbacsko pbacsko left a comment

+1

@pbacsko pbacsko closed this in 4f2f1b3 Feb 18, 2025