Add GPU (nvidia-smi) monitoring to telegraf, grafana #77
christopheredsall added the enhancement and good first issue labels on Jun 18, 2020
e.g.
[root@vm-gpu2-1-ad1-0001 ~]# sed --in-place -e '/inputs.nvidia_smi/ s/^#//' /etc/telegraf/telegraf.conf
[root@vm-gpu2-1-ad1-0001 ~]# systemctl reload telegraf.service
[root@mgmt ~]# influx -database 'telegraf' -execute 'select * from nvidia_smi where time > now()-20s' -format 'json' -pretty
{
"results": [
{
"series": [
{
"name": "nvidia_smi",
"columns": [
"time",
"clocks_current_graphics",
"clocks_current_memory",
"clocks_current_sm",
"clocks_current_video",
"compute_mode",
"encoder_stats_average_fps",
"encoder_stats_average_latency",
"encoder_stats_session_count",
"host",
"index",
"memory_free",
"memory_total",
"memory_used",
"name",
"pcie_link_gen_current",
"pcie_link_width_current",
"power_draw",
"pstate",
"temperature_gpu",
"utilization_gpu",
"utilization_memory",
"uuid"
],
"values": [
[
1592478071000000000,
405,
715,
405,
835,
"Default",
0,
0,
0,
"vm-gpu2-1-ad1-0001",
"0",
16280,
16280,
0,
"Tesla P100-SXM2-16GB",
3,
16,
28.1,
"P0",
41,
4,
0,
"GPU-29282894-1d8f-08c2-9c01-d34c831b1e4d"
]
]
}
]
}
]
}
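To sanity-check output like the above from a script, the columns and values can be zipped into one record per GPU. This is only a minimal sketch: the helper name `rows_from_influx_json` is made up for illustration, and the sample is an abbreviated copy of the response above with a handful of its columns.

```python
import json


def rows_from_influx_json(text):
    """Turn `influx -format json` output into a list of {column: value} dicts."""
    rows = []
    for result in json.loads(text).get("results", []):
        # A result with no matching points has no "series" key at all.
        for series in result.get("series", []):
            columns = series["columns"]
            for values in series["values"]:
                rows.append(dict(zip(columns, values)))
    return rows


# Abbreviated sample: a few of the columns from the query output above.
sample = """
{"results": [{"series": [{"name": "nvidia_smi",
  "columns": ["time", "host", "name", "temperature_gpu", "utilization_gpu"],
  "values": [[1592478071000000000, "vm-gpu2-1-ad1-0001",
              "Tesla P100-SXM2-16GB", 41, 4]]}]}]}
"""

for row in rows_from_influx_json(sample):
    print(row["host"], row["name"], row["temperature_gpu"], row["utilization_gpu"])
```

An empty list back from the helper would suggest the plugin is not reporting yet (or the time window is too short).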
Dashboard 12225 seems to work as long as we change the hostname variable query:
--- 12225.json.orig 2020-06-18 18:25:42.000000000 +0100
+++ 12225.json 2020-06-18 17:26:16.000000000 +0100
@@ -2734,7 +2734,7 @@
"multi": false,
"name": "hostname",
"options": [],
- "query": "SHOW TAG VALUES FROM \"win_system\" WITH KEY = \"host\"",
+ "query": "SHOW TAG VALUES FROM \"system\" WITH KEY = \"host\"",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
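If we would rather not hand-edit 12225.json, the same dashboard change could be scripted. A rough sketch, assuming the variables live under `templating.list` as they usually do in an exported Grafana dashboard; the function name `retarget_host_variable` is hypothetical, and the inline dictionary stands in for the real file.

```python
import json


def retarget_host_variable(dashboard, old='"win_system"', new='"system"'):
    """Rewrite template-variable queries in place; return how many changed."""
    changed = 0
    # Assumption: variables are stored under templating.list.
    for var in dashboard.get("templating", {}).get("list", []):
        query = var.get("query")
        if isinstance(query, str) and old in query:
            var["query"] = query.replace(old, new)
            changed += 1
    return changed


# Stand-in for json.load() on the downloaded dashboard file.
dashboard = {"templating": {"list": [{
    "name": "hostname",
    "query": 'SHOW TAG VALUES FROM "win_system" WITH KEY = "host"',
}]}}

retarget_host_variable(dashboard)
print(json.dumps(dashboard, indent=2))
```

Matching on the quoted measurement name keeps the replacement from touching other queries that merely contain the word "system".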
There is an nvidia-smi plugin for telegraf and there are dashboards available on grafana.com
What is needed is to uncomment the lines (or, at a minimum, the
[[inputs.nvidia_smi]]
line) in /etc/telegraf/telegraf.conf
We would probably want to do this on only the nodes that have GPUs.
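One possible way to restrict this to GPU nodes is to gate the uncommenting on whether `nvidia-smi` is present. The sketch below is illustrative only: it assumes "has a GPU" can be approximated by `nvidia-smi` being on the PATH, the function names are made up, `inputs.other` is a placeholder section name, and it works on a string rather than touching /etc/telegraf/telegraf.conf.

```python
import re
import shutil


def uncomment_nvidia_smi(conf_text):
    """Strip the leading '#' from every line that mentions inputs.nvidia_smi."""
    # (?m) makes ^ match at each line start; the lookahead stays on one line.
    return re.sub(r"(?m)^#(?=.*inputs\.nvidia_smi)", "", conf_text)


def has_gpu():
    # Assumption: a node counts as a GPU node if nvidia-smi is on PATH.
    return shutil.which("nvidia-smi") is not None


# "inputs.other" is a placeholder for any unrelated commented-out plugin.
sample = "# [[inputs.nvidia_smi]]\n# [[inputs.other]]\n"

print(uncomment_nvidia_smi(sample) if has_gpu() else sample, end="")
```

On a real node the result would still need writing back to the config file, followed by a telegraf reload.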