diff --git a/process/README.md b/process/README.md
index 1d1c373c31cd0..032fd0f922b15 100644
--- a/process/README.md
+++ b/process/README.md
@@ -3,52 +3,37 @@
 ## Overview
 
 The Process Check lets you:
-
-- Collect resource usage metrics for specific running processes on any host: CPU, memory, I/O, number of threads, etc.
-- Use [Process Monitors][1]: configure thresholds for how many instances of a specific process ought to be running and get alerts when the thresholds aren't met (see **Service Checks** below).
+- Collect resource usage metrics for specific running processes on any host. For example, CPU, memory, I/O, and number of threads.
+- Use [Process Monitors][1] to configure thresholds for how many instances of a specific process should be running and get alerts when the thresholds aren't met (see **Service Checks** below).
 
 ## Setup
 
 ### Installation
 
-The Process Check is included in the [Datadog Agent][2] package, so you don't need to install anything else on your server.
+The Process check is included in the [Datadog Agent][2] package, so you don't need to install anything else on your server.
 
 ### Configuration
 
-Unlike many checks, the Process Check doesn't monitor anything useful by default. You must configure which processes you want to monitor, and how.
-
+Unlike many checks, the Process check doesn't monitor anything useful by default. You must configure which processes you want to monitor.
 
-1. While there's no standard default check configuration, here's an example `process.d/conf.yaml` that monitors SSH/SSHD processes. See the [sample process.d/conf.yaml][3] for all available configuration options:
+While there's no standard default check configuration, here's an example `process.d/conf.yaml` that monitors SSH/SSHD processes. See the [sample process.d/conf.yaml][3] for all available configuration options:
 
 ```yaml
-   init_config:
-
-   instances:
-
-   ## @param name - string - required
-   ## Used to uniquely identify your metrics as they are tagged with this name in Datadog.
-   #
-     - name: ssh
-
-   ## @param search_string - list of strings - optional
-   ## If one of the elements in the list matches, it returns the count of
-   ## all the processes that match the string exactly by default. Change this behavior with the
-   ## parameter `exact_match: false`.
-   ##
-   ## Note: Exactly one of search_string, pid or pid_file must be specified per instance.
-   #
-       search_string:
-         - ssh
-         - sshd
+init_config:
+instances:
+- name: ssh
+  search_string:
+    - ssh
+    - sshd
 ```
 
-Some process metrics require either running the Datadog collector as the same user as the monitored process or privileged access to be retrieved. Where the former option is not desired, and to avoid running the Datadog collector as `root`, the `try_sudo` option lets the Process Check try using `sudo` to collect this metric. As of now, only the `open_file_descriptors` metric on Unix platforms is taking advantage of this setting. Note: the appropriate sudoers rules have to be configured for this to work:
+**Note**: After you make configuration changes, make sure you [restart the Agent][4].
 
-   ```text
-   dd-agent ALL=NOPASSWD: /bin/ls /proc/*/fd/
-   ```
+Retrieving some process metrics requires the Datadog collector to either run as the monitored process user or with privileged access. For the `open_file_descriptors` metric on Unix platforms, there is an additional configuration option. Setting `try_sudo` to `true` in your `conf.yaml` file allows the Process check to try using `sudo` to collect the `open_file_descriptors` metric. Using this configuration option requires setting the appropriate sudoers rules in `/etc/sudoers`:
 
-2. [Restart the Agent][4].
+```shell
+dd-agent ALL=NOPASSWD: /bin/ls /proc/*/fd/
+```
 
 ### Validation
 
@@ -56,10 +41,17 @@ Run the [Agent's status subcommand][5] and look for `process` under the Checks s
 
 ### Metrics notes
 
-**Note**: Some metrics are not available on Linux or OSX:
+The following metrics are not available on Linux or macOS:
+- Process I/O metrics are **not** available on Linux or macOS since the files that the Agent reads (`/proc//io`) are only readable by the process's owner. For more information, [read the Agent FAQ][6].
+
+The following metrics are not available on Windows:
+- `system.cpu.iowait`
+- `system.processes.mem.page_faults.minor_faults`
+- `system.processes.mem.page_faults.children_minor_faults`
+- `system.processes.mem.page_faults.major_faults`
+- `system.processes.mem.page_faults.children_major_faults`
 
-- Process I/O metrics are **not** available on Linux or OSX since the files that the Agent reads (`/proc//io`) are only readable by the process's owner. For more information, [read the Agent FAQ][6]
-- `system.cpu.iowait` is not available on Windows.
+**Note**: Use a [WMI check][11] to gather page fault metrics on Windows.
 
 All metrics are per `instance` configured in process.yaml, and are tagged `process_name:`.
 
@@ -100,3 +92,4 @@ To get a better idea of how (or why) to monitor process resource consumption wit
 [8]: https://github.com/DataDog/integrations-core/blob/master/process/assets/service_checks.json
 [9]: https://docs.datadoghq.com/help/
 [10]: https://www.datadoghq.com/blog/process-check-monitoring
+[11]: https://docs.datadoghq.com/integrations/wmi_check/
diff --git a/process/metadata.csv b/process/metadata.csv
index 4ceeb14dcee74..7d032651a0550 100644
--- a/process/metadata.csv
+++ b/process/metadata.csv
@@ -8,10 +8,10 @@ system.processes.ioread_count,gauge,,read,,The number of disk reads by this proc
 system.processes.iowrite_bytes,gauge,,byte,,The number of bytes written to disk by this process. In Windows: the number of bytes written by this process.,0,system,processes io w bytes,
 system.processes.iowrite_bytes_count,count,,byte,,The number of bytes written to disk by this process. In Windows: the number of bytes written by this process.,0,system,processes io w bytes,
 system.processes.iowrite_count,gauge,,write,,The number of disk writes by this process. In Windows: the number of writes by this process.,0,system,processes io w count,
-system.processes.mem.page_faults.minor_faults,gauge,,occurrence,second,The number of minor page faults per second for this process.,0,system,minor page faults,
-system.processes.mem.page_faults.children_minor_faults,gauge,,occurrence,second,The number of minor page faults per second for children of this process.,0,system,children minor page faults,
-system.processes.mem.page_faults.major_faults,gauge,,occurrence,second,The number of major page faults per second for this process.,0,system,major page faults,
-system.processes.mem.page_faults.children_major_faults,gauge,,occurrence,second,The number of major page faults per second for children of this process.,0,system,children major page faults,
+system.processes.mem.page_faults.minor_faults,gauge,,occurrence,second,In Unix/Linux and macOS: The number of minor page faults per second for this process.,0,system,minor page faults,
+system.processes.mem.page_faults.children_minor_faults,gauge,,occurrence,second,In Unix/Linux and macOS: The number of minor page faults per second for children of this process.,0,system,children minor page faults,
+system.processes.mem.page_faults.major_faults,gauge,,occurrence,second,In Unix/Linux and macOS: The number of major page faults per second for this process.,0,system,major page faults,
+system.processes.mem.page_faults.children_major_faults,gauge,,occurrence,second,In Unix/Linux and macOS: The number of major page faults per second for children of this process.,0,system,children major page faults,
 system.processes.mem.pct,gauge,,percent,,The process memory consumption.,0,system,processes mem pct,
 system.processes.mem.real,gauge,,byte,,The non-swapped physical memory a process has used and cannot be shared with another process (Linux only).,0,system,processes mem real,
 system.processes.mem.rss,gauge,,byte,,"The non-swapped physical memory a process has used. aka ""Resident Set Size"".",0,system,processes mem rss,