(DOCS-5860)(DOCS-5649) Edit process docs (#15547)
* (DOCS-5860)(DOCS-5649) Edit process docs

* Small edit

* Apply suggestions from code review

Co-authored-by: Brett Blue <[email protected]>

---------

Co-authored-by: Brett Blue <[email protected]>
hestonhoffman and brett0000FF authored Aug 11, 2023
1 parent 4ec2ffe commit f2e6c66
Showing 2 changed files with 31 additions and 38 deletions.
61 changes: 27 additions & 34 deletions process/README.md
@@ -3,63 +3,55 @@
## Overview

The Process Check lets you:

- Collect resource usage metrics for specific running processes on any host: CPU, memory, I/O, number of threads, etc.
- Use [Process Monitors][1]: configure thresholds for how many instances of a specific process ought to be running and get alerts when the thresholds aren't met (see **Service Checks** below).
- Collect resource usage metrics for specific running processes on any host. For example, CPU, memory, I/O, and number of threads.
- Use [Process Monitors][1] to configure thresholds for how many instances of a specific process should be running and get alerts when the thresholds aren't met (see **Service Checks** below).
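
As a rough illustration of the threshold idea in the second bullet, the sketch below assumes the instance-level `thresholds` option from the sample `process.d/conf.yaml`. The numeric ranges are made-up values, not recommendations; confirm the option and its exact semantics against the sample file for your Agent version.

```yaml
init_config:

instances:
  - name: ssh
    search_string:
      - ssh
      - sshd
    # Assumed option: ranges of matching-process counts used to derive
    # the service check status. The numbers here are illustrative only.
    thresholds:
      warning: [2, 8]
      critical: [1, 10]
```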

## Setup

### Installation

The Process Check is included in the [Datadog Agent][2] package, so you don't need to install anything else on your server.
The Process check is included in the [Datadog Agent][2] package, so you don't need to install anything else on your server.

### Configuration

Unlike many checks, the Process Check doesn't monitor anything useful by default. You must configure which processes you want to monitor, and how.

Unlike many checks, the Process check doesn't monitor anything useful by default. You must configure which processes you want to monitor.

1. While there's no standard default check configuration, here's an example `process.d/conf.yaml` that monitors SSH/SSHD processes. See the [sample process.d/conf.yaml][3] for all available configuration options:
While there's no standard default check configuration, here's an example `process.d/conf.yaml` that monitors SSH/SSHD processes. See the [sample process.d/conf.yaml][3] for all available configuration options:

```yaml
init_config:

instances:

    ## @param name - string - required
    ## Used to uniquely identify your metrics as they are tagged with this name in Datadog.
    #
  - name: ssh

    ## @param search_string - list of strings - optional
    ## If one of the elements in the list matches, it returns the count of
    ## all the processes that match the string exactly by default. Change this behavior with the
    ## parameter `exact_match: false`.
    ##
    ## Note: Exactly one of search_string, pid or pid_file must be specified per instance.
    #
    search_string:
      - ssh
      - sshd
init_config:
instances:
  - name: ssh
    search_string:
      - ssh
      - sshd
```
```
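
The configuration comments mention two variations: turning off exact matching with `exact_match: false`, and identifying a process by `pid_file` instead of `search_string`. A minimal sketch of both follows. The instance name `my_service` and the PID file path are hypothetical placeholders, and the comment on substring matching reflects an assumed reading of `exact_match: false`; check the sample `process.d/conf.yaml` before relying on it.

```yaml
init_config:

instances:
  # Assumed behavior: with exact_match set to false, "ssh" is matched as a
  # substring rather than requiring an exact match.
  - name: ssh
    search_string:
      - ssh
    exact_match: false

  # Hypothetical instance that identifies one process through its PID file.
  # Exactly one of search_string, pid, or pid_file is set per instance.
  - name: my_service
    pid_file: /var/run/my_service.pid
```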
Some process metrics require either running the Datadog collector as the same user as the monitored process or privileged access to be retrieved. Where the former option is not desired, and to avoid running the Datadog collector as `root`, the `try_sudo` option lets the Process Check try using `sudo` to collect this metric. As of now, only the `open_file_descriptors` metric on Unix platforms is taking advantage of this setting. Note: the appropriate sudoers rules have to be configured for this to work:
**Note**: After you make configuration changes, make sure you [restart the Agent][4].
```text
dd-agent ALL=NOPASSWD: /bin/ls /proc/*/fd/
```
Retrieving some process metrics requires the Datadog collector to either run as the monitored process user or with privileged access. For the `open_file_descriptors` metric on Unix platforms, there is an additional configuration option. Setting `try_sudo` to `true` in your `conf.yaml` file allows the Process check to try using `sudo` to collect the `open_file_descriptors` metric. Using this configuration option requires setting the appropriate sudoers rules in `/etc/sudoers`:

2. [Restart the Agent][4].
```shell
dd-agent ALL=NOPASSWD: /bin/ls /proc/*/fd/
```
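
For reference, a minimal sketch of enabling this option, assuming `try_sudo` is set per instance in `process.d/conf.yaml` alongside the SSH example used earlier:

```yaml
init_config:

instances:
  - name: ssh
    search_string:
      - ssh
      - sshd
    # Let the check try `sudo` when collecting open_file_descriptors.
    # This only works if the sudoers rule for dd-agent is in place.
    try_sudo: true
```

As with any configuration change, the Agent needs a restart before the setting takes effect.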

### Validation

Run the [Agent's status subcommand][5] and look for `process` under the Checks section.

### Metrics notes

**Note**: Some metrics are not available on Linux or OSX:
The following metrics are not available on Linux or macOS:
- Process I/O metrics are **not** available on Linux or macOS since the files that the Agent reads (`/proc//io`) are only readable by the process's owner. For more information, [read the Agent FAQ][6].

The following metrics are not available on Windows:
- `system.cpu.iowait`
- `system.processes.mem.page_faults.minor_faults`
- `system.processes.mem.page_faults.children_minor_faults`
- `system.processes.mem.page_faults.major_faults`
- `system.processes.mem.page_faults.children_major_faults`

- Process I/O metrics are **not** available on Linux or OSX since the files that the Agent reads (`/proc//io`) are only readable by the process's owner. For more information, [read the Agent FAQ][6]
- `system.cpu.iowait` is not available on Windows.
**Note**: Use a [WMI check][11] to gather page fault metrics on Windows.

All metrics are per `instance` configured in process.yaml, and are tagged `process_name:<instance_name>`.

@@ -100,3 +92,4 @@ To get a better idea of how (or why) to monitor process resource consumption wit
[8]: https://github.com/DataDog/integrations-core/blob/master/process/assets/service_checks.json
[9]: https://docs.datadoghq.com/help/
[10]: https://www.datadoghq.com/blog/process-check-monitoring
[11]: https://docs.datadoghq.com/integrations/wmi_check/
8 changes: 4 additions & 4 deletions process/metadata.csv
@@ -8,10 +8,10 @@ system.processes.ioread_count,gauge,,read,,The number of disk reads by this proc
system.processes.iowrite_bytes,gauge,,byte,,The number of bytes written to disk by this process. In Windows: the number of bytes written by this process.,0,system,processes io w bytes,
system.processes.iowrite_bytes_count,count,,byte,,The number of bytes written to disk by this process. In Windows: the number of bytes written by this process.,0,system,processes io w bytes,
system.processes.iowrite_count,gauge,,write,,The number of disk writes by this process. In Windows: the number of writes by this process.,0,system,processes io w count,
system.processes.mem.page_faults.minor_faults,gauge,,occurrence,second,The number of minor page faults per second for this process.,0,system,minor page faults,
system.processes.mem.page_faults.children_minor_faults,gauge,,occurrence,second,The number of minor page faults per second for children of this process.,0,system,children minor page faults,
system.processes.mem.page_faults.major_faults,gauge,,occurrence,second,The number of major page faults per second for this process.,0,system,major page faults,
system.processes.mem.page_faults.children_major_faults,gauge,,occurrence,second,The number of major page faults per second for children of this process.,0,system,children major page faults,
system.processes.mem.page_faults.minor_faults,gauge,,occurrence,second,In Unix/Linux and macOS: The number of minor page faults per second for this process.,0,system,minor page faults,
system.processes.mem.page_faults.children_minor_faults,gauge,,occurrence,second,In Unix/Linux and macOS: The number of minor page faults per second for children of this process.,0,system,children minor page faults,
system.processes.mem.page_faults.major_faults,gauge,,occurrence,second,In Unix/Linux and macOS: The number of major page faults per second for this process.,0,system,major page faults,
system.processes.mem.page_faults.children_major_faults,gauge,,occurrence,second,In Unix/Linux and macOS: The number of major page faults per second for children of this process.,0,system,children major page faults,
system.processes.mem.pct,gauge,,percent,,The process memory consumption.,0,system,processes mem pct,
system.processes.mem.real,gauge,,byte,,The non-swapped physical memory a process has used and cannot be shared with another process (Linux only).,0,system,processes mem real,
system.processes.mem.rss,gauge,,byte,,"The non-swapped physical memory a process has used. aka ""Resident Set Size"".",0,system,processes mem rss,