[Tooling] Add pprof endpoints and documentation (#484)
Co-authored-by: Daniel Olshansky <[email protected]>
okdas and Olshansk committed Apr 25, 2024
1 parent 7284a8d commit 977c1f2
Showing 19 changed files with 333 additions and 4 deletions.
17 changes: 16 additions & 1 deletion Tiltfile
Original file line number Diff line number Diff line change
@@ -254,6 +254,9 @@ for x in range(localnet_config["relayminers"]["count"]):
# Run `curl localhost:PORT` to see the current snapshot of relayminer metrics.
str(9069 + actor_number)
+ ":9090", # Relayminer metrics port. relayminer1 - exposes 9070, relayminer2 exposes 9071, etc.
# Use with pprof like this: `go tool pprof -http=:3333 http://localhost:6070/debug/pprof/goroutine`
str(6069 + actor_number)
+ ":6060", # Relayminer pprof port. relayminer1 - exposes 6070, relayminer2 exposes 6071, etc.
],
)

@@ -295,6 +298,9 @@ for x in range(localnet_config["appgateservers"]["count"]):
# Run `curl localhost:PORT` to see the current snapshot of appgateserver metrics.
str(9079 + actor_number)
+ ":9090", # appgateserver metrics port. appgateserver1 - exposes 9080, appgateserver2 exposes 9081, etc.
# Use with pprof like this: `go tool pprof -http=:3333 http://localhost:6080/debug/pprof/goroutine`
str(6079 + actor_number)
+ ":6060", # appgateserver pprof port. appgateserver1 - exposes 6080, appgateserver2 exposes 6081, etc.
],
)

@@ -336,13 +342,22 @@ for x in range(localnet_config["gateways"]["count"]):
# Run `curl localhost:PORT` to see the current snapshot of gateway metrics.
str(9089 + actor_number)
+ ":9090", # gateway metrics port. gateway1 - exposes 9090, gateway2 exposes 9091, etc.
# Use with pprof like this: `go tool pprof -http=:3333 http://localhost:6090/debug/pprof/goroutine`
str(6089 + actor_number)
+ ":6060", # gateway pprof port. gateway1 - exposes 6090, gateway2 exposes 6091, etc.
],
)

k8s_resource(
"validator",
labels=["pocket_network"],
-    port_forwards=["36657", "36658", "40004"],
+    port_forwards=[
+        "36657",
+        "36658",
+        "40004",
+        # Use with pprof like this: `go tool pprof -http=:3333 http://localhost:6061/debug/pprof/goroutine`
+        "6061:6060",
+    ],
links=[
link(
"http://localhost:3003/d/cosmoscometbft/protocol-cometbft-dashboard?orgId=1&from=now-1h&to=now",
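The port-forward comments in the Tiltfile hunks above follow one rule: the local port is a per-actor base plus `actor_number`. The Go sketch below only restates that arithmetic; `basePorts` and `localPprofPort` are hypothetical names introduced for illustration and are not part of the commit.

```go
package main

import "fmt"

// basePorts mirrors the offsets in the Tiltfile hunks above (hypothetical helper):
// the forwarded local port is base + actor_number.
var basePorts = map[string]int{
	"relayminer":    6069,
	"appgateserver": 6079,
	"gateway":       6089,
}

// localPprofPort returns the local port forwarded to an actor's pprof listener.
// actorNumber starts at 1, as in the Tiltfile comments.
func localPprofPort(actor string, actorNumber int) (int, error) {
	base, ok := basePorts[actor]
	if !ok {
		return 0, fmt.Errorf("unknown actor %q", actor)
	}
	if actorNumber < 1 {
		return 0, fmt.Errorf("actor number must be >= 1, got %d", actorNumber)
	}
	return base + actorNumber, nil
}

func main() {
	port, err := localPprofPort("relayminer", 2)
	if err != nil {
		panic(err)
	}
	// relayminer2 is exposed on 6071 according to the Tiltfile comment.
	fmt.Printf("http://localhost:%d/debug/pprof/goroutine\n", port)
}
```

The validator does not follow this rule; it uses the fixed mapping `6061:6060`.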
155 changes: 155 additions & 0 deletions docusaurus/docs/develop/developer_guide/performance_troubleshooting.md
@@ -0,0 +1,155 @@
---
sidebar_position: 4
title: Performance troubleshooting
---

# Performance troubleshooting <!-- omit in toc -->

- [What is pprof](#what-is-pprof)
- [`pprof` and Dependencies - Installation](#pprof-and-dependencies---installation)
- [How to Use `pprof`](#how-to-use-pprof)
- [Available `pprof` Endpoints](#available-pprof-endpoints)
- [Configure Software to Expose `pprof` Endpoints](#configure-software-to-expose-pprof-endpoints)
- [Full Nodes and Validator Configuration](#full-nodes-and-validator-configuration)
- [AppGate Server and RelayMiner](#appgate-server-and-relayminer)
- [Save the Profiling Data](#save-the-profiling-data)
- [Explore the Profiling Data](#explore-the-profiling-data)
- [Explore without saving data](#explore-without-saving-data)
- [Report Issues](#report-issues)

If you believe you've encountered an issue related to memory, goroutine leaks,
or some sort of synchronization blocking scenario, `pprof` is a good tool to
help identify & investigate the problem.

It is open-source and maintained by Google: [google/pprof](https://github.com/google/pprof)

## What is pprof

`pprof` is a tool for collecting and visualizing profiling data. In modern Go versions,
it ships with the Go toolchain (`go tool pprof`), but it can also be installed as a
standalone binary from [github.com/google/pprof](https://github.com/google/pprof):

```bash
go install github.com/google/pprof@latest
```

More information can be found in the [pprof README](https://github.com/google/pprof/blob/main/doc/README.md).

## `pprof` and Dependencies - Installation

1. [Required] `pprof` - Go toolchain or standalone `pprof` binary:

   1. The `pprof` that ships with Go is available via `go tool pprof`
   2. A standalone binary can be installed with:

      ```bash
      go install github.com/google/pprof@latest
      ```

2. [Optional] `graphviz` - Recommended for visualization. It can be skipped if you're not planning to use visualizations.

   - [Installation guide](https://graphviz.readthedocs.io/en/stable/#installation)
   - On macOS, it can be installed with:

     ```bash
     brew install graphviz
     ```

## How to Use `pprof`

`pprof` operates by connecting to an exposed endpoint in the software you want to profile.
It can create snapshots for later examination, or can show information in a browser
for an already running process.

We're going to use `go tool pprof` in the examples below, but if you installed a
standalone binary, just replace `go tool pprof` with `pprof`.

### Available `pprof` Endpoints

Before running `pprof`, you need to decide what kind of profiling you need to do.

The `pprof` package provides several endpoints that are useful for profiling and
debugging. Here are the most commonly used ones:

- `/debug/pprof/heap`: Snapshot of the memory allocation of the heap.
- `/debug/pprof/allocs`: Similar to `/debug/pprof/heap`, but includes all past memory allocations, not just the ones currently in the heap.
- `/debug/pprof/goroutine`: Stack traces of all current goroutines.
- `/debug/pprof/threadcreate`: Records stack traces that led to the creation of new OS threads.
- `/debug/pprof/block`: Displays stack traces that led to blocking on synchronization primitives.
- `/debug/pprof/profile`: Collects 30 seconds of CPU profiling data - configurable via the `seconds` parameter.
- `/debug/pprof/symbol`: Looks up the program counters provided in the request, returning function names.
- `/debug/pprof/trace`: Provides a trace of the program execution.
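Several of the endpoints above correspond to named profiles in Go's `runtime/pprof` package, while `profile`, `symbol`, and `trace` are served by dedicated handlers instead. As a minimal sketch (the helper name `profileNames` is made up for this example), the program below lists the named profiles registered in its own process. It also sets a block profile rate, because in Go the block profile stays empty until `runtime.SetBlockProfileRate` is called with a non-zero value; whether the binaries in this repository do so is not shown in this commit.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sort"
)

// profileNames returns the names of the profiles registered in this process.
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	sort.Strings(names)
	return names
}

func main() {
	// Record every blocking event; without a non-zero rate the block profile is empty.
	runtime.SetBlockProfileRate(1)

	for _, name := range profileNames() {
		fmt.Printf("/debug/pprof/%s\n", name)
	}
}
```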

### Configure Software to Expose `pprof` Endpoints

:::warning Exposing pprof

Never expose `pprof` to the internet. These endpoints allow a degree of
operational control over the software, so a malicious actor could disrupt
or DoS your services if they are reachable from the internet.

:::
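One way to guard against accidental exposure is to check the configured listen address before starting the endpoint. The sketch below is an illustration only; `isLoopbackAddr` is a hypothetical helper and nothing in this commit performs such a check. It treats an empty host (for example `:6060`) as "all interfaces" and therefore not loopback-only.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackAddr reports whether a host:port listen address is bound to the
// loopback interface only. An empty host (":6060") means all interfaces.
func isLoopbackAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, addr := range []string{"localhost:6060", "127.0.0.1:6060", ":6060", "0.0.0.0:6060"} {
		fmt.Printf("%-16s loopback only: %v\n", addr, isLoopbackAddr(addr))
	}
}
```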

#### Full Nodes and Validator Configuration

In `config.toml`, you can configure `pprof_laddr` to expose a `pprof` endpoint
on a particular network interface and port. By default, `pprof` listens on `localhost:6060`.

If you change this value, restart the process for the change to take effect.

#### AppGate Server and RelayMiner

Both `AppGate Server` and `RelayMiner` can be configured to expose a `pprof`
endpoint using a configuration file like this:

```yaml
pprof:
  enabled: true
  addr: localhost:6060
```

If you change any of these values, restart the process for the change to take effect.

### Save the Profiling Data

You can save profiling data to a file by running:

```bash
curl -o <NAME_OF_THE_FILE_TO_CREATE> http://<YOUR_PPROF_LADDR>/<PPROF_ENDPOINT>
```

For example, a command to save a heap profile looks like this:

```bash
curl -o heap_profile.pprof http://localhost:6061/debug/pprof/heap
```

That file can be shared with other people.

### Explore the Profiling Data

Now, you can use the file to get insights into the profiling data, including visualizations.
A command like this will start an HTTP server and open a browser:

```bash
go tool pprof -http=:PORT <path_to_profile_file>
```

For example, to open a `heap_profile.pprof` from the example above, you can run:

```bash
go tool pprof -http=:3333 heap_profile.pprof
```

### Explore without saving data

It is also possible to visualize `pprof` data without saving it to a file. For example:

```bash
go tool pprof -http=:3333 http://localhost:6061/debug/pprof/goroutine
```
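The URL passed to `go tool pprof` is just the listen address plus an endpoint path, with an optional `seconds` parameter for the CPU profile. A small sketch of assembling it, where `profileURL` is a hypothetical helper and not part of the repository:

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// profileURL builds the URL for a pprof endpoint on addr (host:port).
// A positive seconds value is sent as the `seconds` query parameter.
func profileURL(addr, profile string, seconds int) string {
	u := url.URL{
		Scheme: "http",
		Host:   addr,
		Path:   "/debug/pprof/" + profile,
	}
	if seconds > 0 {
		q := url.Values{}
		q.Set("seconds", strconv.Itoa(seconds))
		u.RawQuery = q.Encode()
	}
	return u.String()
}

func main() {
	fmt.Println(profileURL("localhost:6061", "goroutine", 0))
	// A 10-second CPU profile instead of the default 30 seconds.
	fmt.Println(profileURL("localhost:6061", "profile", 10))
}
```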

### Report Issues

If you believe you've found a performance problem, please [open a GitHub Issue](https://github.com/pokt-network/poktroll/issues). Make sure to attach the profiling data.
18 changes: 18 additions & 0 deletions docusaurus/docs/operate/configs/appgate_server_config.md
@@ -22,6 +22,7 @@ It is responsible for multiple things:
- [`signing_key`](#signing_key)
- [`listening_endpoint`](#listening_endpoint)
- [`metrics`](#metrics)
- [`pprof`](#pprof)

## Usage

@@ -135,3 +136,20 @@ metrics:
When `enabled` is set to `true`, the exporter is active. The `addr` value of
`:9090` implies the exporter is bound to port 9090 on all available network
interfaces.

### `pprof`

_`Optional`_

Configures a [pprof](https://github.com/google/pprof/blob/main/doc/README.md)
endpoint for troubleshooting and debugging performance issues.

Example configuration:

```yaml
pprof:
  enabled: true
  addr: localhost:6060
```

You can learn how to use that endpoint on the [Performance Troubleshooting](../../develop/developer_guide/performance_troubleshooting.md) page.
18 changes: 18 additions & 0 deletions docusaurus/docs/operate/configs/relayminer_config.md
@@ -18,6 +18,7 @@ and which domains to accept queries from._
- [`signing_key_name`](#signing_key_name)
- [`smt_store_path`](#smt_store_path)
- [`metrics`](#metrics)
- [`pprof`](#pprof)
- [Pocket node connectivity](#pocket-node-connectivity)
- [`query_node_rpc_url`](#query_node_rpc_url)
- [`query_node_grpc_url`](#query_node_grpc_url)
@@ -144,6 +145,23 @@ When `enabled` is set to `true`, the exporter is active. The `addr` value of
`:9090` implies the exporter is bound to port 9090 on all available network
interfaces.

### `pprof`

_`Optional`_

Configures a [pprof](https://github.com/google/pprof/blob/main/doc/README.md)
endpoint for troubleshooting and debugging performance issues.

Example configuration:

```yaml
pprof:
  enabled: true
  addr: localhost:6060
```

You can learn how to use that endpoint on the [Performance Troubleshooting](../../develop/developer_guide/performance_troubleshooting.md) page.

## Pocket node connectivity

```yaml
3 changes: 3 additions & 0 deletions localnet/kubernetes/values-appgateserver.yaml
@@ -4,3 +4,6 @@ config:
  metrics:
    enabled: true
    addr: :9090
  pprof:
    enabled: true
    addr: localhost:6060
3 changes: 3 additions & 0 deletions localnet/kubernetes/values-gateway.yaml
@@ -5,3 +5,6 @@ config:
  metrics:
    enabled: true
    addr: :9090
  pprof:
    enabled: true
    addr: localhost:6060
3 changes: 3 additions & 0 deletions localnet/kubernetes/values-relayminer-common.yaml
@@ -8,3 +8,6 @@ config:
query_node_grpc_url: tcp://validator-poktroll-validator:36658
tx_node_rpc_url: tcp://validator-poktroll-validator:36657
suppliers: []
pprof:
enabled: true
addr: localhost:6060
3 changes: 3 additions & 0 deletions localnet/poktrolld/config/appgate_server_config.yaml
@@ -6,3 +6,6 @@ listening_endpoint: http://localhost:42069
metrics:
  enabled: true
  addr: :9090
pprof:
  enabled: true
  addr: localhost:6060
3 changes: 3 additions & 0 deletions localnet/poktrolld/config/appgate_server_config_example.yaml
@@ -15,3 +15,6 @@ metrics:
  enabled: true
  # The address that the metrics exporter will listen on. Can be just a port, or host:port
  addr: :9090
pprof:
  enabled: true
  addr: localhost:6060
Original file line number Diff line number Diff line change
@@ -6,3 +6,6 @@ listening_endpoint: http://0.0.0.0:42069
metrics:
  enabled: true
  addr: :9090
pprof:
  enabled: true
  addr: localhost:6060
3 changes: 3 additions & 0 deletions localnet/poktrolld/config/relayminer_config.yaml
@@ -14,3 +14,6 @@ suppliers:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminers
pprof:
  enabled: false
  addr: localhost:6060
6 changes: 6 additions & 0 deletions localnet/poktrolld/config/relayminer_config_full_example.yaml
@@ -11,6 +11,12 @@ metrics:
  # The address (host:port or just port) for the metrics exporter to listen on.
  addr: :9090

# Pprof endpoint configuration. More information:
# https://pkg.go.dev/github.com/google/pprof#section-readme
pprof:
  enabled: false
  addr: localhost:6060

pocket_node:
  # Pocket node URL exposing the CometBFT JSON-RPC API.
  # Used by the Cosmos client SDK, event subscriptions, etc.
7 changes: 7 additions & 0 deletions pkg/appgateserver/cmd/cmd.go
@@ -144,6 +144,13 @@ func runAppGateServer(cmd *cobra.Command, _ []string) error {
}
}

	if appGateConfigs.Pprof.Enabled {
		err = appGateServer.ServePprof(appGateConfigs.Pprof.Addr)
		if err != nil {
			return fmt.Errorf("failed to start pprof endpoint: %w", err)
		}
	}

	// Start the AppGate server.
	if err := appGateServer.Start(ctx); err != nil && !errors.Is(err, http.ErrServerClosed) {
		return fmt.Errorf("failed to start app gate server: %w", err)
23 changes: 22 additions & 1 deletion pkg/appgateserver/config/appgate_configs_reader.go
@@ -15,6 +15,7 @@ type YAMLAppGateServerConfig struct {
QueryNodeRPCUrl string `yaml:"query_node_rpc_url"`
SelfSigning bool `yaml:"self_signing"`
SigningKey string `yaml:"signing_key"`
Pprof YAMLAppGateServerPprofConfig `yaml:"pprof"`
}

// YAMLAppGateServerMetricsConfig is the structure used to unmarshal the metrics
@@ -24,6 +25,13 @@ type YAMLAppGateServerMetricsConfig struct {
Addr string `yaml:"addr"`
}

// YAMLAppGateServerPprofConfig is the structure used to unmarshal the config
// for `pprof`.
type YAMLAppGateServerPprofConfig struct {
	Enabled bool   `yaml:"enabled,omitempty"`
	Addr    string `yaml:"addr,omitempty"`
}

// AppGateServerConfig is the structure describing the AppGateServer config
type AppGateServerConfig struct {
ListeningEndpoint *url.URL
@@ -32,15 +40,23 @@ type AppGateServerConfig struct {
QueryNodeRPCUrl *url.URL
SelfSigning bool
SigningKey string
Pprof *AppGateServerPprofConfig
}

// AppGateServerMetricsConfig is the structure resulting from parsing the metrics
- // section of the AppGateServer config file
+ // section of the AppGateServer config file.
type AppGateServerMetricsConfig struct {
	Enabled bool
	Addr    string
}

// AppGateServerPprofConfig is the structure resulting from parsing the pprof
// section of the AppGateServer config file.
type AppGateServerPprofConfig struct {
	Enabled bool
	Addr    string
}

// ParseAppGateServerConfigs parses the stake config file into a AppGateConfig
// NOTE: If SelfSigning is not defined in the config file, it will default to false
func ParseAppGateServerConfigs(configContent []byte) (*AppGateServerConfig, error) {
@@ -102,5 +118,10 @@ func ParseAppGateServerConfigs(configContent []byte) (*AppGateServerConfig, erro
		Addr:    yamlAppGateServerConfig.Metrics.Addr,
	}

	appGateServerConfig.Pprof = &AppGateServerPprofConfig{
		Enabled: yamlAppGateServerConfig.Pprof.Enabled,
		Addr:    yamlAppGateServerConfig.Pprof.Addr,
	}

	return appGateServerConfig, nil
}