
OpenShift 3.11 Prometheus + Grafana Router Monitoring

Pre-req

Variables

export ROUTER_USERNAME=admin
export ROUTER_PASSWORD=admin
export ROUTER_MONITORING_PROJECT=router-monitoring

Router Configuration

Deployment Configuration

You need to make sure your router environment variables are set as follows:

  • STATS_PORT = 1936

  • ROUTER_METRICS_TYPE = haproxy

You can check that by running:

oc set env dc/router --list -n default | grep STATS_PORT
oc set env dc/router --list -n default | grep ROUTER_METRICS_TYPE
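
If either variable is missing or has a different value, you can set both at once (the default 3.11 router normally ships with these values already):

oc set env dc/router STATS_PORT=1936 ROUTER_METRICS_TYPE=haproxy -n default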

Now choose a username and password and set them on your router by running:

oc set env dc/router \
  STATS_USERNAME=$ROUTER_USERNAME \
  STATS_PASSWORD=$ROUTER_PASSWORD \
  -n default

Wait for the new version of your router to be deployed. You can check whether it is done by running:

watch oc get pods -n default
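
Alternatively, you can let oc block until the rollout finishes:

oc rollout status dc/router -n default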

Let’s see if everything is right:

oc set env dc/router --list -n default | grep STATS_USERNAME
# output: STATS_USERNAME=admin

oc set env dc/router --list -n default | grep STATS_PASSWORD
# output: STATS_PASSWORD=admin

Testing Router metrics

To test whether the metrics endpoint is working, run the following commands:

router_pod=$(oc get pods -l deploymentconfig=router --no-headers -n default | head -1 | awk '{print $1}')
oc -n default exec $router_pod -- curl --silent -u $ROUTER_USERNAME:$ROUTER_PASSWORD localhost:1936/metrics

You should see something similar to this:

# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="21600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="43200"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="86400"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="172800"} 0
...
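
The series we actually want to scrape are the haproxy_* ones; to confirm they are being exported, you can filter for them, for example:

oc -n default exec $router_pod -- curl --silent -u $ROUTER_USERNAME:$ROUTER_PASSWORD localhost:1936/metrics | grep '^haproxy_' | head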

Prometheus

You can’t customize the built-in OpenShift monitoring stack (Prometheus and Grafana) due to supportability constraints, so you need to install a separate Prometheus instance on your OpenShift cluster.

Install CRDs

For that, you can run:

# Install CRDs if necessary
oc create -f crds.yaml

Output:

Error from server (AlreadyExists): error when creating "crds.yaml": customresourcedefinitions.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" already exists
Error from server (AlreadyExists): error when creating "crds.yaml": customresourcedefinitions.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" already exists
Error from server (AlreadyExists): error when creating "crds.yaml": customresourcedefinitions.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" already exists
Error from server (AlreadyExists): error when creating "crds.yaml": customresourcedefinitions.apiextensions.k8s.io "alertmanagers.monitoring.coreos.com" already exists
Note
You can ignore these errors. They only happen when all the necessary CRDs already exist in your cluster.
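
To confirm the monitoring CRDs are in place before moving on, you can list them:

oc get crd | grep monitoring.coreos.com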

Create project for monitoring

# Create project for router monitoring
oc new-project $ROUTER_MONITORING_PROJECT
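
If the project already exists, just switch to it instead:

oc project $ROUTER_MONITORING_PROJECT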

Install Prometheus Operator

# Install the operator in $ROUTER_MONITORING_PROJECT
oc process -f prometheus-operator-template.yaml -p NAMESPACE=$ROUTER_MONITORING_PROJECT | oc create -f -

Output:

rolebinding.rbac.authorization.k8s.io/prometheus-operator created
role.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
serviceaccount/prometheus-operator created
prometheus.monitoring.coreos.com/prometheus created
service/prometheus created
serviceaccount/prometheus created
role.rbac.authorization.k8s.io/prometheus created
rolebinding.rbac.authorization.k8s.io/prometheus created
route.route.openshift.io/prometheus created

Make sure your Operator and Prometheus pods are running before moving on.

oc get pods -n $ROUTER_MONITORING_PROJECT

# Output
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-7c75c8fb6b-k752m   1/1     Running   0          39s
prometheus-prometheus-0                3/3     Running   1          36s
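
If you prefer a blocking check, you can wait for the operator deployment (named in the output above) to finish rolling out:

oc rollout status deployment/prometheus-operator -n $ROUTER_MONITORING_PROJECT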

Fix Permission

Now let’s give the prometheus service account in $ROUTER_MONITORING_PROJECT read access to the default project.

oc adm policy add-role-to-user view system:serviceaccount:$ROUTER_MONITORING_PROJECT:prometheus -n default
Note
It’s important to check the prometheus-operator and prometheus pod logs for any permission issues. You can do that by running oc logs -f <pod> -c <container>
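
You can also double-check the binding by impersonating the service account; a check along these lines should answer yes:

oc auth can-i list endpoints -n default --as=system:serviceaccount:$ROUTER_MONITORING_PROJECT:prometheus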

Create service monitor

# Create service monitor
oc process -f router-service-monitor.yaml \
  -p NAMESPACE=$ROUTER_MONITORING_PROJECT \
  -p ROUTER_USERNAME=$ROUTER_USERNAME \
  -p ROUTER_PASSWORD=$ROUTER_PASSWORD \
  | oc apply -f -
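
You can confirm the ServiceMonitor object was created, which is what tells the operator to generate the scrape configuration for the router:

oc get servicemonitor -n $ROUTER_MONITORING_PROJECT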

Prometheus target

Check that your router shows up on the Prometheus targets page.

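The prometheus route was created by the operator template; you can grab its hostname (the targets page is at /targets) with:

oc get route prometheus -n $ROUTER_MONITORING_PROJECT -o jsonpath='{.spec.host}'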

Grafana

To install Grafana, run:

./install-grafana.sh $ROUTER_MONITORING_PROJECT

Find the Grafana URL by running:

oc get route -n $ROUTER_MONITORING_PROJECT
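
If you only want the hostname, something along these lines works, assuming the install script names the route grafana:

oc get route grafana -n $ROUTER_MONITORING_PROJECT -o jsonpath='{.spec.host}'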

Grafana credentials:

  • User: admin

  • Pass: admin
