meow-watermelon/host-monitoring-station

Host Monitoring Station

Intro

Host Monitoring Station (HMS) is my home-brew standalone monitoring system. It collects system metrics and displays them as graphs on a web page. HMS can ONLY monitor metrics of the local host; it is NOT a distributed monitoring system. I built it to give myself a simple and easy way to check system performance on the several servers in my home.

HMS is a very lightweight monitoring system and can run with minimal configuration effort.

HMS uses RRDtool for local TSDB storage and the Flask WSGI framework for the front-end web application.

Components

HMS consists of the following components:

  • RRD Databases Bootstrap Utility
  • System Metrics Poller
  • HMS Web Application

The RRD Databases Bootstrap Utility helps users bootstrap the RRD database schema.

The System Metrics Poller collects system metrics and writes the values to the local RRDtool TSDB.

The HMS Web Application is the front-end web application that displays RRD graphs. Users can use any WSGI server to run this web application. I ship a uWSGI configuration file that can be used directly if users would like to use uWSGI as the WSGI server.

Package Structure

All source code is located under the src directory. Please DO NOT change any filename or subdirectory name.

├── hms
│   ├── arp.py
│   ├── cpu.py
│   ├── disk.py
│   ├── graph.py
│   ├── __init__.py
│   ├── memory.py
│   ├── network.py
│   ├── os.py
│   ├── tcp.py
│   ├── udp.py
│   └── utils.py
├── hms_bootstrap_rrd.py
├── hms_metrics_poller.py
├── hms_web.py
├── hms_web_uwsgi.ini
├── static
│   ├── config
│   │   └── hms.yaml
│   └── rrd_graph
│       └── placeholder
└── templates
    └── hms.html

The hms directory is the core module package of HMS. This package includes all the functions and classes needed to collect metrics and generate RRD graphs.

hms_bootstrap_rrd.py is the RRD Databases Bootstrap Utility.

hms_metrics_poller.py is the System Metrics Poller.

hms_web.py is the HMS Web Application.

hms_web_uwsgi.ini is a uWSGI configuration file that can be used to run the HMS web application directly.

The static directory holds the HMS configuration file and the generated RRD graphs.

The templates directory holds the HTML template used to render the HMS web page.

Dependencies

HMS is written in Python 3.

The following Python packages are needed:

flask
importlib.util (part of the Python standard library)
markupsafe
rrdtool
uWSGI + python3 plugin [optional]
yaml (provided by the PyYAML package)

Installation and Configuration

To keep installation and configuration simple, I did not create an installable package. Users can clone the whole repository and configure a few parameters to start running HMS. All commands should be run from the src directory.

Please follow the instructions below to set up and run HMS:

  1. Clone the whole repository into a directory.
$ git clone https://github.com/meow-watermelon/host-monitoring-station.git
  2. Edit the HMS configuration file src/static/config/hms.yaml. In this file, please set the RRD_DB_PATH variable to the directory where the RRD databases should be saved. Please ignore the other variables for now, as they may be used in a future version.
  3. Bootstrap the RRD databases. Please use the hms_bootstrap_rrd.py utility to bootstrap the RRD databases. Usage:
$ ./hms_bootstrap_rrd.py -h
usage: hms_bootstrap_rrd.py [-h] --dir DIR [--step STEP] [--component COMPONENT]

Host Monitoring Station RRD Database Bootstrap Tool

options:
  -h, --help            show this help message and exit
  --dir DIR             RRD database directory
  --step STEP           RRD database step (default: 1m)
  --component COMPONENT
                        Components to be bootstrapped (default: os,cpu,memory,disk,network,tcp,udp,arp)

The default RRD database step is 1 minute. It is the recommended value in HMS. Please do not change this unless you know what you are doing. Collecting and writing metrics every minute is reasonable for a local monitoring system.
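
To make the step and the one-year retention mentioned in the change log concrete, here is a minimal sketch of how the arguments for an rrdtool create call could be assembled. This is not the code in hms_bootstrap_rrd.py: the helper names, the GAUGE data source type, the two-step heartbeat, and the single AVERAGE archive are all assumptions made for illustration.

```python
# Sketch only: the DS type, heartbeat and RRA layout below are assumptions,
# not the schema that hms_bootstrap_rrd.py actually creates.
STEP_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def step_to_seconds(step):
    """Convert a step such as '1m' or '30s' to seconds; bare digits mean seconds."""
    if not step:
        raise ValueError("empty step")
    if step.isdigit():
        return int(step)
    value, unit = step[:-1], step[-1]
    if not value.isdigit() or unit not in STEP_UNITS:
        raise ValueError(f"unsupported step: {step!r}")
    return int(value) * STEP_UNITS[unit]


def build_create_args(rrd_file, step, ds_names, retention_days=365):
    """Build the argument list for `rrdtool create` (hypothetical helper)."""
    step_seconds = step_to_seconds(step)
    heartbeat = step_seconds * 2  # assumption: tolerate one missed update
    rows = retention_days * 86400 // step_seconds  # one row per step
    args = [rrd_file, "--step", str(step_seconds)]
    args += [f"DS:{name}:GAUGE:{heartbeat}:U:U" for name in ds_names]
    args.append(f"RRA:AVERAGE:0.5:1:{rows}")
    return args
```

With the rrdtool Python binding installed, the resulting list could be passed on as rrdtool.create(*args). At the default 1 minute step, one year of data is 525600 rows per archive.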

  4. Set up the system metrics poller. Each run of the poller collects the metrics once and writes the values to the RRD databases. Usage:
$ ./hms_metrics_poller.py -h
usage: hms_metrics_poller.py [-h] --config CONFIG

Host Monitoring Station Metrics Poller

options:
  -h, --help       show this help message and exit
  --config CONFIG  Host Monitoring Station config file

The interval between polls MUST match the step defined when bootstrapping the RRD databases. For example, if the step of the RRD databases is 1 minute, then the metrics poller must be triggered every minute. Here is an example of how I run the poller in a bash terminal:

while true; do ./hms_metrics_poller.py --config static/config/hms.yaml; sleep 60; done
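
With a fixed sleep 60, each cycle starts 60 seconds plus the poller's own run time after the previous one, so the interval drifts slightly away from the RRD step. As an alternative sketch (not part of HMS; the function names are hypothetical), a small Python wrapper can sleep until the next multiple of the step before each run:

```python
import subprocess
import time


def seconds_until_next_boundary(now, step=60):
    """Return how long to sleep so the next run starts on a multiple of `step`."""
    return step - (now % step)


def run_forever(cmd, step=60):
    """Run `cmd` once per step, aligned to step boundaries (loops until killed)."""
    while True:
        time.sleep(seconds_until_next_boundary(time.time(), step))
        # check=False: a failed poll should not stop the loop (assumption)
        subprocess.run(cmd, check=False)


# Example call, to be run from the src directory:
# run_forever(["./hms_metrics_poller.py", "--config", "static/config/hms.yaml"])
```

A cron entry or systemd timer that fires every minute would serve the same purpose without a long-running wrapper.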
  5. Set up an RRD graph retention policy. RRD graphs are generated in real time and are used only once, so it does not make sense to keep them after they have been displayed in the HMS web application. Users can simply use cron to delete graph files based on their modification time. Here is an example of the crontab entry I use on my laptop:
* * * * * find /home/ericlee/Projects/git/host-monitoring-station/src/static/rrd_graph -type f -name '*.png' -mmin +1 -exec rm -rf '{}' \;
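
For hosts where cron is not convenient, roughly the same cleanup can be sketched in Python. This is an illustration rather than part of HMS: the function name and the 60 second threshold are assumptions, and unlike find it only looks at the top level of the graph directory.

```python
import time
from pathlib import Path


def remove_stale_graphs(graph_dir, max_age_seconds=60, now=None):
    """Delete *.png files older than max_age_seconds; return the removed names."""
    now = time.time() if now is None else now
    removed = []
    for png in Path(graph_dir).glob("*.png"):
        # Compare the file modification time against the age threshold
        if png.is_file() and now - png.stat().st_mtime > max_age_seconds:
            png.unlink()
            removed.append(png.name)
    return sorted(removed)


# Example call, to be run from the src directory:
# remove_stale_graphs("static/rrd_graph")
```

Only files ending in .png are touched, so the placeholder file in static/rrd_graph is left alone.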
  6. Once the metrics poller is running, the RRD databases will contain system metrics that can be displayed in the HMS web application. All RRD graphs are in PNG format. The default HTTP service port of the HMS web application is 4080 and the web server stats port is 4081. Users can adjust those parameters in the hms_web_uwsgi.ini file. To start the HMS web application, please run the following command under the src directory:
$ uwsgi hms_web_uwsgi.ini

Once the HMS web application has started, users can access the metrics graphs at http://127.0.0.1:4080/hms. By default, each graph is 900 x 300 pixels and displays the last 8 hours of metrics. Users can query historical data and change the graph size with URL query parameters, which are described in the following section.

HMS Web Application Query Parameters

The HMS web application supports 3 query parameters:

size: RRD graph size. The default is medium, which is 900 x 300 pixels. The other options are small (600 x 200 pixels) and large (1200 x 400 pixels).

start: RRD query start timestamp. The default is end-8h, which is 8 hours before the end time.

end: RRD query end timestamp. The default is now, which is the current time.

For more information about start and end keywords please read the rrdgraph manual.

If the start and/or end values supplied by the user do not form a valid time span, HMS falls back to the default values for the start and end parameters.
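
As an illustration of how these parameters fit together, the sketch below maps size names to pixel dimensions and builds a query URL. The helper names are hypothetical, and falling back to medium for an unknown size is an assumption; the README only documents the fallback for start and end.

```python
from urllib.parse import urlencode

# Sizes as documented above: name -> (width, height) in pixels
GRAPH_SIZES = {"small": (600, 200), "medium": (900, 300), "large": (1200, 400)}


def resolve_size(name):
    """Return (width, height) for a size name; unknown names fall back to medium."""
    return GRAPH_SIZES.get(name, GRAPH_SIZES["medium"])


def build_hms_url(base="http://127.0.0.1:4080/hms", **params):
    """Build an HMS URL, skipping parameters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}?{query}" if query else base


# Large graphs covering the 24 hours before now:
print(build_hms_url(size="large", start="end-24h", end="now"))
```

The printed URL is http://127.0.0.1:4080/hms?size=large&start=end-24h&end=now.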

Metrics List

Category | Metric Name | Unit | Description
OS | loadavg_1min | n/a | 1 min load average
OS | loadavg_5min | n/a | 5 min load average
OS | loadavg_15min | n/a | 15 min load average
OS | num_used_fd | count | number of occupied file descriptors
OS | num_total_procs | count | number of total processes
OS | num_running_procs | count | number of running processes
OS | num_blocked_procs | count | number of blocked processes (e.g. I/O blocked)
OS | num_zombie_procs | count | number of zombie processes
OS | context_switch | count/second | number of context switches per second
CPU | cpu_freq | kHz | CPU current running frequency
Memory | memory_total | kB | total memory
Memory | memory_free | kB | free memory
Memory | memory_avail | kB | available memory
Memory | buffer | kB | buffer
Memory | cache | kB | cache
Memory | swap_total | kB | total swap space
Memory | swap_free | kB | free swap space
Disk | read_io | count/second | number of read I/Os per second
Disk | write_io | count/second | number of write I/Os per second
Disk | read_merge | count/second | number of read I/Os merged per second
Disk | write_merge | count/second | number of write I/Os merged per second
Disk | read_sector | sector/second | number of sectors read per second
Disk | write_sector | sector/second | number of sectors written per second
Disk | in_flight | count/second | number of I/Os in flight per second
Network | rx_bytes | byte/second | number of good received bytes per second
Network | tx_bytes | byte/second | number of good transmitted bytes per second
Network | rx_dropped | packet/second | number of packets received but dropped per second
Network | tx_dropped | packet/second | number of packets dropped in transmission per second
Network | rx_errors | packet/second | number of bad packets received per second
Network | tx_errors | packet/second | number of bad packets transmitted per second
Network | collisions | count/second | number of collisions per second
IPv4/IPv6 TCP | ESTABLISHED | count | number of ESTABLISHED state sockets
IPv4/IPv6 TCP | SYN_SENT | count | number of SYN_SENT state sockets
IPv4/IPv6 TCP | SYN_RECV | count | number of SYN_RECV state sockets
IPv4/IPv6 TCP | FIN_WAIT1 | count | number of FIN_WAIT1 state sockets
IPv4/IPv6 TCP | FIN_WAIT2 | count | number of FIN_WAIT2 state sockets
IPv4/IPv6 TCP | TIME_WAIT | count | number of TIME_WAIT state sockets
IPv4/IPv6 TCP | CLOSE | count | number of CLOSE state sockets
IPv4/IPv6 TCP | CLOSE_WAIT | count | number of CLOSE_WAIT state sockets
IPv4/IPv6 TCP | LAST_ACK | count | number of LAST_ACK state sockets
IPv4/IPv6 TCP | LISTEN | count | number of LISTEN state sockets
IPv4/IPv6 TCP | CLOSING | count | number of CLOSING state sockets
IPv4/IPv6 TCP | NEW_SYN_RECV | count | number of NEW_SYN_RECV state sockets
UDP | InDatagrams | datagram/second | number of UDP datagrams delivered per second
UDP | OutDatagrams | datagram/second | number of UDP datagrams sent per second
UDP | InErrors | datagram/second | number of received UDP datagrams that could not be delivered per second
UDP | NoPorts | datagram/second | number of received UDP datagrams for which there was no application at the destination port per second
ARP | arp_cache_entries | count | number of ARP cache entries
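
As one example of how such counts can be derived on Linux, the sketch below tallies socket states from text in the /proc/net/tcp format, where the fourth column holds the state as a hexadecimal code. It assumes HMS reads similar kernel sources; the function name is hypothetical and this is not the code in hms/tcp.py.

```python
from collections import Counter

# Linux kernel TCP state codes as they appear in the "st" column
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING", "0C": "NEW_SYN_RECV",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from /proc/net/tcp-style text."""
    counts = Counter({name: 0 for name in TCP_STATES.values()})
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) > 3 and fields[3].upper() in TCP_STATES:
            counts[TCP_STATES[fields[3].upper()]] += 1
    return dict(counts)


# Made-up sample in the /proc/net/tcp layout: one LISTEN, one ESTABLISHED
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0FF0 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345
   1: 0100007F:0FF0 0100007F:C350 01 00000000:00000000 00:00000000 00000000  1000        0 12346
"""
```

To cover both IPv4 and IPv6, the same function could be applied to the contents of /proc/net/tcp and /proc/net/tcp6 and the two results added together.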

Screenshots

I saved some example screenshots in the screenshots directory for reference.

Known Issues and Thoughts

  • The UI is ugly! I know that, and I'm really not a UI/UX expert.
  • There are no logs so far for any of the applications. I will add a logging facility in the next version.
  • I may add more metrics in a future version, but the current metrics are sufficient for my own use. If you have any suggestions on metrics, please open an issue.
  • Better exception handling is needed. The current version swallows some exceptions to keep the application running smoothly. I may write some custom exception classes in a future version to make debugging easier.
  • If a new disk device or network interface is added to the host, the graphs won't display metrics for the newly added device because the current version does not support dynamic adjustment of data sources. This feature will be added soon.

Change Log

0.0.1
* initial commit

0.0.2 - 08/07/2022
* [issue#2] - add CPU frequency metric + graph feature

0.0.3 - 08/11/2022
* [issue#3] - use one RRA to store 1 year data metrics

0.0.4 - 08/13/2022
* [issue#1] - fix start / end time span range invalid issue

0.0.5 - 08/31/2022
* [issue#6] - add TCP metrics

0.0.6 - 09/04/2022
* [issue#4] - allow hms_bootstrap_rrd.py to bootstrap one or more data sources

0.0.7 - 09/05/2022
* [issue#8] - fix inaccurate num_total_procs metric issue

0.0.8 - 09/05/2022
* [issue#9] - catch exceptions when updating an RRD database fails

0.0.9 - 09/08/2022
* [issue#7] - add UDP metrics

0.0.10 - 12/22/2022
* [issue#10] - fix start and end parameters exception

0.0.11 - 01/28/2024
* [issue#13] - add ARP cache entries metric
