build and action: add timescaledb #36

Merged on Sep 24, 2024 (42 commits)
- f58e454 build and config: add timescaledb (gsanchietti, Aug 21, 2024)
- c56eb8c systemd: fix webssh dependency (gsanchietti, Aug 21, 2024)
- 9ccd05a grafana: add new monitoring dashboards (gsanchietti, Sep 4, 2024)
- 982c5a0 build: bump grafana version (gsanchietti, Sep 11, 2024)
- e9f9479 grafana: provision timescale datasource (gsanchietti, Sep 11, 2024)
- 75733c0 configure: add maxmind_license (gsanchietti, Sep 11, 2024)
- bd11415 README: add report info (gsanchietti, Sep 11, 2024)
- c1161e1 add vpn dashboard (gsanchietti, Sep 11, 2024)
- 3c68ab7 fix: expanding tailscale workers to 100 (Tbaile, Sep 12, 2024)
- 0977b90 feat: adding backup of timescale (Tbaile, Sep 12, 2024)
- a947942 fix: solving backup issues (Tbaile, Sep 12, 2024)
- ce55c2d fix: hammering down restore process (Tbaile, Sep 12, 2024)
- 6b4f8f1 fix: dumping only data and restoring using psql (Tbaile, Sep 12, 2024)
- 1cf67d1 grafana: add VPN network traffic chart (gsanchietti, Sep 16, 2024)
- 8108257 fix: final touchup of database restore (Tbaile, Sep 13, 2024)
- b3a7f83 feat: added grafana user to postgresql (Tbaile, Sep 16, 2024)
- 2c3d44c feat: added configurable retention period (Tbaile, Sep 16, 2024)
- 12fa518 fix: using prometheus retention to configure api server (Tbaile, Sep 16, 2024)
- bcb44d5 docs: added Timescale reference to `prometheus_retention` (Tbaile, Sep 16, 2024)
- d063134 grafana: vpn, add missing charts (gsanchietti, Sep 17, 2024)
- 35379fc fix: api, set GIN_MODE (gsanchietti, Sep 17, 2024)
- 6d24dd5 grafana: rename malware dashboard to security (gsanchietti, Sep 17, 2024)
- cee4ad2 grafana: improve traffic_by_client (gsanchietti, Sep 17, 2024)
- e3147aa grafana: traffic dashboard, add host link (gsanchietti, Sep 17, 2024)
- aaa632b feat(ui): added maxmind token field and tooltip for metrics (Tbaile, Sep 17, 2024)
- 4e0260d grafana: add connectivity dashboard (gsanchietti, Sep 17, 2024)
- 15ba2ac grafana: update dashboard queries (gsanchietti, Sep 18, 2024)
- 49523a5 grafana: fix VPN in unit dashboard (gsanchietti, Sep 18, 2024)
- 89e8404 grafana: vpn, add chart from prometheus (gsanchietti, Sep 18, 2024)
- a4b6c29 grafana: add promethus chart to connectivity (gsanchietti, Sep 18, 2024)
- cd04ba5 grafana: connectivity, add total traffic chart (gsanchietti, Sep 18, 2024)
- bad7c4f fix(grafana): using best effort method to format dates in dashboards (Tbaile, Sep 19, 2024)
- 3db9d51 feat(grafana): added time range links for network traffic (Tbaile, Sep 19, 2024)
- c57b874 grafana: add links to all dashboards (gsanchietti, Sep 19, 2024)
- 44fe6ab grafana: fix network_traffic (gsanchietti, Sep 19, 2024)
- 279a276 grafana: dashboard multiple fixes (gsanchietti, Sep 19, 2024)
- e6d42d9 grafana: make dashboard read-only (gsanchietti, Sep 19, 2024)
- 9923951 grafana: connectivity, add wan staus (gsanchietti, Sep 19, 2024)
- 488f45d grafana: dashboards minor fixes (gsanchietti, Sep 23, 2024)
- fa4ad12 build: update controller to 1.1.0 (gsanchietti, Sep 23, 2024)
- 015363f build: use timescale fixed version (gsanchietti, Sep 23, 2024)
- 5e27635 README: improve timescale connection info (gsanchietti, Sep 23, 2024)
README.md (21 changes: 19 additions, 2 deletions)
@@ -14,6 +14,7 @@ The module is composed by the following containers:
- [loki](#loki): log storage, it stores logs from promtail
- [grafana](#grafana): metrics visualization, it visualizes metrics from prometheus and logs from loki
- [webssh](#webssh): web-based ssh client
- [timescale](#timescale): time-series database for storing metrics


## Install
@@ -42,11 +43,12 @@ Launch `configure-module`, by setting the following parameters:
- `api_user`: controller admin user
- `api_password`: controller admin password, change it after first login
- `loki_retention`: Loki retention period in days (default: ``180`` days)
- `promtail_retention`: Promtail retention period in days (default: ``15`` days)
- `prometheus_retention`: Prometheus and Timescale retention period in days (default: ``15`` days)
- `maxmind_license`: [MaxMind](https://www.maxmind.com/) license key used to download the GeoIP database; the database is loaded every time the API server starts

Example:

api-cli run module/nethsecurity-controller1/configure-module --data '{"host": "mycontroller.nethsecurity.org", "lets_encrypt": false, "ovpn_network": "172.19.64.0", "ovpn_netmask": "255.255.255.0", "ovpn_cn": "nethsec", "api_user": "admin", "api_password": "password", "loki_retention": 180, "prometheus_retention": 15}'
api-cli run module/nethsecurity-controller1/configure-module --data '{"host": "mycontroller.nethsecurity.org", "lets_encrypt": false, "ovpn_network": "172.19.64.0", "ovpn_netmask": "255.255.255.0", "ovpn_cn": "nethsec", "api_user": "admin", "api_password": "password", "loki_retention": 180, "prometheus_retention": 15, "maxmind_license": "xxx"}'

The above command will:
- start and configure the nethsecurity-controller instance
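
Hand-writing the JSON for `--data` is error-prone: a single stray quote makes the payload invalid. A minimal Python sketch that produces the same command from keyword arguments, letting `json.dumps` handle JSON quoting and `shlex.quote` handle shell quoting. `build_configure_payload` is a hypothetical helper, and the values are just the example ones above.

```python
import json
import shlex


def build_configure_payload(**params):
    """Serialize configure-module parameters to compact JSON.

    Hypothetical helper: it only serializes, it does not validate
    the parameters against the module schema.
    """
    return json.dumps(params, separators=(",", ":"))


payload = build_configure_payload(
    host="mycontroller.nethsecurity.org",
    lets_encrypt=False,  # Python False becomes JSON false
    ovpn_network="172.19.64.0",
    ovpn_netmask="255.255.255.0",
    ovpn_cn="nethsec",
    api_user="admin",
    api_password="password",
    loki_retention=180,
    prometheus_retention=15,
    maxmind_license="xxx",
)

# shlex.quote wraps the JSON in single quotes so the shell passes it verbatim.
command = (
    "api-cli run module/nethsecurity-controller1/configure-module --data "
    + shlex.quote(payload)
)
print(command)
```
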
@@ -160,6 +162,11 @@ It has also some pre-configured dashboards:
- nethsecurity.json: a dashboard with the most important metrics from the connected machines, like CPU, memory, disk, network, and system load
- logs.json: a dashboard where you can visualize the logs from all the connected machines and filter them by hostname, application, and priority
- loki.json: a dashboard with the most important metrics from Loki, like the number of logs ingested, the number of logs dropped, and the status of queriers
- network_traffic.json: this dashboard uses data from the Timescale database and shows the global network traffic by unit
- network_traffic_by_client.json: this dashboard uses data from the Timescale database and shows the network traffic by unit and client (a client is a machine connected to the unit's local network)
- network_traffic_by_host.json: this dashboard uses data from the Timescale database and shows the network traffic by unit and host (a host is a machine on the internet)
- malware.json: this dashboard uses data from the Timescale database and shows the malware blocked by the unit
- vpn.json: this dashboard uses data from the Timescale database and shows the VPN connections

Grafana is accessible at `https://<controller-host>/grafana/`; the default credentials are the same as those set for the controller. You should change them on the first login.

@@ -169,6 +176,16 @@ Grafana is accessible at `https://<controller-host>/grafana/`, default credentia

Access to WebSSH is protected using a randomly generated URL; you can find it inside the module configuration file at `/home/nethsecurity-controller1/.config/state/config.json`.

### Timescale

[Timescale](https://docs.timescale.com/latest/main) is a time-series database for storing metrics. It's configured via environment variables and the configuration is available at `/home/nethsecurity-controller1/.config/state/db.env`.

You can connect to the database with the following command:
```
runagent -m nethsecurity-controller1
source db.env; podman exec -it timescale psql -U "${POSTGRES_USER}" -p "${POSTGRES_PORT}"
```
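
`db.env` is a plain `KEY=value` file, so its values can also be reused by clients outside the container. A minimal Python sketch that parses it and builds a libpq-style connection URI, assuming the file contains only simple unquoted assignments, that the database is named `report` (as in the Grafana datasource), and that the client can reach it on `127.0.0.1`. `parse_env` and `postgres_uri` are hypothetical helpers, separate from the module's own `agent.read_envfile`, and the sample values are made up.

```python
from urllib.parse import quote


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and comments (no quote handling)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def postgres_uri(env, database="report", host="127.0.0.1"):
    """Build a postgresql:// URI; user and password are percent-encoded."""
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{host}:{env['POSTGRES_PORT']}/{database}"


# Made-up sample in the same shape as the generated db.env.
sample = """POSTGRES_USER=report
POSTGRES_PORT=20010
POSTGRES_PASSWORD=s3cret
"""
print(postgres_uri(parse_env(sample)))
```
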

## Uninstall

To uninstall the instance:
build-images.sh (9 changes: 5 additions, 4 deletions)
@@ -9,12 +9,13 @@ images=()
repobase="${REPOBASE:-ghcr.io/nethserver}"
# Configure the image name
reponame="nethsecurity-controller"
controller_version="1.0.1"
controller_version="1.1.0"
promtail_version=2.7.1
loki_version=2.9.4
prometheus_version=2.50.1
grafana_version=10.3.3
grafana_version=11.2.0
webssh_version=1.6.2
timescale_version="2.16.1-pg16"

# Create a new empty container for webssh
echo "Build webssh container" # from https://github.com/huashengdun/webssh
@@ -74,8 +75,8 @@ buildah add "${container}" ui/dist /ui
# Set up the entrypoint, ask to reserve TCP ports with the label and set a rootless container
buildah config --entrypoint=/ \
--label="org.nethserver.authorizations=traefik@any:routeadm node:tunadm" \
--label="org.nethserver.tcp-ports-demand=10" \
--label="org.nethserver.images=ghcr.io/nethserver/nethsecurity-vpn:$controller_version ghcr.io/nethserver/nethsecurity-api:$controller_version ghcr.io/nethserver/nethsecurity-ui:$controller_version ghcr.io/nethserver/nethsecurity-proxy:$controller_version docker.io/grafana/promtail:$promtail_version docker.io/grafana/loki:$loki_version docker.io/prom/prometheus:v$prometheus_version docker.io/grafana/grafana:$grafana_version ghcr.io/nethserver/webssh:${IMAGETAG:-latest}" \
--label="org.nethserver.tcp-ports-demand=11" \
--label="org.nethserver.images=ghcr.io/nethserver/nethsecurity-vpn:$controller_version ghcr.io/nethserver/nethsecurity-api:$controller_version ghcr.io/nethserver/nethsecurity-ui:$controller_version ghcr.io/nethserver/nethsecurity-proxy:$controller_version docker.io/grafana/promtail:$promtail_version docker.io/grafana/loki:$loki_version docker.io/prom/prometheus:v$prometheus_version docker.io/grafana/grafana:$grafana_version ghcr.io/nethserver/webssh:${IMAGETAG:-latest} docker.io/timescale/timescaledb:$timescale_version" \
"${container}"
# Commit the image
buildah commit "${container}" "${repobase}/${reponame}"
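
The `org.nethserver.images` label is a single space-separated string, which makes the new Timescale entry easy to miss. A Python sketch that assembles the same list from the version pins above, purely to make its contents readable; it is not part of the build, and the variable names simply mirror the shell ones.

```python
import os

# Version pins copied from build-images.sh.
controller_version = "1.1.0"
promtail_version = "2.7.1"
loki_version = "2.9.4"
prometheus_version = "2.50.1"
grafana_version = "11.2.0"
timescale_version = "2.16.1-pg16"
# Mirrors ${IMAGETAG:-latest}: fall back when IMAGETAG is unset or empty.
imagetag = os.environ.get("IMAGETAG") or "latest"

images = [
    f"ghcr.io/nethserver/nethsecurity-{name}:{controller_version}"
    for name in ("vpn", "api", "ui", "proxy")
] + [
    f"docker.io/grafana/promtail:{promtail_version}",
    f"docker.io/grafana/loki:{loki_version}",
    f"docker.io/prom/prometheus:v{prometheus_version}",  # note the "v" prefix
    f"docker.io/grafana/grafana:{grafana_version}",
    f"ghcr.io/nethserver/webssh:{imagetag}",
    f"docker.io/timescale/timescaledb:{timescale_version}",
]

label = "org.nethserver.images=" + " ".join(images)
print(label)
```
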
imageroot/actions/configure-module/20configure (25 changes: 24 additions, 1 deletion)
@@ -103,6 +103,7 @@ with open('config.json', 'w') as cfp:
# Load subscription info
rdb = agent.redis_connect(privileged=False)
subscription = rdb.hgetall('cluster/subscription')
metrics_retention_days = request.get('prometheus_retention', '15')

with open('config.env', 'w') as env:
env.write(f'ADMIN_USER={request["api_user"]}\n')
@@ -119,6 +120,9 @@ with open('config.env', 'w') as env:
env.write(f'VALID_SUBSCRIPTION=true\n')
else:
env.write(f'VALID_SUBSCRIPTION=false\n')
if 'maxmind_license' in request:
env.write(f'MAXMIND_LICENSE={request["maxmind_license"]}\n')
env.write(f'RETENTION_DAYS={metrics_retention_days}\n')

server_address = request["ovpn_network"].removesuffix('.0') + '.1'
with open('promtail.env', 'w') as promtail:
@@ -140,11 +144,12 @@ with open('grafana.env', 'w') as gfp:
gfp.write("GF_SERVER_HTTP_ADDR=127.0.0.1\n")
gfp.write(f'GF_SECURITY_ADMIN_USER={request["api_user"]}\n')
gfp.write(f'GF_SECURITY_ADMIN_PASSWORD={request.get("api_password", config["api_password"])}\n')
gfp.write('GF_DATE_FORMATS_USE_BROWSER_LOCALE=true\n')

with open('prometheus.env', 'w') as pfp:
pfp.write(f"PROMETHEUS_PORT={ports[7]}\n")
pfp.write(f"PROMETHEUS_PATH={config['prometheus_path']}\n")
pfp.write(f"PROMETHEUS_RETENTION={request.get('prometheus_retention', '15')}d\n")
pfp.write(f"PROMETHEUS_RETENTION={metrics_retention_days}d\n")

with open('prometheus.yml', 'w', encoding='utf-8') as fp:
fp.write("global:\n")
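
With this change a single request value, `prometheus_retention`, ends up in two files: `RETENTION_DAYS` in `config.env` (read by the API server, which per the README also governs Timescale retention) and `PROMETHEUS_RETENTION` in `prometheus.env`, where Prometheus expects a `d` suffix. A Python sketch of that mapping; `render_retention` is a hypothetical helper and the request dicts are examples.

```python
def render_retention(request):
    """Return the (config.env, prometheus.env) retention lines for a request.

    Uses the same default as 20configure ('15') when the key is missing.
    """
    days = request.get("prometheus_retention", "15")
    config_env = f"RETENTION_DAYS={days}\n"           # plain number of days
    prometheus_env = f"PROMETHEUS_RETENTION={days}d\n"  # Prometheus duration syntax
    return config_env, prometheus_env


print(render_retention({"prometheus_retention": 30}))
print(render_retention({}))
```

Deriving both lines from one variable, as the diff does with `metrics_retention_days`, keeps the two retention periods from diverging.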
@@ -159,6 +164,7 @@ with open('prometheus.yml', 'w', encoding='utf-8') as fp:
fp.write(f' - 127.0.0.1:{ports[5]}\n')

# Grafana configuration
db = agent.read_envfile('db.env')
with open('grafana.yml', 'w') as fp:
fp.write("apiVersion: 1\n")
fp.write("datasources:\n")
@@ -174,6 +180,23 @@ with open('grafana.yml', 'w') as fp:
fp.write(' access: proxy\n')
fp.write(f' url: http://127.0.0.1:{ports[5]}\n')

fp.write(' - name: Local Timescale\n')
fp.write(' type: postgres\n')
fp.write(' uid: timescale\n')
fp.write(f' url: 127.0.0.1:{db.get("POSTGRES_PORT")}\n')
fp.write(f' user: grafana\n')
fp.write(' secureJsonData:\n')
fp.write(f' password: {db.get("GRAFANA_POSTGRES_PASSWORD")}\n')
fp.write(' jsonData:\n')
fp.write(' database: report\n')
fp.write(' sslmode: disable\n')
fp.write(' maxOpenConns: 100\n')
fp.write(' maxIdleConns: 100\n')
fp.write(' maxIdleConnsAuto: true\n')
fp.write(' connMaxLifetime: 14400\n')
fp.write(' postgresVersion: 1500\n')
fp.write(' timescaledb: true\n')
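
The datasource block is YAML written line by line, so its nesting depends entirely on the leading spaces inside each `fp.write` string. A Python sketch of the same `Local Timescale` entry with the indentation spelled out, following the layout of Grafana's datasource provisioning files (list item at two spaces, keys at four, nested keys at six). The function names and the sample port and password are hypothetical; because the generated password is a hex digest, it needs no YAML quoting.

```python
def yaml_scalar(value):
    """Render a scalar as YAML expects: booleans as true/false."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def timescale_datasource(port, password):
    """Return the 'Local Timescale' datasource entry as indented YAML lines."""
    json_data = {
        "database": "report",
        "sslmode": "disable",
        "maxOpenConns": 100,
        "maxIdleConns": 100,
        "maxIdleConnsAuto": True,
        "connMaxLifetime": 14400,
        "postgresVersion": 1500,
        "timescaledb": True,
    }
    lines = [
        "  - name: Local Timescale",  # list item under 'datasources:'
        "    type: postgres",
        "    uid: timescale",
        f"    url: 127.0.0.1:{port}",
        "    user: grafana",
        "    secureJsonData:",
        f"      password: {password}",
        "    jsonData:",
    ]
    lines += [f"      {key}: {yaml_scalar(value)}" for key, value in json_data.items()]
    return lines


print("\n".join(["apiVersion: 1", "datasources:"] + timescale_datasource(20010, "0123abcd")))
```
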

network = agent.read_envfile('network.env')
tun = network.get('OVPN_TUN')
bits = sum(bin(int(x)).count('1') for x in request["ovpn_netmask"].split('.'))
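
The `bits` one-liner converts the dotted OpenVPN netmask into a prefix length by counting 1 bits per octet. A Python sketch comparing it with the standard `ipaddress` module, which computes the same value and additionally rejects non-contiguous masks that bit counting would silently accept; both function names are hypothetical.

```python
import ipaddress


def prefix_bits(netmask):
    """Count the 1 bits of a dotted netmask, as the one-liner above does."""
    return sum(bin(int(octet)).count("1") for octet in netmask.split("."))


def prefix_bits_checked(netmask):
    """Same result via ipaddress; raises ValueError for non-contiguous masks."""
    return ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen


print(prefix_bits("255.255.255.0"), prefix_bits_checked("255.255.255.0"))
```

For a malformed mask such as `255.0.255.0`, bit counting returns 16 while the `ipaddress` version raises, which may be preferable when the value comes from user input.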
imageroot/actions/configure-module/validate-input.json (8 changes: 7 additions, 1 deletion)
@@ -13,7 +13,8 @@
"ovpn_netmask": "255.255.0.0",
"ovpn_cn": "nethsec",
"loki_retention": 180,
"prometheus_retention": 15
"prometheus_retention": 15,
"maxmind_license": "1234567890"
}
],
"type": "object",
@@ -63,6 +64,11 @@
"type": "integer",
"description": "Retention policy for Prometheus metrics, default is 15 days",
"minimum": 1
},
"maxmind_license": {
"type": "string",
"description": "MaxMind API key, required for GeoIP database updates",
"minLength": 1
}
}
}
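
The schema constrains `prometheus_retention` to an integer of at least 1 and `maxmind_license` to a non-empty string. A hand-rolled Python sketch of just those two checks, handy for testing a payload before calling `configure-module`. `validate_request` is hypothetical, covers only these two fields, and is not how the module validates its input.

```python
def validate_request(request):
    """Return a list of error messages for the two constraints; empty means OK."""
    errors = []
    retention = request.get("prometheus_retention")
    if retention is not None:
        # JSON booleans are not integers, so reject Python bools explicitly.
        if not isinstance(retention, int) or isinstance(retention, bool) or retention < 1:
            errors.append("prometheus_retention must be an integer >= 1")
    license_key = request.get("maxmind_license")
    if license_key is not None:
        if not isinstance(license_key, str) or len(license_key) < 1:
            errors.append("maxmind_license must be a non-empty string")
    return errors


print(validate_request({"prometheus_retention": 15, "maxmind_license": "1234567890"}))
print(validate_request({"prometheus_retention": 0, "maxmind_license": ""}))
```
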
imageroot/actions/create-module/20initialize (12 changes: 12 additions, 0 deletions)
@@ -19,11 +19,14 @@ promtail_port=$(($start+4))
# port 8 is reserved for prometheus
# port 9 is reserved for grafana
webssh_port=$(($start+9))
db_port=$(($start+10))

num=$(echo $MODULE_ID | sed 's/nethsecurity\-controller//')

jwt_secret=$(uuidgen | sha256sum | awk '{print $1}')
reg_secret=$(uuidgen | sha256sum | awk '{print $1}')
db_secret=$(uuidgen | sha256sum | awk '{print $1}')
grafana_postgres_password=$(uuidgen | sha256sum | awk '{print $1}')

cat << EOF > network.env
OVPN_UDP_PORT=$ovpn_udp_port
@@ -43,4 +46,13 @@ SECRET_JWT=$jwt_secret
REGISTRATION_TOKEN=$reg_secret
EOF

cat << EOF > db.env
POSTGRES_USER=report
POSTGRES_PORT=$db_port
POSTGRES_PASSWORD=$db_secret
GRAFANA_POSTGRES_PASSWORD=$grafana_postgres_password
REPORT_DB_URI=postgres://report:$db_secret@127.0.0.1:$db_port/report
TS_TUNE_MAX_BG_WORKERS=100
EOF

mkdir -p clients
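
`20initialize` derives every secret from `uuidgen | sha256sum | awk '{print $1}'` and puts the database at offset 10 of the reserved port range, presumably why the build raises `org.nethserver.tcp-ports-demand` from 10 to 11 (offsets 0 through 10). A Python sketch of both calculations; `random_secret` and `db_port` are hypothetical names, and `start` stands for the first reserved port, as in the shell script.

```python
import hashlib
import uuid


def random_secret():
    """Python equivalent of `uuidgen | sha256sum | awk '{print $1}'`.

    uuidgen prints the UUID followed by a newline and sha256sum hashes
    that newline too, so it is included before hashing.
    """
    return hashlib.sha256(f"{uuid.uuid4()}\n".encode()).hexdigest()


def db_port(start):
    """Timescale uses offset 10, i.e. the 11th port of the reserved range."""
    return start + 10


print(random_secret(), db_port(20000))
```

The SHA-256 step only reformats the UUID: the secret still carries the 122 random bits of a version-4 UUID. `secrets.token_hex(32)` would produce a 64-character hex secret directly from the OS random source.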
imageroot/actions/restore-module/30restore_database (12 changes: 12 additions, 0 deletions)
@@ -0,0 +1,12 @@
#!/bin/sh

# Load database credentials
. ./db.env

# Wait for the database to be ready, polling every 5 seconds
until podman exec timescale pg_isready -U "${POSTGRES_USER}" -p "${POSTGRES_PORT}" -d "${POSTGRES_DB}"; do
sleep 5
done

# Restore the DB into timescale
podman exec -i timescale psql -U "${POSTGRES_USER}" -p "${POSTGRES_PORT}" < backup.sql
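
The restore script polls `pg_isready` every five seconds with no upper bound. A Python sketch of the same loop with an optional timeout; `wait_until_ready` and `pg_isready` are hypothetical helpers, and `check`, `sleep` and `clock` are injectable so the loop can be exercised without a running container.

```python
import subprocess
import time


def wait_until_ready(check, interval=5.0, timeout=None, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True, sleeping `interval` seconds between tries.

    Returns False if `timeout` seconds pass first; with timeout=None it
    waits forever, like the shell loop.
    """
    deadline = None if timeout is None else clock() + timeout
    while not check():
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)
    return True


def pg_isready(user, port):
    """True when pg_isready inside the timescale container exits with status 0."""
    result = subprocess.run(
        ["podman", "exec", "timescale", "pg_isready", "-U", user, "-p", str(port)],
        capture_output=True,
    )
    return result.returncode == 0
```

For example, `wait_until_ready(lambda: pg_isready("report", 20010), timeout=300)` would give up after roughly five minutes; the port is a placeholder.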
imageroot/bin/module-cleanup-state (5 changes: 5 additions, 0 deletions)
@@ -0,0 +1,5 @@
#!/usr/bin/env sh

set -e

rm -f backup.sql
imageroot/bin/module-dump-state (9 changes: 9 additions, 0 deletions)
@@ -0,0 +1,9 @@
#!/usr/bin/env sh

set -e

# Load database credentials
. ./db.env

# Dump the DB from timescale
podman exec -i timescale pg_dump -U "${POSTGRES_USER}" -p "${POSTGRES_PORT}" > backup.sql
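
The dump and restore steps pass a plain SQL dump through the container's stdout and stdin. A Python sketch of the same pair of commands built as argument lists, with one file-name constant shared by dump and restore so the two sides cannot refer to different files. All names here are hypothetical, and `env` is expected to hold the `db.env` values.

```python
import subprocess

BACKUP_FILE = "backup.sql"  # single name shared by dump and restore


def dump_command(env):
    """argv for pg_dump inside the timescale container (plain SQL to stdout)."""
    return ["podman", "exec", "-i", "timescale", "pg_dump",
            "-U", env["POSTGRES_USER"], "-p", env["POSTGRES_PORT"]]


def restore_command(env):
    """argv for psql inside the container, reading SQL from stdin."""
    return ["podman", "exec", "-i", "timescale", "psql",
            "-U", env["POSTGRES_USER"], "-p", env["POSTGRES_PORT"]]


def run_dump(env, path=BACKUP_FILE):
    with open(path, "wb") as out:
        subprocess.run(dump_command(env), stdout=out, check=True)


def run_restore(env, path=BACKUP_FILE):
    with open(path, "rb") as inp:
        subprocess.run(restore_command(env), stdin=inp, check=True)
```

Passing argument lists instead of a shell string avoids quoting problems, and `check=True` makes a failed dump or restore raise instead of passing silently.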