This is everything that we need to know to work on the bare-metal Kubernetes cluster. This information is considered out of date and pertains to the old flock cluster. This document remains for archival purposes.
- Introduction
- Roadmap
- Networking
- Publishing Services
- Netbooting
- Adding Nodes
- Installing JupyterHub and BinderHub
- Accessing the Cluster
- Monitoring the Cluster
- Alerting for the Cluster
- Securing the Cluster
- Updating Rooster
- Customizing the Cluster
- Literature List for learning resources.
- Useful Commands
This guide is organized so that one could build a Kubernetes cluster running JupyterHub from scratch. We suggest reading more about Kubernetes first; the Literature List section has some sources you could follow.
In the rest of the docs, we may refer to the management node by its hostname, rooster.
This node is not part of the kubernetes cluster, but acts as a gateway to the Internet, runs a dhcp server, and hosts the network boot stuff.
We are calling the basic nodes chicks. These will be our masters and workers in the kubernetes cluster.
Each has hostname chick{i}, where i is a natural number. We currently have chick0 through chick10, so 11 in total. We assign static IPs of 10.0.0.{i + 100} to chick{i}, so chick0 is at 10.0.0.100, chick1 at 10.0.0.101, etc.
To test out your cluster, try running a test deployment and see if you can access the server from every node, or whether you can reach the public IP assigned by MetalLB from outside the network if you publish the service as type LoadBalancer.
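As a quick sanity check, something like the following should work once MetalLB is installed (a minimal sketch; the nginx image and the hello-test name are just placeholders):
```
# create a throwaway deployment and publish it as a LoadBalancer service
kubectl create deployment hello-test --image=nginx
kubectl expose deployment hello-test --port=80 --type=LoadBalancer
# the EXTERNAL-IP column should show an address from the MetalLB pool
kubectl get service hello-test
# clean up when done
kubectl delete service hello-test
kubectl delete deployment hello-test
```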
Basically, a todo list for the cluster and our development plan for the future.
This is stuff that needs to be done before all work can be done remotely.
- rack all servers and nfs server (remember to put RAM in all servers)
- put disks in the servers
- install OS (can only be done after disks are in)
- wiring for the networking
- networking -- though this technically can be done remotely, we might break something while doing it, so we should do it on-premises.
- write down the MAC addresses for the interfaces we use on all the nodes. We might also want to assign each node a static IP in our private network by changing the dhcp configuration on our management node.
- pod network fabric
- metallb (metal load balancer. See below)
Using kubeadm with the Ansible playbook to install everything. Right now this works in the dev-env, but we need to get it to bare metal. Preferably set it up so we can use the same playbook for both the development environment and the actual metal one.
We need to set up the network fabric in some way. Flannel is working in dev-env, but it has a lot of overhead because it uses IP tunneling. Something to consider is Calico, but we may need some more complicated configuration.
This is where we will talk to the nfs server to allow for persistent volumes. It shouldn't be that different from the nfs-client setup we have in dev-env.
We could create a default Storage Class, but it would be more secure and give us finer-grained control if we configured everything manually by passing values to the JupyterHub Helm chart. That way we can allow different persistence and rules within the same cluster, though it may not be a problem if we just split everything into separate namespaces.
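For reference, a minimal sketch of what passing a storage class to the chart might look like (this assumes the zero-to-jupyterhub chart's singleuser.storage keys and a hypothetical storage class named nfs-client; adjust the names to our actual setup):
```
# append a storage section to the chart's config.yaml and upgrade the release
cat >> config.yaml <<'EOF'
singleuser:
  storage:
    dynamic:
      storageClass: nfs-client
    capacity: 1Gi
EOF
helm upgrade jhub jupyterhub/jupyterhub --values config.yaml
```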
MetalLB seems to be the move for this one. It makes it so that when services are published as type LoadBalancer, they still work on bare metal and IPs are provided automatically. This seems like the best way to expose services outside the cluster on bare metal without having to make any modifications to the underlying Helm charts. Certainly the people at NGINX Ingress make this seem like the best option in these docs.
Down the road, we need to configure high-availability masters so we aren't as vulnerable to a master failing. The setup is outlined in the kubeadm docs. We can use either HAProxy, which Richard is familiar with, or maybe some sort of nginx proxy. Either way, we have to do this manually since this proxying must exist before kubectl is operational.
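A rough sketch of what that proxy could look like with HAProxy on rooster (untested, and the master addresses 10.0.0.100-10.0.0.102 are hypothetical placeholders):
```
# assumes haproxy is installed (sudo apt install haproxy)
# append a TCP frontend/backend for the kube-apiserver
sudo tee -a /etc/haproxy/haproxy.cfg <<'EOF'
frontend kube-apiserver
    bind *:6443
    mode tcp
    default_backend kube-masters

backend kube-masters
    mode tcp
    balance roundrobin
    server chick0 10.0.0.100:6443 check
    server chick1 10.0.0.101:6443 check
    server chick2 10.0.0.102:6443 check
EOF
sudo systemctl restart haproxy
```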
This will basically be all the extra stuff we need to add to the cluster to make it feel like a cloud environment. For instance,
- install the dynamic nfs volume provisioner on the cluster
- install MetalLB on cluster
We need an automated testing framework where we can put in tests to simulate a bunch of users. Maybe this could be cool: http://jmeter.apache.org/ But we need to do more complicated things, like have the clients run programs or make graphs.
Yuvi mentioned this and that they had a way to do so, so we should contact him when we are ready.
We should also test the cluster under failure by bringing down nodes and seeing how the cluster responds.
Once we have something fully deployed, we should put our work on the BinderHub site under the deployments section because it says they are looking for more people to post their deployments there.
Here is our basic setup for the nodes (not pods yet): (taken from LibreTexts/Documentation/network_drawing as a saved svg)
This is how all the computers will communicate with each other using Kubernetes and how they will access the internet indirectly through the manager. The manager will be a load balancer, DHCP server, router with NAT, and our way to assign IP addresses to services.
Uses blue ethernet cables. Plugged into the smart switch.
Manager is at 128.120.136.26
This network uses enp1s0 on all the machines (the ethernet port on the left, but not the far left), except the manager, which uses enp3s0, as shown in the diagram above.
log into the switch with screen /dev/ttyS0
on the management node
- use username: manager password: friend
We will have one management node and one dumb switch for this network. The management node will connect to it on its enp2s0(the ethernet port on the right) and its management interface (the one all the way to the left next to the usb ports). It will run a DHCP server on this network. The rest of the nodes will connect to this dumb switch only on their management interface.
Uses green ethernet cables.
This is the network so pods can communicate with each other. These will be running over the k8s network.
We satisfy the k8s Network Policy requirement using Calico because it is faster than Flannel. Alternatively, we could do it by hand like is done in k8s the hard way.
Using a pod CIDR of 10.244.0.0/16
We chose Calico. The manifest is calico.yaml, in the home directory of the repo.
In it we changed:
CALICO_IPV4POOL_IPIP: "Never"
CALICO_IPV4POOL_CIDR: "10.244.0.0/16"
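After editing the manifest, applying it and checking that the Calico pods come up might look like this (assuming the manifest's standard k8s-app=calico-node label):
```
kubectl apply -f calico.yaml
# one calico-node pod should end up Running on every node
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
```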
On the manager node:
We followed a combination of mostly this: https://wiki.debian.org/PXEBootInstall#Preface and also https://help.ubuntu.com/18.04/installation-guide/amd64/ch04s05.html and a little bit of https://linuxhint.com/pxe_boot_ubuntu_server/
We use 10.0.0.0/24 for all the node IPs (chicks), with the manager also acting as the DHCP server at 10.0.0.1.
Network booting seems to work only on the enp1s0 interface on nodes. This is currently the one on the left. You cannot boot on the management interface that is located on the far left of the machines. Furthermore, this didn't work initially on the smart switch, so you must make sure that there are no routes or anything else already configured that would cause unexpected behavior. You also need to make sure you are doing this on a private network where the manager is the only DHCP server. After they were initially booted, I was able to switch them all over to the smart switch and there were no problems.
-
sudo apt install isc-dhcp-server
-
to
/etc/default/isc-dhcp-server
I added the line: INTERFACESv4="enp3s0"
since enp3s0 is the interface that is hooked up to the management network. Here, we assume enp3s0 is the interface on the manager node that faces the internal kubernetes network.
-
to
/etc/netplan/01-netcfg.yaml
, or whatever the netplan file is, I added the following under ethernets so we get that management interface up:
enp3s0:
  addresses: [192.168.0.1/24]
  gateway4: 128.120.136.1
  dhcp4: no
  nameservers:
    addresses: [192.168.0.1]
-
netplan apply
-
before changing
/etc/dhcp/dhcpd.conf
copy the current one to /etc/dhcp/dhcpd.conf.backup
and set it to this
# the following is adapted from
# https://wiki.debian.org/PXEBootInstall#Preface
#
default-lease-time 600;
max-lease-time 7200;
allow booting;
allow bootp;
# in this example, we serve DHCP requests from 10.0.0.(3 to 253)
# and we have a router at 10.0.0.1
# these will be the name of the nodes.
subnet 10.0.0.0 netmask 255.255.255.0 {
range 10.0.0.3 10.0.0.99; # can't have 10.0.0.100 - 10.0.0.110 because we are
# using those for the chicks
option broadcast-address 10.0.0.255;
option routers 10.0.0.1; # this ends up being the default gateway router
# on the hosts. Set to the manager so we can NAT
option domain-name-servers 128.120.136.129,128.120.136.133,128.120.136.134;
filename "pxelinux.0";
}
group {
next-server 10.0.0.1; # our Server. was previously 128.120.136.1
host tftpclient {
filename "pxelinux.0"; # (this we will provide later)
}
}
-
systemctl restart isc-dhcp-server
to get the dhcp server responding to requests. You can also follow its logs live with journalctl -fu isc-dhcp-server -
checked the logs with
grep DHCP /var/log/syslog
and there were some requests and handouts, so that's good. -
sudo apt install tftpd-hpa
-
changed
/etc/default/tftpd-hpa
to have these two defaults:
TFTP_DIRECTORY="/srv/tftp"
TFTP_OPTIONS="--secure -vvv"
so we listen on our management net and not on the internet. ^- changed this, need to change it back after testing
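If we do want tftpd-hpa to bind only to the management network, one option (an untested sketch, using rooster's internal address 10.0.0.1 from the dhcp config above) is the TFTP_ADDRESS setting in the same file:
```
# bind the tftp server to the internal interface's address only
echo 'TFTP_ADDRESS="10.0.0.1:69"' | sudo tee -a /etc/default/tftpd-hpa
sudo systemctl restart tftpd-hpa
```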
-
sudo mkdir /srv/tftp
-
systemctl restart tftpd-hpa
and then test it -
wget http://archive.ubuntu.com/ubuntu/dists/bionic/main/installer-amd64/current/images/netboot/netboot.tar.gz
-
move netboot.tar.gz into
/srv/tftp
and run tar xvzf netboot.tar.gz
and make the contents readable with chmod -R a+r *
-
systemctl restart tftpd-hpa
-
start up the client machine and it should get to a boot screen.
-
sudo apt install ufw
-
add the following to
/etc/ufw/before.rules
*nat
:POSTROUTING ACCEPT [0:0]
# send stuff out of the eth2 iface
-A POSTROUTING -o enp2s0 -j MASQUERADE
COMMIT
note that enp2s0 is the interface that faces the public internet
-
uncomment
net/ipv4/ip_forward=1
in /etc/ufw/sysctl.conf
-
systemctl restart ufw
-
sudo ufw allow tftp
so it can use the images
Note: If you are getting errors (like the nodes can't reach the internet, or
you're not getting the 'right' Ubuntu mirror), change the IP forwarding
policy in /etc/default/ufw
to:
DEFAULT_FORWARD_POLICY="ACCEPT"
-
have it connected to enp1s0, which is the left ethernet port on the right side
-
power it on with the disks in. The install screen should come on. If not, you may have to change the boot priority order
-
go through the installation steps. Once it says "installing base system," that part takes about an hour, so you can go do something else. After that it's mostly done. Alternatively, you could use the preseed file to install the OS onto each chick with very little intervention. Check the next section on how to go about this.
-
after completing the installation, to get it to boot from disk, you have to turn off the network boot on the manager (rooster). So on rooster, run
systemctl stop tftpd-hpa
before rebooting your newly installed machine. After it boots, you can turn tftp back on.
With preseeding, you can install Ubuntu Server 18.04 using a preconfiguration file, without going through each installation step manually.
The preconfiguration file is located in the tftp server: /srv/tftp/pxelinux.cfg/default
.
The section under label cli
lists the tasks and boot parameters needed to automate most of the
configuration.
The file /srv/tftp/preseed.cfg
lists the preconfiguration options. We removed the
partitioning section of the preconfiguration file because we wanted to keep the
RAID arrays already in place on each chick.
In order to use preseeding, type in the command cli
after the boot:
prompt when pxelinux
shows up from booting from the network.
In /etc/dhcp/dhcpd.conf
, add option host-name "<HOSTNAME>";
to each host. This is so DHCP sets the hostname of the computer. Alternatively,
you could type in cli hostname=<HOSTNAME>
when booting each chick.
Note: In /etc/hosts
, add the hostnames and IPs for each chick so you can
ssh into each one using their hostname.
For example:
10.0.0.100 chick0
10.0.0.101 chick1
Then run sudo systemctl restart sshd
.
First, check out Netbooting to get the OS installed. This section will cover what you have to do to get the node functioning in the Kubernetes cluster after the OS is already installed.
-
figure out the ip address that was assigned by the manager's dhcp server by checking out the logs on rooster. Logs are in
/var/log/syslog
for dhcp, so run something like grep dhcp /var/log/syslog
and there will be mention of what ip it was assigned. -
Add the node to
chicks.csv
by manually adding the hostname and the IP address and other fields. Then, on rooster run ./get_macs.py
and this will automatically fill in the enp1s0
and enp2s0
fields with the MAC address on those interfaces. See the comments at get_macs.py
. -
Optionally assign a static ip address to the host by changing
/etc/dhcp/dhcpd.conf
on rooster and adding the MAC address and the IP address you want. See the comments and other examples in that file, and the example host entry after this list. Then run systemctl restart isc-dhcp-server
. It will take a little while for the node's current IP lease to expire and for it to receive the new IP, or you can run netplan apply
and the host will reload its IP info from the router. -
Add it to the
hosts
file under the ansible directory. -
Provision all of them using the ansible playbook. From the
ansible/
directory, run ansible-playbook -i hosts playbooks/main.yml --ask-become-pass
. You sometimes have to change it to --ask-pass
and change it back. I don't know why. It might be a bug. If you are just adding one host and not provisioning the whole cluster, add the --limit "chick{i}"
flag.
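For the optional static-IP step above, a host entry in /etc/dhcp/dhcpd.conf might look like this (a sketch; the MAC address is a placeholder, so use the one recorded in chicks.csv):
```
# example static lease for chick0 appended to /etc/dhcp/dhcpd.conf
sudo tee -a /etc/dhcp/dhcpd.conf <<'EOF'
host chick0 {
  hardware ethernet 00:11:22:33:44:55;
  fixed-address 10.0.0.100;
  option host-name "chick0";
}
EOF
sudo systemctl restart isc-dhcp-server
```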
If you are adding a completely new node, add the --limit "chick{i}"
flag,
then run the playbook workers.yml
with both the master and new chick node.
The first task will give you a fatal error for the task, join cluster
; this
is expected. (We can probably write another playbook for adding nodes, but it would involve
a lot of copying and pasting.)
ansible-playbook -i hosts playbooks/main.yml --ask-become-pass --limit "chick{i}"
ansible-playbook -i hosts playbooks/workers.yml --ask-become-pass --limit "chick{i},master"
If you are adding a wiped node whose name is still in the cluster, i.e. the name of
the node still appears when running kubectl get nodes
, then delete the node first
by running kubectl delete node <node-name>
and completely wipe the node again.
Then follow the steps as if you were adding a completely new node.
If you are adding a node that has been detached (e.g. you restarted the system
on the node), then run sudo systemctl restart kubelet.service
. If you still have
trouble, this may help: Troubleshooting
These are notes about how services of type LoadBalancer
will be handled on our cluster.
MetalLB is a way to assign IPs to services from a pool of IP addresses.
Config is at metallb-config.yml
in the root of the project.
CELINE: we probably need to add a play in the ansible playbook on the master
group so it can install metallb
and also run kubectl apply -f metallb-config.yml
for the config.
Our pool of public IPs open on ports 80 and 443 is as follows:
128.120.136.32
128.120.136.54
128.120.136.55
128.120.136.56
128.120.136.61
We must use layer 2 mode for MetalLB because Calico is already using BGP to communicate its own routes. This problem is talked about here.
We still have a problem with getting requests for any of the above public IPs forwarded through the manager node and to our switch. (Once a request gets to the switch, it should be fine, since within this network the IP will be correctly announced via ARP by MetalLB.)
The problem with this is that we need DHCP within this cluster and are running netboot on this network, so it might be best for it to be on an alternate interface; maybe we could do that later.
In this solution, each public IP above has a corresponding IP within the k8s network so that the manager can accept requests on the public network for all of the above public IPs and then forward them to the corresponding k8s "public" ip on the internal k8s network. Then MetalLB will use these internal k8s "public" ips to assign to services. This will allow the services to be publicly accessible.
This is the solution currently implemented and it works right now.
We have 128.120.136.{i}
forward to 10.0.1.{i}
internally.
On rooster, we listen on the public
network for all of the above public IPs. This is done by modifying
/etc/netplan/01-netcfg.yaml
as follows:
# public network
enp2s0:
addresses:
# IP assigned for rooster
- 128.120.136.26/24
# public ips that richard gave us to publish services
- 128.120.136.32/24
- 128.120.136.54/24
- 128.120.136.55/24
- 128.120.136.56/24
- 128.120.136.61/24
gateway4: 128.120.136.1
dhcp4: no
nameservers:
addresses: [128.120.136.129,128.120.136.133,128.120.136.134]
Apply the netplan configuration by running sudo netplan apply
.
Then for the forwarding, we use nginx and forward from public to private. The following
is the forwarding part of /etc/nginx/nginx.conf
:
# this is where we forward to the "public" ips internally
# only did the first 3.
server {
listen 128.120.136.32;
location / {
proxy_pass http://10.0.1.32;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-NginX-Proxy true;
}
}
server {
listen 128.120.136.54;
location / {
proxy_pass http://10.0.1.54;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-NginX-Proxy true;
}
}
Apply the nginx configuration by running systemctl restart nginx.service
.
Finally, on metallb-config.yml
the pool of IPs are the internal "public" ips
beginning with 10.0.1.
.
Add the IP addresses to metallb-config.yml
and run kubectl apply -f metallb-config.yml
.
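A sketch of what that layer 2 pool could look like, assuming MetalLB is installed in the metallb-system namespace and uses the ConfigMap-based configuration format of the version we installed:
```
# metallb-config.yml -- layer 2 pool made of the internal "public" IPs
cat > metallb-config.yml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.0.1.32/32
      - 10.0.1.54/32
      - 10.0.1.55/32
      - 10.0.1.56/32
      - 10.0.1.61/32
EOF
kubectl apply -f metallb-config.yml
```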
Update: Documentation on how to set up a physical NFS server using ZFS can be found here. If you don't have the necessary hardware or if you don't need a dedicated physical NFS server yet, just keep on reading.
NFS is needed to handle persistent volume claims. It allows persistence of files made by the nodes.
(Credit to Kevin's kube-dev-env)
In rooster, run sudo apt install nfs-kernel-server
to install the NFS server on the
host system. We will use /export
on rooster as the shared directory which the chicks
can access.
Add the following line to /etc/exports
:
/export 10.0.0.0/8(rw,fsid=0,async,no_subtree_check,no_auth_nlm,insecure,no_root_squash)
and run exportfs -a
.
To make sure the NFS mount is successful, run this command on rooster to allow anything
from the network of chicks to talk to rooster: ufw allow from 10.0.0.0/8 to 10.0.0.1
.
Without this command, the firewall won't allow you to mount NFS.
We want each chick to mount 10.0.0.1:/export
(on rooster) to /nfs
(locally on the chick
node). The Ansible Playbook already auto-mounts rooster to each chick by editing the
/etc/fstab
file, so you don't have to do this manually. If you do want to do it manually,
run the command sudo mount 10.0.0.1:/export /nfs
on each chick node.
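For reference, making the mount persistent by hand could look like this (a sketch; adjust the mount options as needed):
```
# create the mount point, record it in fstab, and mount it
sudo mkdir -p /nfs
echo '10.0.0.1:/export  /nfs  nfs  defaults  0  0' | sudo tee -a /etc/fstab
sudo mount /nfs
```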
The nfs-client-vals.yml
describes the values used for running the NFS client provisioner.
Run
helm install --name nfs-client-release stable/nfs-client-provisioner -f nfs-client-vals.yml
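A minimal sketch of what nfs-client-vals.yml might contain (assuming the stable/nfs-client-provisioner chart's nfs.server, nfs.path, and storageClass.name keys; the class name is just an example):
```
# write the values file referenced by the helm install command above
cat > nfs-client-vals.yml <<'EOF'
nfs:
  server: 10.0.0.1
  path: /export
storageClass:
  name: nfs-client
EOF
```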
Later, we will have a physical NFS server.
Follow these instructions for setting up JupyterHub.
Follow these instructions for setting up BinderHub. The DockerHub container registry is under LibreTexts.
Because of how our cluster is set up, with all internet traffic going through rooster before reaching the cluster, Nginx on rooster is set up as a reverse proxy to direct the inbound traffic to the right service running on our cluster (e.g. JupyterHub, BinderHub, ...). We use the stream block for TCP traffic. The stream block allows Nginx to redirect encrypted traffic to the right service on the cluster, where it will be decrypted accordingly. If we don't make use of the stream block in Nginx, HTTPS traffic meant for services on the cluster would never reach them, because Nginx would see encrypted traffic and try to complete the TLS handshake itself, which would fail since the certificates are set up on the services themselves.
This differs from our previous nginx setup. Before, we had server and upstream blocks in an
http
block:
http {
...
server {
listen 128.120.136.32;
location / {
proxy_pass http://10.0.1.32;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-NginX-Proxy true;
}
}
...
}
However, this would not work since the certificates are set up on the services on the cluster, so the traffic cannot be decrypted.
Instead, we use a stream
block, as follows. Note that all IP addresses in the server blocks
have domain names assigned to them, so any traffic going to those domains are redirected accordingly.
stream {
upstream jupyterhub {
server 10.0.1.54:443;
}
upstream binder {
server 10.0.1.61:443;
}
upstream binderhub {
server 10.0.1.55:443;
}
server {
listen 128.120.136.54:443;
ssl_preread on;
proxy_pass jupyterhub;
}
server {
listen 128.120.136.56:443;
ssl_preread on;
proxy_pass binder;
}
server {
listen 128.120.136.55:443;
ssl_preread on;
proxy_pass binderhub;
}
}
For more info on NGINX reverse proxies, look here.
Still encountering errors? Maybe port 80 isn't open. More details here.
The documentation for BinderHub seems to suggest that it doesn't have built-in HTTPS functionality like JupyterHub does, so we had to manually install the various components for HTTPS. Credit to @kaseyhackspace:
- Install cert-manager.
# Install the CustomResourceDefinition resources separately
kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.8/deploy/manifests/00-crds.yaml
# Create the namespace for cert-manager
kubectl create namespace cert-manager
# Label the cert-manager namespace to disable resource validation
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io
# Update your local Helm chart repository cache
helm repo update
# Install the cert-manager Helm chart
helm install \
--name cert-manager \
--namespace cert-manager \
--version v0.8.1 \
jetstack/cert-manager
- Create cluster-issuer.yaml (NOTE: using 'ClusterIssuer' as the kind will allow cert-manager to issue certificates for services in any namespace)
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
name: letsencrypt-production
spec:
acme:
# You must replace this email address with your own.
# Let's Encrypt will use this to contact you about expiring
# certificates, and issues related to your account.
email: <email-address>
server: https://acme-v02.api.letsencrypt.org/directory
privateKeySecretRef:
# Secret resource used to store the account's private key.
name: letsencrypt-production
http01: {}
- Apply issuer with kubectl
kubectl apply -f cluster-issuer.yaml
- Install nginx-ingress controller
helm install stable/nginx-ingress --name quickstart
- Point your domain to the loadbalancer external IP of the nginx-ingress controller, 10.0.1.61 on k8s in our case
kubectl get svc -n <NAMESPACE OF INGRESS CONTROLLER>
- Append ingress object on top level indentation in your config.yaml
config:
BinderHub:
use_registry: true
image_prefix: <dockerhub prefix>
hub_url: <jupyterhub-url>
ingress:
enabled: true
hosts:
- <domain-name>
annotations:
ingress.kubernetes.io/ssl-redirect: "true"
kubernetes.io/ingress.class: nginx
kubernetes.io/tls-acme: "true"
certmanager.k8s.io/issuer: letsencrypt-production
https:
enabled: true
type: nginx
tls:
- secretName: <domain-name>-tls
hosts:
- <domain-name>
- Perform helm upgrade to enable ingress
helm upgrade binderhub jupyterhub/binderhub --version=0.2.0-3b53fce -f secret.yaml -f config.yaml
- Wait for ~10 minutes; it takes some time for it to acquire a certificate.
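You can check on the progress with something like the following (this assumes the release lives in a namespace named binderhub; adjust -n to whatever namespace you installed into):
```
# inspect the Certificate resources cert-manager created for the ingress
kubectl get certificate -n binderhub
kubectl describe certificate -n binderhub
```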
To access the cluster, you can run the command ssh <rooster's IP address> -D 4545
.
Alternatively, if you have PuTTY, you can SSH into rooster.
In PuTTY, click the upper left, go to Change Settings. In the left menu, go to SSH, then Tunnels
to add a new port forwarding rule.
For Source port, type 4545
.
Select Dynamic
. Click Add.
After SSHing into the cluster, go to Mozilla Firefox, go to Tools, then Options.
Under Network Settings, click Settings.
Select Manual proxy configuration. In SOCKS Host, enter localhost
. In Port, enter 4545
. Select SOCKSv4.
Go to http://10.0.1.54 or http://10.0.1.55 to access JupyterHub or BinderHub respectively.
To access other services, run kubectl get service -A
and go to one of the External IP
's.
Note that BinderHub has an "underlying JupyterHub" it uses to create non-persistent notebooks.
This JupyterHub does not seem to be accessible on its own. Hence when you type
kubectl get services -A
, the proxy-public
load balancer under the binderhub
namespace
corresponds to the underlying JupyterHub and the binder
load balancer corresponds to Binder.
We decided to deploy prometheus-operator
as it takes care of setting up both the Prometheus deployment and the Grafana deployment for us.
Before installing the chart with helm, we changed the settings of the values.yaml file to enable ingress for Grafana specifically.
NOTE: Prometheus-operator seems to have an issue where upgrading the Helm deployment deletes all the user data in Grafana; for now, make sure to add all the settings you want at the beginning to avoid upgrading in the future. I suggest you take a look at our next section on alerting before installing Grafana.
We created a folder called monitoring to store all of our yaml configuration files.
You can change any of the default values in the values.yaml file and put it in a separate yaml file that can be applied during the installation. Our yaml file looks like this:
grafana:
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: nginx
ingress.kubernetes.io/ssl-redirect: "true"
certmanager.k8s.io/issuer: letsencrypt-production
kubernetes.io/tls-acme: "true"
hosts:
- grafana.libretexts.org
path: /
tls:
- secretName: grafana.libretexts.org-tls
hosts:
- grafana.libretexts.org
We enable ingress so that our nginx controller pod can connect to the right endpoint for Grafana.
We can check that the ingress is pointing at the endpoint for Grafana by running kubectl get ingress -n <NAMESPACE>
,
and then using kubectl describe ingress <NAME OF INGRESS> -n <NAMESPACE>
to get something like:
Name: prometheus-operator-grafana
Namespace: monitoring
Address:
Default backend: default-http-backend:80 (<none>)
TLS:
grafana.libretexts.org-tls terminates grafana.libretexts.org
Rules:
Host Path Backends
---- ---- --------
grafana.libretexts.org
/ prometheus-operator-grafana:80 (10.244.85.133:3000)
Annotations:
ingress.kubernetes.io/ssl-redirect: true
kubernetes.io/ingress.class: nginx
kubernetes.io/tls-acme: true
certmanager.k8s.io/issuer: letsencrypt-production
Events: <none>
Under the 'Host', 'Path', 'Backends', we can see that our domain name points to our Grafana endpoint. Checking
with the command kubectl get ep -n <NAMESPACE>
, we can confirm that the endpoint is correct:
NAME ENDPOINTS AGE
prometheus-operator-grafana 10.244.85.133:3000 3d13h
Once we confirm that the ingress is set up properly, we can move on to the last step. We used cert-manager to secure access to Grafana over the web. We created a separate yaml file called 'certificate.yaml' that communicates with the cert-manager we had already set up to assign a certificate for HTTPS communication.
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
name: grafana.libretexts.org-tls
spec:
secretName: grafana.libretexts.org-tls
dnsNames:
- grafana.libretexts.org
acme:
config:
- http01:
ingressClass: nginx
domains:
- grafana.libretexts.org
issuerRef:
name: letsencrypt-production
kind: ClusterIssuer
Run with kubectl create -f <FILE>
(this assumes that the cert-manager is of kind ClusterIssuer), and cert-manager
will take care of the rest.
In our setup, since we are using nginx as a proxy to our cluster, we changed our nginx.conf and lb file accordingly to point traffic for 'grafana.libretexts.org' to our nginx controller on the cluster.
For basic alerts on the cluster, we have decided to use Grafana's built-in alerting because it is easy to set up and use.
Grafana supports a variety of channels to send notifications with; we set up a Slack channel and an email channel.
The latest version of Grafana has a built-in templating feature that allows the user to use a 'template' variable instead of a hardcoded one, allowing for a better user experience. However, Grafana doesn't support the use of templates when alerting. A workaround is to create specific dashboards with hardcoded values for alerting, and use separate dashboards with templates for actual monitoring.
The alerts to check whether the website is reachable (DNS, ping, and HTTPS) are from this template. You may change the thresholds for the alerts by editing the panel, and going to Alerts.
Setting up the email channel for notifications requires an SMTP server. You can use your own SMTP server if you have one; if not, you can use a third-party one. We will be using the SMTP server that comes with a Gmail account for simplicity. In order to enable email notifications, we need to add some settings to the configuration file for our prometheus-operator:
grafana:
grafana.ini:
smtp:
enabled: true
host: "smtp.gmail.com:587" #gmail
user: "[email protected]" #email
password: <gmail password>
These configurations were enough for us to set up Gmail for SMTP. For different SMTP setups, more settings can be found here. After installing prometheus-operator with these settings, one can follow these instructions to set up the alert channels on Grafana.
NOTE: If you are running a firewall, make sure to open the ports used by SMTP; we use 587 here.
To set up a Slack alerts channel, you will need to create a Slack app and webhook. Create a Slack App or edit one that already exists.
Activate Incoming Webhooks in the Slack App settings, and copy the incoming webhook to the Grafana Notification Channel settings. Under OAuth & Permissions in the Slack App settings, copy the Bot User OAuth Access Token into the Token entry of the Grafana Notification Channel form.
After the alert channels are set up, one can move on to creating the alerts. We organized our alerts in a separate 'Alerts' folder, away from the rest of the dashboards used for monitoring.
We created two different kinds of dashboards:
- A dashboard which keeps track of pinging our domains to make sure they are still up. This is templated from the WorldPing plugin.
- Custom dashboards displaying CPU and memory data for individual nodes and the cluster as a whole.
For pinging the domain names, install and enable the WorldPing plugin if you haven't already. This is done by going into Configurations in the left toolbar and selecting Plugins. After searching for WorldPing, the plugin will prompt you to either generate an API key or log in to your account. Currently, it is using an API key from Celine's account.
Upon enabling WorldPing, you should be able to set up endpoints. In the WorldPing page on the left toolbar, add a new endpoint pointing to your domain name. In our case, we set up two endpoints to our JupyterHub and BinderHub domain names.
WorldPing’s free plan allows one million checks per month. Try to adjust the check intervals and the number of probes (the number of locations that they will ping from) to fit the limit.
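For example (an illustrative calculation, not our exact settings): 2 endpoints checked by 3 probes once a minute is 2 × 3 × 60 × 24 × 30 ≈ 259,200 checks per month, which fits comfortably under the one-million limit.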
Note that we deleted HTTPS and DNS alerts, only leaving the Ping alerts on. This is because the HTTPS and DNS alerts occasionally send 503 error alerts daily, but JupyterHub itself is not down.
We can use existing dashboards as templates for our alerts. We can click on the name of a panel at the top to get a drop-down menu; selecting 'edit' shows the settings for the panel.
By looking at existing panels with template values, the values that look like '$cluster', we can get a sense of the queries and use them to create our own:
We replace the template variables with hardcoded values in our alerting dashboards:
As an example, the default query of Cluster Alerts Dashboard:
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m]))
was adapted from the panels in the dashboard Kubernetes / Compute Resources / Cluster. Similarly, the templated query
1 - sum(:node_memory_MemFreeCachedBuffers_bytes:sum{cluster="$cluster"}) / sum(kube_node_status_allocatable_memory_bytes{cluster="$cluster"})
was changed to this:
1 - sum(:node_memory_MemFreeCachedBuffers_bytes:sum{}) / sum(kube_node_status_allocatable_memory_bytes{})
After we set up the panel by copying the templated panels, we can click on the bell to set up an alert. Setting up the alerts is pretty self-explanatory, as we can see from this picture:
For our cluster, we have set up these alerts so far:
| Data | Threshold |
|---|---|
| jupyter.libretexts.org / binder.libretexts.org | If it goes down or high ping |
| Cluster | CPU/cores/RAM utilization exceeds 80% |
| Nodes | CPU/RAM utilization exceeds 80% or a node goes offline |
Dashboards are available inside our private configurations repository under
grafana-dashboards
. To import, click Import from the +
icon in the
left toolbar. Upload the .json
files. New dashboards should be created and
alerts should be automatically added under Alerting.
We followed How to Secure a Linux Server to secure and harden rooster. The following describes our choices for implementation.
We disabled using a password to log into rooster by uncommenting
PasswordAuthentication no
in /etc/ssh/sshd_config
. You can only
log in using an ssh key.
Generate a key using ssh-keygen
on your local computer. Copy
~/.ssh/id_rsa.pub
on your local computer to ~/.ssh/authorized_keys
on rooster.
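If password authentication is still enabled (or you already have another authorized key), ssh-copy-id can do the copy for you; the username and IP here are placeholders:
```
# push your public key into ~/.ssh/authorized_keys on rooster
ssh-copy-id -i ~/.ssh/id_rsa.pub <your-user>@<rooster's IP address>
```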
Alternatively if you use PuTTY, you can use PuTTYgen to generate
a public/private key pair, and copy the public key into ~/.ssh/authorized_keys
on rooster. Then, double click the private key file to enter your
password and PuTTY will log into rooster using your key.
We chose not to use AllowGroups since we don't have many accounts for now. More info in this issue.
We uncommented PermitRootLogin prohibit-password
to allow automated
backups to Richard's server.
We also uncommented several lines in the file for security:
- Maximum authorization attempts: 6
- Turned off PAM authentication
- Turned off challenge-response authentication
Short Diffie-Hellman keys are less secure.
Make a backup of SSH's moduli file /etc/ssh/moduli:
```
sudo cp --preserve /etc/ssh/moduli /etc/ssh/moduli.$(date +"%Y%m%d%H%M%S")
```
Remove short moduli:
```
sudo awk '$5 >= 3071' /etc/ssh/moduli | sudo tee /etc/ssh/moduli.tmp
sudo mv /etc/ssh/moduli.tmp /etc/ssh/moduli
```
We did not enable 2FA for SSH. More info in this issue.
Followed these instructions.
We did not secure /proc
since there aren't many accounts. More info in
this issue.
We did set them up, but plan on doing them manually. Reasons include:
- Being able to let users know that we are performing the updates in case something bad happens
- Being there in case something bad does happen
- Controlling the time to update
We followed these instructions. Our unattended upgrades configurations are stored in
/etc/apt/apt.conf.d/51myunattended-upgrades
.
Some systems generate predictable SSH keys, so this could help mitigate that.
sudo apt-get install rng-tools
We just installed the package for now.
List your UFW rules by running sudo ufw status numbered
.
Deleted the following rules, by calling sudo ufw delete <line #>
,
- Deleted the rules allowing traffic on ports 80 and 443 into the enp2s0 and enp3s0 interfaces
allow in on enp3s0 to any port 80
allow in on enp3s0 to any port 443
allow in on enp2s0 to any port 80
allow in on enp2s0 to any port 443
- Deleted the rule allowing traffic on port 111
allow 111
Some of these rules were added while trying to get JupyterHub to work.
Followed these instructions almost exactly. Here is psad's documentation. psad monitors the iptables logs for suspicious activity and automatically sends alerts to our email. DL stands for "danger level" of suspicious activity.
The following are the changes in the instructions we followed:
- Review and update configuration options in
/etc/psad/psad.conf
. Pay special attention to these:
| Setting | Set To |
|---|---|
| EMAIL_ADDRESSES | your email address(es) |
| HOSTNAME | your server's hostname |
| ENABLE_AUTO_IDS | ENABLE_AUTO_IDS N; |
| ENABLE_AUTO_IDS_EMAILS | ENABLE_AUTO_IDS_EMAILS N; |
| EXPECT_TCP_OPTIONS | EXPECT_TCP_OPTIONS Y; |
We chose not to enable auto IDS, which automatically blocks suspicious IPs. For now, we do not want to accidentally block a legitimate IP, like one from the cluster.
To whitelist, edit /etc/psad/auto_dl
:
<IP address> <danger level> <optional protocol or ports>;
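For example, to make sure psad never flags traffic from the internal chick network (a sketch; a danger level of 0 effectively whitelists a source):
```
# whitelist the internal cluster network in psad's auto_dl file
echo '10.0.0.0/24    0;' | sudo tee -a /etc/psad/auto_dl
sudo systemctl restart psad
```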
Roughly followed these instructions.
- Install fail2ban
sudo apt install fail2ban
- We created
/etc/fail2ban/jail.local
and added the following:
[DEFAULT]
# the IP address range we want to ignore
ignoreip = 127.0.0.1/8 10.0.0.1/8 192.168.0.1/24
# who to send e-mail to
destemail = [our e-mail]
# who is the email from
sender = [our e-mail]
# since we're using exim4 to send emails
mta = mail
# get email alerts
action = %(action_mwl)s
- According to the instructions, we created an
sshd
jail by creating/etc/fail2ban/jail.d/ssh.local
and adding:
[sshd]
enabled = true
banaction = ufw
port = ssh
filter = sshd
logpath = %(sshd_log)s
maxretry = 5
However, after executing the following, I get a noduplicates
error.
sudo fail2ban-client start
sudo fail2ban-client reload
sudo fail2ban-client add sshd
Running sudo fail2ban-client status
shows that the sshd
jail is
active anyway, probably because of the sshd jail enabled in the default file
/etc/fail2ban/jail.d/defaults-debian.conf
.
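To confirm what the jail is actually doing, you can query it directly:
```
# show the sshd jail's filter, current failures, and banned IPs
sudo fail2ban-client status sshd
```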
Attackers can install rootkits, which allow them to gain access to a system without the owner noticing. Followed these instructions almost exactly.
Exception:
4. The value PHALANX2_DIRTEST
was not set to 1. There is not much documentation on this
so I decided to leave it alone.
Another change was made so the setting matches the SSH root login configuration in /etc/ssh/sshd_config
.
Note that rkhunter in Ubuntu comes with cron scripts, which you can find in /etc/cron.daily/rkhunter
.
Rkhunter will email a daily report.
This article describes the differences between rkhunter and chkrootkit. It recommends using both!
Followed these instructions to install.
- When running
sudo dpkg-reconfigure chkrootkit
, I answered Yes
to the first question and left the second question blank. The default answer to the second question is -q
mode. The third question was answered with Yes
.
Followed these instructions to install.
As of now, it sends emails to root.
After weighing the pros and cons of using unattended updates against manual updates, we have decided to go with manual updates. With manual updates, we can schedule a time ahead and let the users know to expect downtime. Also, since the updates are manual, we will be there to fix anything that goes wrong during the update. On the other hand, if anything went wrong during an unattended update, no one would be there to fix the problem if rooster went down.
- Decide on a time to run the update and send out an announcement to users letting them know ahead of time.
- Send out a second reminder on the day of the scheduled update.
- SSH into rooster and perform a dry run of the updates first:
sudo unattended-upgrade -d --dry-run
, check if there are any potential errors that could occur during the update. - If everything looks good from performing the dry run, then run the actual update:
sudo unattended-upgrade -d
- If anything goes wrong, then troubleshoot the problem, otherwise perform some basic tests like trying to connect to a service on the cluster to confirm that everything is working.
- After we are sure that every service is working on the cluster, we will send out a message to our users to let them know our services are back online.
There are two ways to whitelist users: through the configuration file or on JupyterHub as an admin user. More information can be found in the documentation.
Currently, we use the JupyterHub admin interface to manage access to our Hub. Before adding users to the whitelist, we must ensure that they are affiliated with UC Davis or LibreTexts. If you are unsure, be sure to ask someone else on the JupyterTeam. Even though we are using the online interface to manage user access, the whitelist must remain within the configuration file or else it will allow any users to access the Hub by default.
SSH into rooster. Open the JupyterHub configuration file in ~/jupyterhub/config.yaml
.
There will be a block that looks like this:
auth:
type: google
admin:
access: true
users:
- [email protected]
whitelist:
users:
- [email protected]
- [email protected]
Add user emails in the whitelist
section under users
.
Note that anyone who is added under admin
will have admin privileges and will
automatically be whitelisted.
After editing config.yaml
, upgrade JupyterHub by running these commands in the
~/jupyterhub
folder (as specified in the documentation):
RELEASE=jhub
helm upgrade $RELEASE jupyterhub/jupyterhub \
--version=0.9-2d435d6 \
--values config.yaml
When you log into JupyterHub, go to the Control Panel (Hub -> Control Panel) if you haven't already. Click the Admin tab on the navigation bar.
You can add email addresses by clicking Add Users. Be careful of the other buttons!
Refer to this Discourse post for information on editing the login page.
You only need to complete this section once. To edit or add html files, go to the next section.
The Discourse post includes two approaches to editing the login templates.
We decided to use Init Containers,
which must run to completion before a pod launches, i.e. before a pod has the Running
status.
In this setup, the Init Container uses the alpine/git
Docker image to git clone
a repository of custom templates and mounts this
volume to the hub-xxx
pod. The volume is mounted to /etc/jupyterhub/templates
of the hub-xxx
pod.
We mount two volumes and clone one repository:
- jupyterhub-templates,
containing custom html templates and static files. We mount
custom-templates
and custom-templates-static .
Check out the README.md within the jupyterhub-templates repo for more information.
Our relevant portion of config.yaml
looks like this:
# Clone custom JupyterHub templates into a volume
initContainers:
- name: git-clone-templates
image: alpine/git
command:
- /bin/sh
- -c
args:
- >-
git clone --branch=master https://github.com/LibreTexts/jupyterhub-templates.git &&
cp -r jupyterhub-templates/templates/* /templates &&
cp -r jupyterhub-templates/static/* /static
volumeMounts:
- name: custom-templates
mountPath: /templates
- name: custom-templates-static
mountPath: /static
extraVolumes:
- name: custom-templates
emptyDir: {}
- name: custom-templates-static
emptyDir: {}
extraVolumeMounts:
- name: custom-templates
mountPath: /etc/jupyterhub/templates
- name: custom-templates-static
mountPath: /usr/local/share/jupyterhub/static/external
extraConfig:
templates: |
from jupyterhub.handlers.base import BaseHandler
class AboutHandler(BaseHandler):
def get(self):
self.write(self.render_template("about.html"))
class FAQHandler(BaseHandler):
def get(self):
self.write(self.render_template("faq.html"))
c.JupyterHub.extra_handlers.extend([
(r"about", AboutHandler),
(r"faq", FAQHandler),
])
c.JupyterHub.template_paths = ['/etc/jupyterhub/templates']
After adding this to config.yaml
, run the command ./upgrade.sh
from within the ~/jupyterhub/
directory to apply the changes.
Important note: you must upgrade JupyterHub to a development release
later than this pull request.
config.yaml
supports Init Containers after the stable
release of 0.8.2.
-
Clone jupyterhub-templates.
git clone https://github.com/LibreTexts/jupyterhub-templates.git
-
Edit or add html files in the
templates
folder ofjupyterhub-templates
.Useful resources:
- Working with templates and UI from the JupyterHub documentation.
- How to extend Jinja2 templates
Additional note: The images are mounted at
/usr/local/share/jupyterhub/static/external
. If you specify an image locally in an html file, use the prefix /external/images/<path-in-repo-to-image>
. Within the jupyter-templates repo, /images/
is located in the /static/
folder. For example, <img src="{{ static_url("external/images/faq/terminal.png") }}" alt="Finding the Terminal">
The curly brackets are a jinja rendering feature and must be used to load the image properly.
-
After editing your files, commit and push to the master branch of the repositories.
git add *
git commit -m "<your commit message>"
git push
-
For the changes to appear, recreate the
hub-xxx
pod to rerun the Init Containers. This can be done one of two ways: a. by deleting the pod to force recreation,
$ kubectl get pods -n jhub | grep "hub"
hub-<random string> ....
$ kubectl delete pod hub-<random string> -n jhub
b. or by recreating all pods.
helm upgrade $RELEASE jupyterhub/jupyterhub \
  --version=0.9-2d435d6 \
  --values config.yaml \
  --recreate-pods
Using the Jinja2 templating system, we:
- extended the existing error.html by calling
{% extends "templates/error.html" %}
(templates
is a default folder inside the hub pod where JupyterHub looks for templates), - modified specific sections of the original file by using blocks, starting with
{% block h1_error %}
and ending with{% endblock h1_error %}
, - and included the contents of the original content of the block by calling
{{ super() }}
.
Note that Python syntax can also be used, as shown in the if-else statement.
{% extends "templates/error.html" %}
{% block h1_error %}
{% if status_code == 400 %}
{{ super() }}
<p>
Please login again from <a href="https://jupyter.libretexts.org">the home page</a>.
</p>
{% elif status_code == 500 %}
{{ super() }}
<p>
Oh no! Something is wrong on our end. If this problem persists, please email us.
</p>
{% else %}
{{ super() }}
{% endif %}
{% endblock h1_error %}
For each user:
- CPU limit: 4 cores
- CPU guarantee: 0.5 cores
- Memory limit: 8G
- Memory guarantee: 1G
- Storage: 500 MB per user
6 cores per server / 0.5 core per user x 10 servers is approximately 100 users
Supports ~100 concurrent users at most. Rounded down since CPU is also needed for monitoring, etc.
2 TB of storage / 500 MB per user
If only using rooster's storage, we can support ~4000 accounts.
Place for us to add some useful reading we find
Obviously, the Concepts section is probably the most valuable resource for learning about Kubernetes. Services, Load Balancing, and Networking is probably the most important part for our purposes. Also, check out the /dev-env to give yourself a Kubernetes cluster to mess with while learning.
A good intro blog on basics like containers and kubernetes: what is a kubelet
An introduction and overview of Kubernetes and its keywords: An Introduction to Kubernetes
Building a Kubernetes cluster using Ansible Playbooks: How to Build a Kubernetes Cluster Using Kubeadm on Ubuntu 18.04
A lab website where you can play with Kubernetes! Play With Kubernetes
Another Source for Setting Up Grafana and Prometheus
For when we have multiple masters: High Availability Clusters Using Kubeadm
Information on using Kubernetes and Load Balancing: One year using Kubernetes in production: Lessons learned
Understanding kubernetes networking: ingress
Removing worker nodes to update them
Keep your Kubernetes cluster balanced: the secret to High Availability
High Availability Clusters Using Kubeadm
Introduction to ports and IP addresses: TCP/IP Ports and Sockets Explained
Some info on NFS server setup: Install NFS Server and Client on Ubuntu 18.04
More on NFS: How to Set Up an NFS Mount
Nginx Reverse Proxy: TCP and UDP Load Balancing on Nginx
A post about pxelinux.cfg file setup for unattended installs of Ubuntu 18.04: Ubuntu 18.04 Unattended Setup
NFS Client Provisioner for setting up an automatic provisioner after you have the NFS server set up.
- kubectl get service lists the services of the cluster, with cluster IP, external IP, and ports. Likewise, kubectl get service -A lists services in all namespaces.
- kubectl get po -A or kubectl get pod -A lists all pods in the cluster.
- kubectl get pv -A lists all persistent volumes.
- kubectl get pvc -A lists all persistent volume claims made (the requests for physical storage on rooster).
- kubectl logs <pod name> -n <namespace> -c <container> gives the logs of a container (if applicable) in a pod.
- kubectl delete pod <pod name> -n <namespace> will delete the pod specified. Note that the pod may regenerate depending on its settings.
- kubectl describe <type> <name> describes your object.
- kubectl exec <pod name> -n <namespace> -ti bash enters the pod's command line.
- Example of patching a service (in this case, making one a LoadBalancer):
kubectl patch svc "prometheus-operator-grafana" \
--namespace "monitoring" \
-p '{"spec": {"type": "LoadBalancer"}}'
- tail /var/log/syslog gives the latest updates on dhcp, ufw, etc.
- tail /var/log/apt/history.log gives the logs for unattended upgrades