# Usage

Doberman is built around a single executable, `Monitor.py`, which serves several roles.
We'll go over them here.
The core principle behind Doberman is that you want some function called at some interval.
This function probably asks a device to read out one sensor, but there is also a large variety of other possibilities.
Doberman is written around this concept; see this page for a discussion of it.
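The "function called at some interval" idea can be sketched as below. This is an illustrative loop, not Doberman's actual internals (the real Monitor classes handle threading, errors, and scheduling):

```python
import time

def run_readout_loop(readout_func, interval, should_run):
    """Call readout_func every `interval` seconds until should_run() is False.

    Minimal sketch of the core principle; names are illustrative.
    """
    while should_run():
        start = time.time()
        readout_func()
        # sleep for whatever remains of the interval
        time.sleep(max(0.0, interval - (time.time() - start)))
```

Note the sleep accounts for the time the readout itself took, so the cadence stays close to `interval` rather than drifting.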
All Monitors respond quickly to ctrl-c and should shut down promptly.
Sometimes it can take a bit longer if the Monitor is in the middle of something that can't be interrupted, but we're still talking a few seconds.
We initially had everything running in systemd, but this proved to be difficult in some situations, mainly in handling some error states, and in starting things remotely.
After a wildly successful experiment involving `screen` and an automated software controller in the XENONnT DAQ system, we changed the operational paradigm to have everything run in screen sessions.
This makes automated control much simpler.
This is discussed more in the Hypervisor section below.
All Monitors periodically heartbeat with the database, and the hypervisor periodically checks the heartbeats of all running Monitors. If something that is supposed to be running isn't, it gets started.

All Monitors also listen for internal communications. The port numbers for this start at whatever value you specify for the hypervisor in the global dispatch; each device on each host gets its own port. You don't need to specify anything beyond the hypervisor's port, the rest are assigned automatically.
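The port scheme amounts to counting up from the hypervisor's port, one per device per host. A rough sketch (the actual assignment logic lives inside Doberman and its ordering may differ):

```python
def assign_ports(hypervisor_port, devices_by_host):
    """Give each device on each host its own port, counting up from
    the hypervisor's port. Illustrative only; Doberman assigns these
    automatically and the real ordering may differ."""
    ports = {}
    next_port = hypervisor_port + 1
    for host in sorted(devices_by_host):
        for device in sorted(devices_by_host[host]):
            ports[(host, device)] = next_port
            next_port += 1
    return ports
```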
All communication routes through the dispatcher, which runs as part of the hypervisor.
The dispatcher receives all messages, holds onto them for as long as is necessary, and then sends them to their destination (assuming the recipient is online).
The message buffering exists to support scheduling messages for some point in the future; usually this is a pipeline scheduling state changes, but other uses also exist.
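The hold-then-deliver behavior can be sketched with a priority queue keyed on delivery time. This is a hypothetical illustration of the buffering idea, not the dispatcher's real implementation:

```python
import heapq
import time

class MessageBuffer:
    """Sketch of dispatcher-style buffering: messages are held until
    their due time, then released to their recipients."""

    def __init__(self):
        self._heap = []  # (due_time, seq, recipient, message)
        self._seq = 0    # tiebreaker so the heap never compares payloads

    def schedule(self, recipient, message, due_time=None):
        due = time.time() if due_time is None else due_time
        heapq.heappush(self._heap, (due, self._seq, recipient, message))
        self._seq += 1

    def pop_due(self, now=None):
        """Release every message whose due time has passed."""
        now = time.time() if now is None else now
        out = []
        while self._heap and self._heap[0][0] <= now:
            _, _, recipient, message = heapq.heappop(self._heap)
            out.append((recipient, message))
        return out
```

An immediate message just gets a due time of "now", so both cases flow through the same queue.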
Sending commands is done via a call to the database API (`Database.log_command`).
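As a rough sketch of the pattern, a command ends up as a database document naming a recipient and a delivery time. The field names and arguments below are assumptions for illustration; check `Database.log_command` itself for the actual interface:

```python
import time

def log_command(command, target, delay=0.0):
    """Hypothetical sketch: represent a command as a document naming
    the recipient and when it should be delivered. The real
    Database.log_command signature and schema may differ."""
    return {
        "command": command,
        "to": target,
        "due": time.time() + delay,  # dispatcher holds it until then
    }
```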
Each Monitor accepts different commands, so we'll cover them in their sections.
A simple bash script is provided in the `scripts` subdirectory.
It acts as a convenient way to automatically start things inside screen sessions.
The main user of this script is the hypervisor, but it's also really useful for humans.
## Device Monitors

You'll have one of these for each device you want read out.
Start one using the provided helper script (`./start_process.sh -d <device>`) or manually with `./Monitor.py --device <device>`.
The helper script will start the process in a screen session.
The monitor will start up, connect to its device, and begin reading out all configured sensors.
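Under the hood, the helper amounts to launching `Monitor.py` inside a detached, named screen session, roughly like this (the exact session-name convention here is an assumption, not necessarily what the script uses):

```python
def screen_command(device):
    """Build a command that runs Monitor.py for `device` inside a
    detached, named screen session -- roughly what the helper script
    does. The session name is illustrative."""
    return f"screen -dmS {device} ./Monitor.py --device {device}"
```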
Device monitors accept the following commands:

- `stop`: stop and shut down
- `set <quantity> <value>`: tell the device to set `<quantity>` to `<value>`, whatever those are. `<quantity>` and `<value>` are forwarded to the device driver for it to deal with. Please note that `<value>` may not contain spaces but `<quantity>` can, so `set valve 3 open` will split into `valve 3` and `open`, respectively, but `set heater 1 max power` will split into `heater 1 max` and `power`, which probably isn't what you wanted.
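The splitting rule described above amounts to: the last whitespace-separated token is the value, and everything between `set` and it is the quantity. A minimal sketch:

```python
def split_set_command(cmd):
    """Split 'set <quantity> <value>' as described above:
    <value> is the final token, <quantity> is everything between
    'set' and it (so <quantity> may contain spaces, <value> may not)."""
    tokens = cmd.split()
    if len(tokens) < 3 or tokens[0] != "set":
        raise ValueError(f"not a valid set command: {cmd!r}")
    return " ".join(tokens[1:-1]), tokens[-1]
```

This reproduces both examples from the text, including the `heater 1 max power` pitfall.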
## Pipeline Monitors

There are three kinds of pipelines: alarm pipelines that handle alarm states, control pipelines that make changes to what the system is doing, and convert pipelines that perform some mathematical operation on measurements (or combinations of them) and put the results into the database.
There are three kinds of pipeline monitors, one for each kind of pipeline.
The hypervisor will start each of these when it starts up, but you can do it manually via `./start_process.sh --pipeline pl_<flavor>` or `./Monitor.py --pipeline pl_<flavor>`, where `<flavor>` is one of `alarm`, `control`, or `convert`.
There should only be one of each of these running at once.
Pipeline monitors take the following commands:

- `pipelinectl_start <name>`: start the specified pipeline
- `pipelinectl_stop <name>`: stop the specified pipeline
- `pipelinectl_restart <name>`: restart the specified pipeline
- `pipelinectl_silent <name>`: silence the specified pipeline
- `pipelinectl_active <name>`: activate the specified pipeline
- `stop`: stop the monitor and shut down all owned pipelines

Note that the `pipelinectl` commands require that the specified pipeline is actually owned by the monitor handling the command (`start` obviously excluded).
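The ownership requirement can be sketched like this (a hypothetical handler; the real monitor's logic lives in Doberman):

```python
def should_handle(command, name, owned_pipelines):
    """Return True if this pipeline monitor should act on the command.

    pipelinectl_start may name a pipeline this monitor doesn't own yet;
    every other pipelinectl command must target an owned pipeline.
    Illustrative sketch only.
    """
    if command == "pipelinectl_start":
        return True
    return name in owned_pipelines
```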
## Alarm Monitor

This is a pipeline monitor that specializes in alarm pipelines.
When alarm states are detected, alarm messages are created and distributed via the specified methods.
See the alarm page for more details on alarm distribution.
Start with `./start_process.sh --alarm` or `./Monitor.py --alarm`.
It takes no additional commands.
## Hypervisor

If something goes wrong and one of your readout machines crashes and reboots in the small hours of the morning (this is exceedingly rare, but stay with me), do you want to get woken up by an alarm, only find out when you get to the lab after coffee, or have something automatically restart everything?
This is the job of the hypervisor.
It makes sure that everything that's supposed to be running is running.
It does this via commands over ssh, so be sure to have your ssh permissions set up.
This is also why `screen` is more convenient than systemd: slow control doesn't need sudo-level permissions but systemd does, and running commands as root via ssh without a password is a staggeringly massive security risk.
Screen doesn't have this limitation.
The hypervisor keeps a list of everything that's currently running (specifically, Monitors add and remove themselves from this list on startup and shutdown), and also a list of things that are supposed to be running (the "managed" things).
Things that are supposed to be running but aren't will get started.
Note that the hypervisor isn't a panacea and if the problem is the readout device itself then there's not much it can do, but it is still very useful.
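The bookkeeping reduces to a set difference: anything managed but not currently running needs to be started. A sketch (illustrative; the hypervisor then does the actual starting over ssh):

```python
def needs_restart(managed, running):
    """Things that are supposed to be running (managed) but aren't.
    Illustrative sketch of the hypervisor's bookkeeping."""
    return sorted(set(managed) - set(running))
```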
The hypervisor also acts as the central dispatcher for interprocess communication, so while you don't need to give it devices to manage, it does still need to run.
Also, the hypervisor will compress all logfiles older than one week.
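That compression step looks roughly like the following; the file layout and naming here are assumptions, not Doberman's actual scheme:

```python
import gzip
import shutil
import time
from pathlib import Path

def compress_old_logs(log_dir, max_age_days=7):
    """Gzip logfiles older than max_age_days and remove the originals.
    Sketch of the behavior described above; the real hypervisor's
    file layout and naming may differ."""
    cutoff = time.time() - max_age_days * 86400
    for path in Path(log_dir).glob("*.log"):
        if path.stat().st_mtime < cutoff:
            with open(path, "rb") as src, gzip.open(f"{path}.gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()  # drop the uncompressed original
```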
Start with `./start_process.sh --hypervisor` or `./Monitor.py --hypervisor`.
The obvious next question is 'quis custodiet ipsos custodes?'.
Pretty much everything the hypervisor does is wrapped in `try`/`except` blocks, so it's difficult for the hypervisor itself to crash.
The machine hosting it can crash, but the average server running Linux should be able to put out continuous years of uptime; if it randomly crashes on a semiregular basis, the underlying hardware is probably faulty and you should replace it.
It should go without saying that there should only be one hypervisor running at any one time.
The hypervisor accepts the following commands:

- `start <name>`: start whatever `<name>` is. If it isn't a device it's assumed to be a pipeline. If it's neither, then what are you doing?
- `manage <name>`: add `<name>` to the list of managed devices.
- `unmanage <name>`: remove `<name>` from the list of managed devices.
- `kill <name>`: whatever `<name>` is, take it out back and unceremoniously get rid of it. This forces an unclean shutdown. If `<name>` isn't a device, it's assumed to be the name of a screen session running on localhost.

Note that `stop` commands are issued directly to the Monitor in question, so the hypervisor doesn't need to get involved.
## Bringing the System Online

Here's how to actually bring the system online.
This assumes you've already configured the databases appropriately; if you haven't, do that now.
It also assumes all your databases restart automatically and you don't need to do any manual networking nonsense.

Suppose your UPS runs out before the power comes back up: this is a dirty shutdown.
You'll need to do one thing: start the hypervisor, if it didn't start automatically.
This will return the system to whatever it was doing when it lost power.