The npt-supervise program
npt-supervise starts and monitors a service. It is an enhanced version of the supervise command.
Interface
npt-supervise s
- npt-supervise switches to the directory s.
- If the file ./down exists, npt-supervise does
nothing more, waiting for a command on its control pipe. The service is
wanted down until a control command tells npt-supervise
otherwise.
- If the ./down file does not exist, the service is
considered wanted up by default.
- Whenever the service is wanted up, npt-supervise
tries to start the ./run program.
- Whenever ./run exits, npt-supervise tries to
start ./finish if it exists, passing ./run's exit code and exit status
to it as arguments.
- When ./finish exits,
npt-supervise automatically restarts ./run if the
service is still wanted up.
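The startup decisions above, together with the f/F defaults listed under Control, can be summarized as a small function. This is a Python sketch, not npt-supervise's own code. It assumes that only the presence of ./down and the existence and executability of ./finish matter at startup. `initial_state` is a hypothetical name and is not part of pipe-tools.

```python
import os

def initial_state(service_dir: str):
    """Return (wanted_up, finish_enabled) for a service directory.

    Hypothetical helper: it mirrors the documented rules only.
    """
    # ./down present -> wanted down until a control command says otherwise
    wanted_up = not os.path.exists(os.path.join(service_dir, "down"))
    # ./finish is used by default only if it exists and is executable
    finish = os.path.join(service_dir, "finish")
    finish_enabled = os.path.isfile(finish) and os.access(finish, os.X_OK)
    return wanted_up, finish_enabled
```

For example, a freshly created service directory that contains only ./run yields `(True, False)`. The service is wanted up and no ./finish is spawned.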
Control
npt-supervise listens on the ./supervise/control named pipe. When an
external program, such as npt-svc or svc, writes to that pipe,
npt-supervise reads the bytes one by one and interprets them as control
commands. Unknown commands are ignored.
The valid commands are the following:
- u: Up. The service is marked wanted up. If it is
not running, start it. It will be automatically restarted when it exits.
- d: Down. The service is marked wanted down. If the
service is running, send it a SIGTERM; if it was paused, send it a
SIGCONT too. Do not restart it when it exits.
- o: Once. The service is marked wanted down, but
if it is not running, start it once.
- t: Terminate. If the service is running, send it a SIGTERM.
- p: Pause. If the service is running, send it a SIGSTOP.
- c: Continue. If the service is running, send it a SIGCONT.
- a: Alarm. If the service is running, send it a SIGALRM.
- b: Abort. If the service is running, send it a SIGABRT.
- q: Quit. If the service is running, send it a SIGQUIT.
- h: Hangup. If the service is running, send it a SIGHUP.
- i: Interrupt. If the service is running, send it a SIGINT.
- k: Kill. If the service is running, send it a SIGKILL.
- 1: User-defined 1. If the service is running, send it a SIGUSR1.
- 2: User-defined 2. If the service is running, send it a SIGUSR2.
- f: Enable "finish". Try to spawn ./finish every time
./run exits. This is the default if ./finish exists and is
executable when npt-supervise starts.
- F: Disable "finish". Do not spawn ./finish when
./run exits. This is the default when ./finish does not
exist or is not executable when npt-supervise starts.
- x: Exit. Exit 0 as soon as the service is down
and wanted down.
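Because every command is a single byte and unknown bytes are ignored, decoding the control pipe reduces to a table lookup. The Python sketch below shows that interpretation step only. It covers the command letters listed above. `parse_control` and the action tuples are hypothetical names, and acting on the result is left to the caller. In particular, the extra SIGTERM/SIGCONT that d sends is not handled here.

```python
import signal

# Bytes that only mean "send this signal if the service is running".
SIGNAL_COMMANDS = {
    "t": signal.SIGTERM, "p": signal.SIGSTOP, "c": signal.SIGCONT,
    "a": signal.SIGALRM, "b": signal.SIGABRT, "q": signal.SIGQUIT,
    "h": signal.SIGHUP,  "i": signal.SIGINT,  "k": signal.SIGKILL,
    "1": signal.SIGUSR1, "2": signal.SIGUSR2,
}
# Bytes that change the supervisor's state (d also signals; see lead-in).
STATE_COMMANDS = {"u", "d", "o", "f", "F", "x"}

def parse_control(data: bytes):
    """Interpret control bytes one by one; unknown bytes are dropped."""
    actions = []
    for byte in data:
        ch = chr(byte)
        if ch in SIGNAL_COMMANDS:
            actions.append(("signal", SIGNAL_COMMANDS[ch]))
        elif ch in STATE_COMMANDS:
            actions.append(("state", ch))
        # anything else is silently ignored, as documented
    return actions
```

For example, `parse_control(b"tZ2")` yields a SIGTERM action followed by a SIGUSR2 action. The unknown byte `Z` is skipped.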
Signals
When npt-supervise receives a SIGTERM, it behaves as if it had
received an x command.
Notification
When npt-supervise
monitors a service directory s, it performs the following actions:
- Notify s/supervise/event with "s" when it starts
- Notify s/supervise/event with "u" whenever the service goes up
(i.e. npt-supervise forks and tries to exec ./run)
- Notify s/supervise/event with "d" whenever the service goes down
(i.e. ./run dies)
- Notify s/supervise/event with "x" when it exits.
s/supervise/event is a fifodir; if it does not exist, npt-supervise
simply skips the notification.
You can then use npt-wait, npt-and or npt-or on supervise/event to be
notified instantly when one of those events occurs.
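The sketch below shows one way "notify a fifodir with a byte" can work. The model is an ASSUMPTION, not taken from npt-supervise's source: each listener creates its own named pipe inside the fifodir, and the notifier writes the event byte into every pipe that currently has a reader. `notify_fifodir` is a hypothetical name.

```python
import errno
import os
import stat

def notify_fifodir(fifodir: str, event: bytes) -> int:
    """Write `event` to every listener FIFO in `fifodir`; return how many got it.

    ASSUMED model: one named pipe per listener inside the fifodir.
    """
    try:
        names = os.listdir(fifodir)
    except FileNotFoundError:
        return 0  # no fifodir: silently do nothing, as described above
    reached = 0
    for name in names:
        path = os.path.join(fifodir, name)
        try:
            if not stat.S_ISFIFO(os.stat(path).st_mode):
                continue  # ignore anything that is not a named pipe
            # O_NONBLOCK: fail with ENXIO instead of blocking when nobody reads
            fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
        except OSError as e:
            if e.errno in (errno.ENOENT, errno.ENXIO):
                continue  # listener vanished, or stale FIFO without a reader
            raise
        try:
            os.write(fd, event)
            reached += 1
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.EPIPE):
                raise  # a full or just-closed pipe is skipped; other errors are not
        finally:
            os.close(fd)
    return reached
```

Opening each pipe non-blocking means that a listener which crashed and left its FIFO behind cannot stall the supervisor.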
Be careful! Do NOT run a command line like
npt-svc -d /service/zoinx ; npt-and /service/zoinx/supervise/event d
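or you will suffer from a serious race condition: if the service goes
down after npt-svc has sent the command but before npt-and has started
listening, the d event is lost and npt-and may wait indefinitely
for another one. Use the npt-svwaitdown command instead:
it checks the status file while listening, avoiding the race.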
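The race disappears if the waiter subscribes first and only then checks the current state. Any d event written after that point is then buffered in the waiter's pipe. The Python sketch below shows that ordering. It is not npt-svwaitdown's implementation. `wait_for_down`, the FIFO path passed as `event_fifo`, and the `is_down` status-reading callable are all hypothetical.

```python
import os
import select
import time

def wait_for_down(event_fifo: str, is_down, timeout=None) -> bool:
    """Subscribe, then check, then wait: no "d" can slip in between."""
    # 1. Subscribe first: open our listening FIFO for reading.
    fd = os.open(event_fifo, os.O_RDONLY | os.O_NONBLOCK)
    # Keep our own writer open so the FIFO never reports EOF between events.
    keepalive = os.open(event_fifo, os.O_WRONLY | os.O_NONBLOCK)
    deadline = None if timeout is None else time.monotonic() + timeout
    try:
        # 2. Only now check the current state; it may already be down.
        if is_down():
            return True
        # 3. Wait for a "d" event (hypothetical status check above missed it).
        while True:
            remaining = None if deadline is None else max(0.0, deadline - time.monotonic())
            ready, _, _ = select.select([fd], [], [], remaining)
            if not ready:
                return False  # timed out
            if b"d" in os.read(fd, 64):
                return True
    finally:
        os.close(keepalive)
        os.close(fd)
```

Reversing steps 1 and 2 recreates exactly the race described above. The state could change after the check but before the subscription, and the event would never be seen.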
Restarting policy fine-tuning with ./finish
- You can enable or disable ./finish at any time by
sending npt-supervise the f or F
command.
- If ./finish is disabled, npt-supervise will
sleep for one second before restarting ./run
whenever it dies.
- If ./finish is enabled, then npt-supervise
will wait until ./finish has exited, or one second has
elapsed, whichever comes later, before restarting ./run.
This means that a ./finish script reduced to true
or sleep 1 is the same as no ./finish at all,
whereas a sleep 5 finish script extends the delay between
restarts to 5 seconds.
- ./finish is run with two arguments. The first one is
./run's exit code. The second one is the least significant
byte of the exit status as determined by waitpid();
for instance it is 0 if ./run exited normally, and the
signal number if ./run was terminated by a signal.
- If npt-supervise could not exec ./run for
some reason, the exit code is 111 and the status is 0.
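The restart delay and the two ./finish arguments described in the list above can be computed as follows. This is a Python sketch, not the program's code. It makes the simplifying assumption that ./finish starts at the moment ./run dies. `restart_delay` and `finish_args` are hypothetical names.

```python
import os

def restart_delay(finish_enabled: bool, finish_runtime: float = 0.0) -> float:
    """Seconds between ./run dying and its restart.

    Assumes ./finish starts immediately when ./run dies.
    """
    if not finish_enabled:
        return 1.0
    # wait until ./finish exits or one second passes, whichever comes later
    return max(1.0, finish_runtime)

def finish_args(status: int):
    """Build the two ./finish arguments from a raw waitpid() status."""
    exit_code = os.WEXITSTATUS(status)  # meaningful only if ./run exited normally
    # 0 on normal exit; the signal number if killed (on Linux, 0x80 is
    # also set in this byte when a core was dumped)
    low_byte = status & 0xFF
    return [str(exit_code), str(low_byte)]
```

With a raw status obtained from `os.waitpid(pid, 0)`, a child that called `_exit(111)` because exec failed produces `["111", "0"]`. This matches the exec-failure convention above.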
You can use those values in your ./finish script to
perform different actions depending on whether ./run
exited zero, exited non-zero, or was killed by a signal.
Typically, ./finish can be used to send a mail to the system
administrator whenever a service crashes.
Be careful: npt-supervise will not restart ./run
as long as ./finish has not exited, so you should make sure
that ./finish does not block indefinitely.
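As an illustration, here is a hypothetical ./finish written in Python that classifies how ./run ended. The report goes to stderr. Mailing the administrator is left as a placeholder, because any real notification step must be kept short: ./run is not restarted until this script exits. Exit code 111 with status 0 is ambiguous, since ./run itself might have exited 111.

```python
#!/usr/bin/env python3
"""Hypothetical ./finish: report how ./run ended (sketch only)."""
import sys

def classify(exit_code: int, status_low: int) -> str:
    if status_low != 0:
        # strip the 0x80 core-dump flag, if present, to get the signal number
        return "killed by signal %d" % (status_low & 0x7F)
    if exit_code == 0:
        return "exited normally"
    if exit_code == 111:
        # could also be ./run itself exiting 111
        return "exited 111 (possibly could not exec ./run)"
    return "exited with code %d" % exit_code

def main(argv) -> int:
    reason = classify(int(argv[1]), int(argv[2]))
    # Placeholder for e.g. mailing the administrator; keep it bounded in time.
    print("service ./run " + reason, file=sys.stderr)
    return 0

# npt-supervise runs ./finish with exactly two arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```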
Compatibility
- Previous versions of pipe-tools came with an unexported
supervise replacement. Starting with version 0.45, pipe-tools
provides an exported npt-supervise, which does not need the
daemontools source to build.
- You can make /command/supervise a link to npt-supervise,
so that the npt-supervise implementation is always used.
svscan works with npt-supervise.
- To send the control command bytes to the npt-supervise
process monitoring the service directory s, you can use:
- the npt-svc command (recommended):
npt-svc -bytes s
- the svc
command, as long as bytes are supported by svc:
svc -bytes s
- the minecho
command: minecho -n bytes > s/supervise/control
- the Unix echo command, used the same way as minecho.
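The minecho and echo variants in the list above simply write bytes into the control FIFO. The Python sketch below does the same from a program. `send_control` is a hypothetical name. One deliberate difference from a shell redirection: the pipe is opened with O_NONBLOCK. If no npt-supervise is reading, the call then fails at once with ENXIO instead of hanging.

```python
import errno
import os

def send_control(service_dir: str, commands: bytes) -> None:
    """Write control bytes to <service_dir>/supervise/control.

    Rough equivalent of `minecho -n bytes > s/supervise/control`.
    """
    path = os.path.join(service_dir, "supervise", "control")
    try:
        # Non-blocking open fails with ENXIO when no supervisor is reading.
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno == errno.ENXIO:
            raise RuntimeError("no supervisor is listening on " + path) from e
        raise
    try:
        os.write(fd, commands)  # short command strings fit in one atomic write
    finally:
        os.close(fd)
```

For example, `send_control("/service/zoinx", b"dx")` asks the supervisor to bring the service down and then exit. This mirrors `npt-svc -dx /service/zoinx`.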
Credits
- The daemontools
package provided the original interface and implementation of
supervise and
svc.
- Gerrit Pape's
runsv program
provided the idea for the additional control bytes, as well as the
./finish concept.
- Clemens Fischer and Stefan Karrmann provided insight on the
restarting algorithm.