Re: runit-scripts gone, supervision-scripts progress

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Fri, 02 Jan 2015 13:59:15 +0100

  Hi Avery,
  Happy new year to you !

  Congratulations on the achievements so far, even if they're not reaching
the bar you set for yourself.

  Just a little note:

> + The ./finish concept needs development and refinement.
>
> + Need to incorporate some kind of alerting or reporting mechanism into
> ./finish, so that the sysadmin receives notifications

  ./finish is a delicate beast. It is not only run when the admin brings
the service down, which is fine, but also when the service stops in an
untimely fashion; and the service cannot start again as long as ./finish
is running. So, if anything time-consuming, or worse, blocking, happens
in ./finish, the service can be totally hosed.
  Services should do all their necessary work in ./run, before executing
into the long-lived process: when they are in ./run, it's a known and
manageable state, they are up, even if they are not ready yet. But in
./finish, it's kind of a limbo state that shouldn't be drawn out. The
service is down, but it's still doing something, can't be brought up
right now, etc. Having a service stuck in "finish" state is about as
infuriating as having a process stuck in "D" state on Linux.

  s6-supervise has a built-in protection against misbehaving ./finish
scripts: if ./finish is still around after 5 seconds, it kills it.
(With a SIGKILL. When a service is down is not the time to be polite.)
AFAICT, runsv does not have such a protection, which makes it even more
important to pay attention when writing ./finish scripts.

  One way or the other, ./finish should only be used scarcely, for clean-up
duties that absolutely need to happen when the long-lived process has died:
removing stale or temporary files, for instance. Those should be brief
operations and absolutely cannot block.
  So, if you're implementing reporting in ./finish, make sure you are using
fast, non-blocking commands that just fail (possibly logging an error
message) if they have trouble doing their job.

  The way I would implement reporting wouldn't be based on ./finish, but on
an external set of processes listening to down/up/ready notifications in
/service/foobar/event. It would only work with s6, though.

-- 
  Laurent
Received on Fri Jan 02 2015 - 12:59:15 UTC

This archive was generated by hypermail 2.3.0 : Sun May 09 2021 - 19:44:18 UTC