Re: Some comments/questions about s6-rc semantics

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Tue, 12 Jul 2016 14:34:10 +0200

On 12/07/2016 01:32, Avery Pennarun wrote:
> I recently started doing an experiment to convert our buildroot-based
> system to use s6-rc as an init system.

  Nice!

  You make very good points. I'll answer them as best as I can.


> - Logs in s6-rc are weird and inconvenient compared to the equivalent
> in plain s6 (or daemontools). In particular, I have to manually
> redirect fd#2, I have to create a whole separate service for each
> logger, and I have to create consumer-for/producer-for files. The
> whole producer/consumer/pipeline design seems a little excessively
> general. Wouldn't it be nicer to just allow a log/ subdir like s6
> would, and if I really don't like it, *then* let me play around with
> pipelines?

  Here, to be very honest, the architecture is implementation-driven.

  The baked-in service/log mechanism in s6-svscan is indeed very nice when
you're just using daemontools or s6, but it's ad-hoc, and like all ad-hoc
mechanisms, it's a real pain when you want to program around it.
I tried to use it, and to have longrun definition directories with their
log/ subdir. But... the s6-rc-compile code was full of special-casing
everywhere, and designing a working s6-rc-update seemed like an
insurmountable challenge. An idea that was simple at the start was becoming
a nightmare, just because of that special case.
  It was much, much simpler to flatten everything and treat loggers as
regular services; once I made that decision, the project immediately stopped
being a nightmare. And I realized that with the new architecture, I could
basically have pipelines of arbitrary length for free; since there was a
small demand for that, it was definitely a no-brainer.

  I agree, though, that it's more complex for the user than just service
and service/log. But it doesn't mean it has to remain that way. The
s6-rc-compile source format was made to be easy to programmatically
generate; once s6-rc picks up in popularity, I expect people to actually
write preprocessors for it. So you could have your own format with a foobar
service definition and a foobar/log service definition, and then a
preprocessor scans your definitions and automatically creates a source
database with foobar and foobar-log services with the right dependencies and
producer-for/consumer-for files. This operation is probably doable today in
a few lines of your favorite interpreted language; I didn't bake it into
s6-rc-compile because I wanted to keep the compiler, and the source format
it uses, simple; and keeping the format simple and predictable keeps it
easy to autogenerate.
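
  For instance, for a foobar service with a log, the generated source
could be as simple as this (producer-for, consumer-for and pipeline-name
are the real s6-rc-compile file names; everything else is invented for
the example):

    foobar/type               # contains "longrun"
    foobar/run                # the daemon's run script
    foobar/producer-for       # contains "foobar-log"
    foobar-log/type           # contains "longrun"
    foobar-log/run            # e.g. an s6-log invocation reading its stdin
    foobar-log/consumer-for   # contains "foobar"
    foobar-log/pipeline-name  # optional: a bundle name for the pipeline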

  All complaints about the s6-rc-compile source format can basically be
answered with "write your own favorite format and transform it". :)


> - Oneshots don't seem to get either pipelines or loggers. I have some
> relatively complex oneshots, and I would really like to capture their
> logs, if only to prefix them with the name of the oneshot.

  s6-rc cannot provide you with an architecture to capture their logs,
because the only piece of architecture it can rely on is the existence of
a supervision tree. Apart from that, s6-rc can run very early, and needs
to be pretty much self-sufficient.

  It would be possible to log oneshots' outputs via a dedicated logger for
the s6rc-oneshot-runner service (which is a longrun that spawns all
oneshots under the supervision tree, for environment reproducibility
purposes), but 1. that would require an additional logging directory under
/run, which would duplicate the work of a catch-all logger, and 2. it's
more flexible to just let the user handle that output.

  Normally, oneshots' output is simply written to s6-rc's stdout and stderr,
which you can redirect as you see fit; you can also redirect it in the up
and down scripts of the oneshots themselves. I don't think it could be any
more flexible; any other behaviour would be enforcing policy.
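
  For instance, here is a minimal sketch of an up file that merges stderr
into stdout and prefixes every line with the oneshot's name (the
/etc/init.d/foobar script is illustrative, and sed -u is GNU sed):

    # foobar/up - one execline command line, parsed by execlineb
    pipeline -w { sed -u "s/^/foobar: /" }
    fdmove -c 2 1
    /etc/init.d/foobar start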

  Note that oneshots must be idempotent, so it's generally risky to have
complex oneshots. I really recommend dividing a complex oneshot into a
series of simple ones, and making a bundle containing those simple ones to
represent the complex service.
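
  In source-format terms, such a bundle is just a directory (the names
below are invented for the example):

    mycomplex/type      # contains "bundle"
    mycomplex/contents  # lists mycomplex-mount, mycomplex-sysctl,
                        # mycomplex-firewall... one service name per line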


> - Oneshot up/down scripts *must* be execline, while longrun run/finish
> scripts can use #! to specify any interpreter. Why the discrepancy?
> Many of my scripts are quite complicated, and this just forces me to
> use another layer of indirection, for reasons I don't understand.

  The reason is that longrun run/finish scripts are copied as executable
files into a service directory, whereas oneshot up/down scripts are baked
into the database as a command line. execline is just a way of saying
"it's a Unix command line, not an executable file"; this allows you, for
instance, to write your oneshot scripts sysv-style, as /etc/init.d/foobar,
and to declare foobar/up as "/etc/init.d/foobar start" and foobar/down as
"/etc/init.d/foobar stop". The plan was never to enforce execline usage
in up/down scripts, but to allow your real scripts to take arguments.
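
  Concretely, the whole oneshot definition can be as small as this (still
assuming a sysv-style /etc/init.d/foobar):

    foobar/type   # contains "oneshot"
    foobar/up     # contains:  /etc/init.d/foobar start
    foobar/down   # contains:  /etc/init.d/foobar stop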

  The layer of indirection is quite intended. Now, if an up/down script is
simple enough to be easily expressed in execline and to fit entirely in one
command line, all the better; but that's by no means an expectation.


> - What's the rationale for using fd load/store operations between
> pipeline elements, instead of just a mkfifo like daemontools uses?
> The latter is much simpler and doesn't require a separate, error prone
> daemon.

  Daemontools (and s6) don't use named pipes either. They use anonymous
pipes between a service and its logger; those pipes are maintained open
(fd-held) by the s6-svscan process. Runit does the same thing, except that
the pipes are held by the runsv process for a service and its logger, not
by the runsvdir process.

  This is necessary to ensure you don't lose logs when a service or its
logger dies. With a simple pipe (be it named or anonymous) between a
service and its logger, if the logger dies, any data still in the pipe is
lost. Not good. And if the pipe is anonymous, you can't reuse it ever again
for data transmission: you have to close the other end, somehow recreate
the pipe, and restart the processes. Ugh.

  The way daemontools avoids that is by keeping the pipe open: both its
ends have an open copy in svscan, so even if the daemon or the logger dies,
the pipe still exists, it's still usable for data transmission, and when the
process starts again, it's as if nothing had happened; it doesn't impact the
other end. I say that svscan is "holding the pipe descriptors".

  fd-holding is useful, but doing it in the scanner process is ad-hoc.
Runit does it in the supervisor process, which is less ad-hoc, but runsv
watches both a service and its logger, and *that* is ad-hoc. No matter how
you look at it, a "service piped into logger" mechanism provided by the
supervision framework requires special-casing in the supervision programs
themselves.

  The s6-rc architecture does away with that. It delegates the fd-holding
to a separate daemon written entirely for this purpose: this is just a
generalization of what svscan does. And the s6-fdholder-store and
s6-fdholder-retrieve operations are necessary because s6-fdholderd is not
an ancestor of the services, so the services cannot get their pipes simply
by inheriting them via fork() as they do with svscan. On the other hand,
since the mechanism is generalized, this gives you reliable pipelines
for free - no need to have specific code in the supervisors to manage
pipelines.
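
  As an illustration, a producer's run script, in the spirit of what
s6-rc-compile generates, could look like the sketch below; the socket
path, the pipe identifier and the foobard daemon are all invented for the
example, and it assumes s6-fdholder-retrieve's convention of handing the
descriptor over on stdin:

    #!/bin/execlineb -P
    # fetch the write end of the held pipe from the fd-holder
    s6-fdholder-retrieve /run/s6-rc/servicedirs/s6rc-fdholder/s
      "pipe:s6rc-w-foobar-log"
    # the pipe arrived as fd 0; make it fd 1, then run the daemon
    fdmove 1 0
    foobard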

  If there were no fd-holding, and pipelines were just a series of fifos
between a producer and a consumer, then as soon as one element in the chain
restarted, you would lose all the data still upstream of the failure point.

  Additionally, various OSes are buggy in various ways with named pipes,
specifically around what happens when the last writer closes then reopens,
with a playful tendency to make select/poll in the reader fail in funny
ways afterwards, so asynchronous readers would busyloop. Trying to get
that sorted was hell; the easiest solution by far was "don't do that",
which led to "always keep a reader and a writer on it", and we're back to
fd-holding anyway.


> - It would be really nice if I could provide a data/ directory in
> oneshots and longruns, like I can with an s6 service. Especially
> since up/down must be executed using execline, it would be nice if I
> could at least direct them to run a full script in another file nearby
> in the file tree, for clarity.

  You have the whole filesystem all to yourself!

  There's such a thing as locality for longruns, because the service
directory is necessary. But for oneshots, there's a command line baked into
the database, and that's it: no other information is specific to s6-rc,
your oneshot script contains it all. If you have data relative to your
oneshot, you can store it wherever you want, and access it in your oneshot
itself via absolute paths. Anything else would be s6-rc enforcing policy
on you.


> - It's unclear in the docs how the s6-rc/compiled/ directory is
> supposed to be replaced.

  Compiling into a separate directory is indeed the intended usage. The way
it should go is:

  * name your current database /etc/s6-rc/compiled-$unique (I like to use
`s6-clock` as $unique)
  * have /etc/s6-rc/compiled be a symlink to your current database
  * to compile and update to a new database:

       s6-rc-compile /etc/s6-rc/compiled-$newunique $sources \
    && s6-rc-update /etc/s6-rc/compiled-$newunique \
    && ln -sfn compiled-$newunique /etc/s6-rc/compiled \
    && rm -rf /etc/s6-rc/compiled-$unique    # optional

    (The -n to ln matters: /etc/s6-rc/compiled is a symlink to a directory,
    and without -n the new link would be created inside the old database
    instead of replacing the symlink.)
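
  Put together, a small update script along those lines could look like
this sketch (the /etc/s6-rc/source location is an assumption):

    #!/bin/sh -e
    # rebuild the compiled database, switch the live state over to it,
    # then repoint the symlink and delete the previous database
    old=$(readlink /etc/s6-rc/compiled)
    new=compiled-$(s6-clock)
    s6-rc-compile "/etc/s6-rc/$new" /etc/s6-rc/source
    s6-rc-update "/etc/s6-rc/$new"
    ln -sfn "$new" /etc/s6-rc/compiled
    rm -rf "/etc/s6-rc/$old"    # optional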

  I agree the documentation does not make it obvious what the intended
usage is, and your question is a FAQ. I will update the documentation to
address this.


> - It's unclear whether the "s6-rc change" command is properly
> re-entrant.

  It is not, and there are no plans to make it so. Ensuring consistency of
the database and live-update of the dependencies with concurrent or recursive
s6-rc invocations would be an order of magnitude more complex than the
current code, and I don't think the benefits would be worth it; there
should always be a way to model your dependencies that does not require s6-rc
to be reentrant.


> Imagine if I have a bundle called "all" that contains all
> my basic services. I run 's6-rc -u change all' which starts bringing
> them up. While bringing up X, it realizes that in order to continue,
> it must first bring up Y, which is part of 'all', but not listed as a
> dependency of X (since X doesn't *always* require Y).

  What is this model where dependencies aren't known until runtime? o.O
  I agree that s6-rc is not suited to fluctuating, uncertain dependencies.
In the general case, either X depends on Y, or it doesn't; and if you're
not sure, the safe assumption is that it does.

  If you need lazy starts, s6-rc isn't the right communication mechanism.
You could have a "lazy-Y" which X depends on, but that only starts a
listener, and when X starts, if it needs Y, it sends a command to lazy-Y,
which then starts Y. Or something like that. But if Y is an optional part
of X, you could also let X control it entirely instead of making it a
separate service. I don't know the specifics of your service, so I can't
say what the appropriate solution would be here.
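
  For concreteness, a lazy-Y run script could be a sketch like this, with
an invented fifo path as the listener (a real setup might prefer a proper
socket). When poked, it runs Y in a fresh s6-rc invocation, outside any
ongoing transition, and "s6-rc -u change Y" is a no-op if Y is already up:

    #!/bin/sh
    # lazy-Y/run: bring Y up on demand
    fifo=/run/lazy-Y/control
    mkdir -p /run/lazy-Y
    [ -p "$fifo" ] || mkfifo "$fifo"
    while :; do
      read -r cmd < "$fifo" || continue
      [ "$cmd" = start-Y ] && s6-rc -u change Y
    done

  X would then request Y at runtime with something like
"echo start-Y > /run/lazy-Y/control".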


> - I need the ability to "atomically cycle" a service (eg. because its
> config files have changed). For oneshots, that means down+up, and for
> longruns, that means terminate+run.

  There are two possibilities there: do you want to restart them and all
their dependencies, or do you want to restart them alone, at the risk of
something breaking while they're restarting? It's not the same thing at all.
The former involves "s6-rc change"; the latter does not, and s6-rc won't
prevent you from shooting yourself in the foot but will deny any involvement
and keep saying the services are still in the state it put them into. :)


> By atomically, I mean if two
> different callers try to cycle the service at once, one should finish
> before the next one does. Or even better, I'd like "at least once"
> semantics: let's say A and B both add new files to /etc/daemon.d.
> They both try to restart service X at the same time. There's no need
> to actually restart X twice; we just need a guarantee that the most
> recent start (not stop) happened *after* the later of A and B asked
> for a restart, and then both restart requests can finish
> simultaneously. Is there a good way to build these semantics around
> the current system?

  It's dangerous to do that, and I don't think it's possible to implement
in all generality. Let's say A added an A file to /etc/daemon.d and restarted
X. B is adding a B file to /etc/daemon.d at the same time (to be generous,
we'll assume it's doing so atomically, by rename()ing a file into the
directory), while X is restarting. How does B, or s6-rc, know whether X has
read /etc/daemon.d before or after the B file was added? If before, another
restart is required; if after, it is not - but we have no way of knowing,
and the only safe option is to always restart. The only way you can avoid
a double restart is if the second "restart" command arrives when X is not
up yet; and if your "restart" command is s6-svc -t /run/service/X, that is
naturally the case (s6-svc -t does nothing if X is down).
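
  If you want to approximate your "at least once" semantics anyway, the
safe way is to serialize the restarts, so that every caller's restart
happens after its own configuration write. A sketch, assuming util-linux
flock, an s6 version whose s6-svc supports the -w options, and an
illustrative lock path:

    #!/bin/sh -e
    # restart-X: write your /etc/daemon.d change first, then call this
    exec 9> /run/lock/restart-X
    flock 9                         # serialize concurrent callers
    s6-svc -t -wr /run/service/X    # SIGTERM, wait until X is up again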


> - Relatedly, I would like a command similar to s6-wait that works for
> any s6-rc service. It's fairly easy to translate a longrun into an s6
> service and just use s6-wait directly. However, that doesn't work
> with oneshots, and I would quite often like to wait for a oneshot to
> complete.

  The s6-rc dependency mechanism is precisely supposed to handle the
"wait for that service to complete" part for you. Inside a s6-rc invocation,
a transition is only performed when all its dependencies are completed, so
you don't have to worry about it. Outside a s6-rc invocation, if you want
to make sure X is up before doing Y, that is exactly what
"s6-rc -u change X && Y" is for. Do you have a use case that is not
covered by that?


> - It doesn't seem to be clearly documented what signal(s) s6-rc uses
> to stop services. It also doesn't give much flexibility in what kind
> of status you wait for. By comparison, s6-svc gives all the
> flexibility I'd like.

  Indeed. The command used to stop longruns is s6-svc -d, which sends a
SIGTERM. (And a SIGCONT, in case a process has been stopped.) It's the
standard signal for process termination. If your longrun understands
another signal instead, you may need to hack the run script (and flame the
daemon's authors :)); the execline "trap" command was written for this
usage. I'm looking at you, consul.
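
  For the record, a sketch of such a run script for consul, using trap to
translate the SIGTERM sent by s6-svc -d into the SIGINT consul wants for a
graceful leave. It assumes trap's behaviour of exporting the child's pid
in the ! environment variable, and the consul flags are illustrative:

    #!/bin/execlineb -P
    fdmove -c 2 1
    trap
    {
      SIGTERM
      {
        # fetch the pid of the consul child and send it SIGINT instead
        importas -i pid !
        kill -s INT $pid
      }
    }
    consul agent -config-dir=/etc/consul.d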

  Do you think something like a "termination-signal" file in a longrun
definition directory would be useful? I'm not promising anything: it
would add some complexity to s6-rc-compile and s6-rc, and I'm not sure the
benefits would be worth it, given that broken daemons can be worked around
in run scripts; but if there's demand for it, that might be an option in
the future.

-- 
  Laurent