Re: stage2 as a service [was: Some suggestions on old-fashioned usage with s6 2.10.x]

From: Laurent Bercot <ska-supervision_at_skarnet.org>
Date: Sun, 31 Jan 2021 10:25:22 +0000

  Hi Stefan,
  Long time no see!

  A few comments:


># optional: -- Question: Is this necessary?
> redirfd -w 0 ${SCANDIR}/service/s6-svscan-log/fifo
> # now the catch all logger runs
> fdclose 0

  I'm not sure what you're trying to do here. The catch-all logger
should be automatically unblocked when
${SCANDIR}/service/s6-svscan-log/run starts.
  The fifo trick should not be visible at all in stage 2: by the time
stage 2 is running, everything is clean and no trickery should take
place. The point of the fifo trick is to make the supervision tree
log to a service that is part of the same supervision tree; but once
the tree has started, no sleight of hand is required.
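
  As a rough sketch of the reader side (not your actual script: the
console redirection and the /run/uncaught-logs directory are
assumptions), the catch-all logger's run script only needs to open the
fifo for reading. That open is what releases anything blocked on the
write end, so stage 2 has nothing left to do:

```execline
#!/bin/execlineb -P
# Hypothetical ${SCANDIR}/service/s6-svscan-log/run

# The logger's own errors cannot go to the fifo it is reading from,
# so send them to the console (assumed to be available).
redirfd -w 2 /dev/console

# Open the fifo for reading without blocking, then switch the
# descriptor back to blocking mode. Any writer waiting in open() on
# the other end is unblocked at this point.
redirfd -rnb 0 fifo

# Log everything read from stdin, with timestamps.
# /run/uncaught-logs is an assumed log directory.
s6-log -bp -- t /run/uncaught-logs
```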


>foreground { s6-svc -O . } # don't restart me

  If you have to do this, it is the first sign that you're abusing
the supervision pattern; see below.


>foreground { s6-rc -l ${LIVEDIR}/live -t 10000 change ${RCDEFAULT} }
># notify s6-supervise:
>fdmove 1 3
>foreground { echo "s6-rc ready, stage 2 is up." }
>fdclose 1 # -- Question: Is this necessary?

  It's not strictly necessary to close the fd after notifying readiness,
but it's a good idea nonetheless since the fd is unusable afterwards.
However, readiness notification is only useful when your service is
actually providing a... service once it's ready; here, your "service"
dies immediately, and is not restarted.
  That's because it's really a oneshot that you're treating as a
longrun, which is abusing the pattern.


># NB: shutdown should create ./down here, to avoid race conditions

  And here is the final proof: in order to make your architecture work,
you have to *fight* supervision features, because they are getting in
your way instead of helping you.
  This shows that it's really not a good idea to run stage 2 as a
supervised service. Stage 2 is really a one-time initialization script
that should be run after the supervision tree is started, but *not*
supervised.
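
  A minimal sketch of that layout, assuming a scandir in /run/service,
a compiled s6-rc database in /etc/s6-rc/compiled, a live directory in
/run/s6-rc, and a stage 2 script at the hypothetical path /etc/rc.init.
The end of stage 1 spawns stage 2 in the background, where it blocks on
the fifo until the catch-all logger is up, and then execs into
s6-svscan:

```execline
# Tail of a hypothetical stage 1; all paths are assumptions.
redirfd -r 0 /dev/null
# Nonblocking open of the write end, switched back to blocking mode:
# this succeeds even though the logger is not reading yet.
redirfd -wnb 1 /run/service/s6-svscan-log/fifo
background
{
  s6-setsid
  # Blocks until the catch-all logger opens the fifo for reading,
  # i.e. until the supervision tree is running.
  redirfd -w 1 /run/service/s6-svscan-log/fifo
  fdmove -c 2 1
  /etc/rc.init
}
unexport !
fdmove -c 2 1
s6-svscan -t0 /run/service
```

  Stage 2 itself is then a plain script that runs once and exits, with
no s6-svc -O, no readiness notification and no ./down file:

```execline
#!/bin/execlineb -P
# Hypothetical /etc/rc.init, run once and not supervised.
if { s6-rc-init -c /etc/s6-rc/compiled -l /run/s6-rc /run/service }
s6-rc -l /run/s6-rc -t 10000 change default
```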


> { # fallback login
> sulogin --force -t 600 # timeout 600 seconds, i.e. 10 minutes.
> # kernel panic
> }

  Your need for sulogin here comes from the fact that you're doing quite
complex operations in stage 1: a user-defined set of hooks, then
several filesystem mounts, then another user-defined set of hooks.
And even then, you're running those in foreground blocks, so you're
not catching the errors; the only time your fallback activates is if
the cp -a from ${REPO} fails. Was that intended?
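
  To illustrate the difference with a toy script (the commands are
placeholders, not taken from your stage 1): foreground runs its block
and carries on whatever the exit code, whereas if stops the script as
soon as its block fails.

```execline
#!/bin/execlineb -P
# foreground ignores the exit code of its block...
foreground { false }
# ...so this line runs even though the previous block failed.
foreground { echo "still running after a failed foreground block" }
# if exits when its block returns nonzero...
if { false }
# ...so this line is never reached.
echo "never reached"
```

  If a failure must trigger a fallback, the step has to be wrapped in
if or ifelse rather than foreground.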

  In any case, that's a lot of error-prone work that could be done in
stage 2 instead. If you keep stage 1 as barebones as possible (and
only mount one single writable filesystem for the service directories)
you should be able to do away with sulogin entirely. sulogin is a
horrible hack that was only written because sysvinit is complex enough
that it needs a special debugging tool if something breaks in the
middle.
  With an s6-based init, that's not the case. Ideally, any failure that
happens before your early getty is running is serious enough that you
would have to boot with init=/bin/sh anyway. And for everything else,
you have your early getty. No need for special tools.


>Also I may switch to s6-linux-init finally.

  It should definitely spare you a lot of work. That's what it's for :)

--
  Laurent
