Re: runit kill runsv from Joan Picanyol i Puig on 2016-09-01 (supervision)

From: Joan Picanyol i Puig <lists-supervision_at_biaix.org>
Date: Thu, 1 Sep 2016 16:47:21 +0200

[sorry for replying late, catching up]

* Laurent Bercot <ska-supervision_at_skarnet.org> [20160627 18:05]:
> On 27/06/2016 14:02, Joan Picanyol i Puig wrote:
> >However, couldn't they know whether their child did not cease to run
> >because of a signal they sent?
>
> I'm not sure about runsv, but s6-supervise is a state machine, and the
> service state only goes from UP to FINISH when the supervisor receives a
> SIGCHLD. The state does not change at all after the supervisor sent a
> signal: it sent a signal, yeah, so what - it's entirely up to the daemon
> what to do with that signal.

I understand: supervisors only exec() processes and propagate signals; they
have no say in what effect those signals have, nor can they predict it.
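
A toy model of that behaviour, as a sketch only: this is not s6-supervise's
actual code, and the names ToySupervisor, child_started, send and on_sigchld
are made up for illustration. The DOWN state is an assumption added so the
model has a starting point, and the SIGSTOP exception is left out. The only
point it makes is that forwarding a signal never changes the state, while a
SIGCHLD does.

```python
import enum
import signal


class State(enum.Enum):
    DOWN = "down"      # assumed initial state, not taken from the thread
    UP = "up"
    FINISH = "finish"


class ToySupervisor:
    """Toy model: only a SIGCHLD moves the state out of UP."""

    def __init__(self):
        self.state = State.DOWN
        self.pid = 0       # 0 means "never run, or child already reaped"
        self.sent = []     # signals forwarded so far, kept for inspection

    def child_started(self, pid):
        self.state = State.UP
        self.pid = pid

    def send(self, signum):
        # Forwarding is fire-and-forget: no state change, not even for
        # SIGKILL. What the daemon does with the signal is its business.
        if self.state is State.UP:
            self.sent.append(signum)

    def on_sigchld(self):
        # The child has actually died: the only UP -> FINISH transition.
        if self.state is State.UP:
            self.state = State.FINISH
            self.pid = 0


sv = ToySupervisor()
sv.child_started(1234)          # 1234 is an arbitrary example pid
sv.send(signal.SIGKILL)
state_after_kill = sv.state     # still State.UP
sv.on_sigchld()
state_after_chld = sv.state     # now State.FINISH
```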

> There's an exception for SIGSTOP because stopped daemons won't die
> before you SIGCONT them, but that's it; even sending SIGKILL won't
> make s6-supervise change states. Of course, if you send SIGKILL,
> you're going to receive a SIGCHLD very soon, and *that* will trigger a
> state change.

Doesn't the fact that SIGKILL, like SIGSTOP, can't be caught (so supervisors
can assume a forthcoming SIGCHLD) signal (pun intended) that the exception
should be extended?
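
A small experiment along those lines, as a sketch under stated assumptions:
the helpers spawn_stubborn_child and kill_and_reap are hypothetical names, the
child is just a Python one-liner that ignores SIGTERM, and the parent's wait()
stands in for what a supervisor would do on SIGCHLD. It assumes a POSIX system
and a child that is not stuck in uninterruptible kernel sleep.

```python
import signal
import subprocess
import sys


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM and then just sleeps."""
    code = (
        "import signal, time;"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
        "time.sleep(60)"
    )
    return subprocess.Popen([sys.executable, "-c", code])


def kill_and_reap(proc, timeout=5.0):
    """Send SIGKILL, then reap the child as a SIGCHLD handler would.

    Returns the child's return code; on POSIX, subprocess reports a
    child terminated by signal N as -N.
    """
    proc.send_signal(signal.SIGKILL)
    return proc.wait(timeout=timeout)


child = spawn_stubborn_child()
status = kill_and_reap(child)
killed_by_sigkill = (status == -signal.SIGKILL)
```

The child cannot install a handler for SIGKILL, so the kill does not depend on
whether it got as far as ignoring SIGTERM.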

> >No, but neither can the admin enforce this policy automatically and
> >portably using current supervisors. Other than the "dedicated user/login
> >class/cgroup" scheme proposed by Jan (which can be considered best
> >practice anyway), it'd be nice if they exposed this somehow (hand-waving
> >SMOP ahead: duplicate the pid field in ./status and remove the working
> >copy only when receiving a down signal).
>
> No need to duplicate the pid field: if s6-supervise dies before the service
> goes down, the pid field in supervise/status is left unchanged, so it still
> contains the correct pid. I suspect runsv works the same.

Ah, ok, it didn't occur to me that pid 0 in supervise/status could be
used to mean "never run or got SIGCHLD".

> I guess a partial mitigation strategy could be "if supervise/status exists
> and its pid field is nonzero when the supervisor starts, warn that an
> instance of the daemon may still be running and print its pid". Do you
> think it would be worth the effort?

Yes. Besides the warning (which would make troubleshooting easier and
would probably have avoided this thread), a robust, automation-friendly
UI (in s6-svstat / s6-svok) would round out this feature and make it
even more useful.
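
The check behind such a warning could look roughly like this. It is a sketch
only: pid_alive and stale_instance_warning are hypothetical names, and reading
the pid out of supervise/status is deliberately left out because the file
format is supervisor-specific. It uses signal 0, which tests for the existence
of a process without delivering anything. Since pids get reused, a live pid
only means an instance *may* still be running.

```python
import os


def pid_alive(pid):
    """Return True if a process with this pid currently exists."""
    if pid <= 0:
        return False  # 0 is read here as "never run, or already reaped"
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


def stale_instance_warning(pid):
    """Return a warning string for a leftover pid, or None if all clear."""
    if pid_alive(pid):
        return (
            "warning: an instance of the daemon may still be running "
            f"(pid {pid})"
        )
    return None
```

A supervisor (or a wrapper script) would call stale_instance_warning() once at
startup with the pid it found in the old status file, and print the result if
it is not None.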

keep up the good work

-- 
pica

Received on Thu Sep 01 2016 - 14:47:21 UTC
