Re: How to report a death by signal ? (was: loopwhilex)

From: Peter Pentchev <roam_at_ringlet.net>
Date: Wed, 18 Feb 2015 12:58:00 +0200

On Wed, Feb 18, 2015 at 02:16:24AM +0100, Laurent Bercot wrote:
>
> OK, I want to tackle this problem once and for all, because it
> has been gnawing at the edges of my mind for some time, and Olivier
> forced me to look it in the face, and it's not pretty.
>
> The situation is this: we have a parent process P (itself child
> of a grandparent process G) spawning a child process C and waiting
> for it. Either C dies normally with an exit code from 0 to 255, or
> it is killed by a signal.
> I want P to report to G what happened to C, with as much precision
> as possible.
>
> The problem is, there's more information in a wstat (the
> structure filled in by wait()) than a process can report by
> simply exiting. P could exit with the same exit code as C, but
> then what should it do if C has been killed by a signal ?
>
> An idea is to have P kill itself with the same signal that killed C.
> But that's actually not right, because P itself could be killed by a
> signal from another source, and G needs that information. "P has been
> killed by a signal" and "C has been killed by a signal" are two
> different pieces of information and should be reported in different ways.
>
> So, any way you look at it, there is always more information than we
> can report.
> Is there a satisfactory way to proceed ?
>
> If P knows that C will not be using the whole range of valid exit
> codes, it can reserve one to report the death of C by a signal. But
> if P does not know, or if C actually uses the whole range ?
> There has to be some information loss somewhere, and that is
> annoying me. Any thoughts ?
>
> Note that "foreground" has it easy: it does not need to report
> information to its parent by exiting, it just needs to report
> information to the process it execs into - which is trivially
> done via the environment. The "?" environment variable can hold
> a whole wstat's worth of information, and more; so it's a non-issue
> for foreground.

Sorry for the full quote, but your message is both comprehensive and
concise :)

OK, so the "not using the whole range of valid exit codes" point rules
out my obvious reply - "do what the shell does - exit 128 + signum".
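
For reference, here is a minimal sketch of that shell convention and of
why it loses information. It is only an illustration, not code from
any of the tools discussed here: wait_for_child() and
shell_style_exit_code() are made-up names, and P "spawns" C with a
plain fork() instead of exec'ing a real program.

```python
import os
import signal


def wait_for_child(child_body):
    """Play the role of P: fork C, run child_body in it, return C's raw wstat."""
    pid = os.fork()
    if pid == 0:
        # We are C. child_body is expected to exit or get killed on its own;
        # exit code 127 is only a fallback if it returns or raises.
        try:
            child_body()
        finally:
            os._exit(127)
    _, wstat = os.waitpid(pid, 0)
    return wstat


def shell_style_exit_code(wstat):
    """Fold a wstat into a single exit code: 128 + signum, or the exit code."""
    if os.WIFSIGNALED(wstat):
        return 128 + os.WTERMSIG(wstat)
    return os.WEXITSTATUS(wstat)


if __name__ == "__main__":
    # One C is killed by SIGKILL, the other exits with 128 + SIGKILL itself.
    killed = wait_for_child(lambda: os.kill(os.getpid(), signal.SIGKILL))
    exited = wait_for_child(lambda: os._exit(128 + signal.SIGKILL))
    # The two wstats differ, but the folded exit codes are identical.
    print(killed != exited)
    print(shell_style_exit_code(killed) == shell_style_exit_code(exited))
```

A C that is killed by a signal and a C that exits with 128 + signum on
its own end up looking the same to G, which is exactly the case where C
uses the whole range of exit codes.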

Now the question is, do you want to solve this problem in general, or do
you want to solve it for a particular combination of programs, even if
new programs may be added to that combination in the future, but only
under certain rules? If it's the former (in general), then, sorry, I
don't have a satisfactory answer for you, and the fact that the POSIX
shell still keeps the "exit 128 + signum" behavior mostly means that
nobody else has come up with a better one, either (otherwise it would
probably be available, at least as some kind of an option).

If it's the latter though, that is, you are only looking for a solution
for a certain system, then it's up to you to come up with a way.
Personally, I quite like the idea of some kind of a pipe (be it a
pipe(2) pair of file descriptors or an AF_UNIX/PF_UNSPEC socketpair or
some other kind of communication channel based on file descriptors),
even if it is only unidirectional:
- P has a way of telling G that something has happened by sending
  specific combinations of bytes up the pipe (many options, up to and
  including full-blown NL-terminated text strings consisting of
  whitespace-separated words - and of course, binary structures are
  always an alternative, but let's not talk about ASN.1, shall we? :)
- G has a way of knowing that P has ended at the exact moment that it
  happens; this came in very, very handy lately when I had to write
  something that would restart a service as soon as possible, and I mean
  really quickly, *before* the kernel had finished dumping the old
  process's core (which could take up to a minute, and that was... not
  good).
- various ways for G to let P know which file descriptor to use for
  passing information up the pipe - a command-line option, an
  environment variable, in some extreme cases even a file, etc.
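
The three points above could be sketched roughly as follows. This is a
sketch under several assumptions, not a definitive implementation: the
REPORT_FD variable name and the "exit N" / "signal N" line format are
invented for the example, and G, P and C are plain fork()ed copies of
one script so that it stays self-contained. A real G would exec a
separate P program and would have to make the descriptor inheritable
first.

```python
import os
import signal

REPORT_FD_VAR = "REPORT_FD"  # hypothetical name, not an existing convention


def describe(wstat):
    """Render a wstat as one NL-terminated line of whitespace-separated words."""
    if os.WIFSIGNALED(wstat):
        return "signal %d\n" % os.WTERMSIG(wstat)
    return "exit %d\n" % os.WEXITSTATUS(wstat)


def p_main(child_body):
    """P: spawn C, wait for it, report its fate up the pipe, then exit 0."""
    report_fd = int(os.environ[REPORT_FD_VAR])
    pid = os.fork()
    if pid == 0:
        os.close(report_fd)  # C must not keep the write end open
        try:
            child_body()
        finally:
            os._exit(127)  # fallback if child_body returns or raises
    _, wstat = os.waitpid(pid, 0)
    os.write(report_fd, describe(wstat).encode("ascii"))
    os._exit(0)


def g_main(child_body):
    """G: start P with a pipe; return (report lines from P, P's own wstat)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        # Tell P which descriptor to write to, via the environment.
        os.environ[REPORT_FD_VAR] = str(write_fd)
        try:
            p_main(child_body)
        finally:
            os._exit(111)  # only reached if p_main itself fails
    os.close(write_fd)  # otherwise G would never see EOF
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # EOF: every copy of the write end has been closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, p_wstat = os.waitpid(pid, 0)
    return b"".join(chunks).decode("ascii").splitlines(), p_wstat


if __name__ == "__main__":
    lines, p_wstat = g_main(lambda: os.kill(os.getpid(), signal.SIGKILL))
    print(lines, os.WIFEXITED(p_wstat))
```

With this, what happened to C travels up the pipe, while P's own wstat
stays free to describe what happened to P. If P itself gets killed, G
reads no report line at all and still sees the signal in P's wstat.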

Just my two cents, albeit a bit long-winded :)

G'luck,
Peter

-- 
Peter Pentchev  roam_at_ringlet.net roam_at_FreeBSD.org p.penchev_at_storpool.com
PGP key:        http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115  C354 651E EFB0 2527 DF13



Received on Wed Feb 18 2015 - 10:58:00 UTC