On 4/21/2015 2:56 PM, Buck Evan wrote:
> My understanding of s6 socket activation is that services should open
> and hold onto their listening socket while they're up, and s6 relies
> on the OS for swapping out inactive services. It's not socket
> activation in the usual sense.
> http://skarnet.org/software/s6/socket-activation.html
>
I apologize, I was a bit hasty and I think I need more sleep.  I was 
confusing socket activation with some other s6 feature - perhaps with 
how s6-notifywhenup is used: 
http://skarnet.org/software/s6/s6-notifywhenup.html
> So I wonder what the "full guarantee" provided by s6 that you 
> mentioned looks like.
> It seems like in such a world all services would race and the 
> determinism of the race would depend on each service's implementation.
This I do understand, having gone through it with supervision-scripts.  
The basic problem is that a running service is not necessarily a ready 
service; running only means it's "up".
Dependency handling "with guarantee" means there is some means by which 
the child service itself signals "I'm fully up and running", vs. "I'm 
started but not ready".  Because there is no polling going on, this 
allows the start-up of the parent daemon to sleep until it either is 
notified or times out.  And you get a "clean" start-up of the parent 
because the children have directly signaled that "we're all ready".
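To make the contrast concrete, here is a minimal C sketch of the 
"child signals readiness" side - the s6 readiness protocol boils down 
to the daemon writing a newline to a notification fd once its 
initialization is done.  The NOTIFY_FD environment variable and the 
fallback fd number are assumptions for this sketch, not anything s6 
itself specifies:

  /* Minimal sketch: the daemon writes a newline to a notification fd
     once initialization is complete.  How the daemon learns the fd
     number is an assumption here (a hypothetical NOTIFY_FD environment
     variable); real setups hardcode it or pass it as an option. */
  #include <stdlib.h>
  #include <unistd.h>

  static void notify_ready(void)
  {
    const char *s = getenv("NOTIFY_FD");  /* hypothetical convention */
    int fd = s ? atoi(s) : 1;
    write(fd, "\n", 1);   /* "I'm fully up and running" */
    close(fd);            /* nothing more to say on this channel */
  }

  int main(void)
  {
    /* ... bind sockets, open databases, load config ... */
    notify_ready();  /* signal only after everything is actually usable */
    /* ... enter the main service loop ... */
    for (;;) pause();
  }

The important property is that the signal comes from inside the child, 
after its own setup, so the parent never has to guess.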
Dependency handling "without guarantee" is what my project does as an 
optional feature - it brings up the child process and then calls the 
child's ./check script to see if everything is OK, which is polling the 
child (and wasting CPU cycles).  This is fine for "light" use because 
most child processes will start quickly and the parent won't time out 
while waiting.  There are trade-offs for using this feature.  First, 
./check scripts may have unintended bugs, behaviors, or issues that you 
can't see or resolve, unlike the child directly signalling that it is 
ready for use.  Second, the polling approach adds to CPU overhead, 
making it less than ideal for mobile computing - it will draw more power 
over time.  Third, there are edge cases where it can make a bad 
situation worse - picture a heavily loaded system that takes 20+ minutes 
to start a child process, and the result being the parent spawn-loops 
repeatedly, which just adds even more load.  That's just the three I can 
think off off the top of my head - I'm sure there's more.  It's also why 
it's not enabled by default.
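For comparison, a hedged C sketch of that polling approach - the 
./check path and the 30-second budget are illustrative only, not 
supervision-scripts' actual interface:

  /* Start-up gate "without guarantee": run the child's ./check script
     once a second until it exits 0 (ready) or the time budget runs
     out.  Every iteration forks a shell plus a script - this is the
     CPU (and battery) cost described above. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>

  static int poll_check(const char *check, int budget_secs)
  {
    for (int elapsed = 0; elapsed < budget_secs; elapsed++)
    {
      int status = system(check);  /* run the child's ./check */
      if (status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;                  /* child says it's ready */
      sleep(1);
    }
    return -1;                     /* give up; caller decides what next */
  }

  int main(void)
  {
    if (poll_check("./check", 30) != 0)
    {
      fprintf(stderr, "child not ready within 30s\n");
      return 111;  /* on a loaded system, this is the spawn-loop risk */
    }
    /* ... child is ready, proceed with the parent's start-up ... */
    return 0;
  }

Note the failure path: when the child is slow, the parent exits and 
gets restarted, which is exactly the spawn-loop scenario above.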