Peter Stuge
2018-Aug-23 17:49 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
Damien Miller wrote:
> I agree: what is happening here seems to be mostly bad assumptions and
> inflexibility inside systemd.

I didn't say that, and I don't agree with that; to me it's welcome ambition rather than bad assumptions.

Consider this:

How could systemd determine whether startup of a foreground daemon completed successfully or failed?

Other than explicit notification (like an AF_UNIX message) systemd could only use time; it could wait for the daemon to exit(EXIT_FAILURE) after exec() - but how long is long enough? Every answer is incorrect.

Since systemd can't know when sshd has successfully started I find it really reasonable to assume "immediately" in the Type=simple case.

> I'm surprised that systemd made these design decisions, because sshd is
> not doing anything historically unique with regards to startup or reload
> behaviour and "works with existing daemons" seems to be requirement #0
> if you're writing an init system.

That's not fair. systemd works with sshd just as well as if I would add sshd to my inittab on a SysV init system, but that's not so useful.

systemd works well with sshd using Type=forking, but if the config file breaks and a reload is issued (and sshd exits, because bad config) then systemd detects that sshd exited, but it can't know why, so it can't output a status message.

systemd is indeed more ambitious than e.g. SysV init, and for service management I consider that a leap in the right direction. (For many other things which systemd wants to do, not so much - I don't use those.)

> Maybe the other daemon vendors didn't push back against this, but I'm
> willing to.

Please don't push back just for the sake of it.

Did you look at the code I sent?

Would you take a patch with essentially that code, without any libsystemd dependency, to make sshd work as a Type=notify service, enabling maximum usability with systemd?

//Peter
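The explicit notification discussed above needs very little code: systemd's documented protocol for Type=notify services is a datagram containing READY=1 sent to the AF_UNIX socket named in the NOTIFY_SOCKET environment variable. What follows is a minimal sketch, in Python for brevity rather than sshd's C, and it is not the code referred to in the message; the function name notify_ready is made up for illustration.

```python
import os
import socket


def notify_ready(message: bytes = b"READY=1") -> bool:
    """Send a readiness datagram to the socket named in $NOTIFY_SOCKET.

    Returns False, and does nothing, when the variable is unset, i.e. when
    the process was not started by a notify-aware service manager.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False

    # A leading "@" denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead of the "@".
    addr = path
    if path.startswith("@"):
        addr = b"\0" + path[1:].encode()

    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, addr)
    return True
```

A daemon would call this once, after its listening sockets are set up. Outside a notify-aware service manager the variable is unset and the call is a no-op, so no libsystemd dependency is involved.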
Jochen Bern
2018-Aug-24 12:04 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
On 08/23/2018 07:49 PM, Peter Stuge wrote:
> How could systemd determine whether startup of a foreground daemon
> completed successfully or failed?
> Other than explicit notification (like an AF_UNIX message) systemd
> could only use time; it could wait for the daemon to exit(EXIT_FAILURE)
> after exec() - but how long is long enough? Every answer is incorrect.

If we can agree that neither systemd nor "legacy" methods(*) of getting feedback from daemon processes will cease to exist just because the other side wishes them to hard enough, then complementing either side (but preferably systemd) with a (general, configurable, contrib/ subdir based) wrapper to translate as needed would seem a pragmatic solution. </?.02>

(*) PID file, lookup in the process table, check for a LISTEN, pattern match in a logfile, running a dedicated *client* executable / Nagios plugin / ${DAEMON}ctl tool for a test, throwing the daemon a SIGAREYOUWELL/shmem/semaphore/... request, you name it

Regards,
--
Jochen Bern
Systemingenieur
Binect GmbH
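One of the "legacy" checks listed above, checking for a LISTEN, could be sketched as a polling helper that such a wrapper might run before reporting readiness to the service manager. This is only an illustration under assumptions: the name wait_for_listen and its parameters are hypothetical, and the check is done by attempting a TCP connection rather than by inspecting the socket table.

```python
import socket
import time


def wait_for_listen(host: str, port: int, timeout: float = 10.0,
                    interval: float = 0.1) -> bool:
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as a connection is accepted by the kernel,
    or False once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # not listening yet (or unreachable); retry below
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The choice of timeout and interval is arbitrary here, which is the "how long is long enough?" question from the quoted message in another form.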
Colin Watson
2018-Aug-24 17:19 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
On Fri, Aug 24, 2018 at 02:04:13PM +0200, Jochen Bern wrote:
> On 08/23/2018 07:49 PM, Peter Stuge wrote:
> > How could systemd determine whether startup of a foreground daemon
> > completed successfully or failed?
> > Other than explicit notification (like an AF_UNIX message) systemd
> > could only use time; it could wait for the daemon to exit(EXIT_FAILURE)
> > after exec() - but how long is long enough? Every answer is incorrect.
>
> If we can agree that neither systemd nor "legacy" methods(*) of getting
> feedback from daemon processes will cease to exist just because the
> other side wishes them to hard enough, then complementing either side
> (but preferably systemd) with a (general, configurable, contrib/ subdir
> based) wrapper to translate as needed would seem a pragmatic solution.
> </?.02>
>
> (*) PID file, lookup in the process table, check for a LISTEN, pattern
> match in a logfile, running a dedicated *client* executable / Nagios
> plugin / ${DAEMON}ctl tool for a test, throwing the daemon a
> SIGAREYOUWELL/shmem/semaphore/... request, you name it

I doubt that anyone using OpenSSH with systemd would want to use a polling-based (and thus inefficient) hack like that when they could just apply the tiny patch to slot in an sd_notify call between listen and accept. (And I definitely see the logic behind notifying the service manager at that point; I've dealt with complex services built on top of OpenSSH that needed to arrange the boot sequence so that they started only once sshd was actually ready to accept connections, and without this kind of approach they had to settle for arbitrary delays and race conditions.) systemd has its structural problems, but this is one thing it gets right.

To my mind, the reasons for avoiding linking against libsystemd with a configure-time switch are essentially political; if you're running on a systemd-based system then it's paged in anyway so the runtime cost is negligible, if you're not then sd_notify is already careful to do nothing and do so cheaply, and in general I think it makes more sense to use common code to notify the service manager than to duplicate it.

(I still have a soft spot for the hacky "SIGSTOP yourself and have init send you SIGCONT when it notices" approach to this problem that we took in upstart, but I can understand why systemd preferred to do something else.)

Obviously it's better to get patches upstream wherever possible. But honestly, speaking as a downstream who maintains a patch that calls sd_notify in the right place, I'd rather have to maintain that patch indefinitely than have a worse hack upstream that I'd then have to undo or otherwise work around.

--
Colin Watson [cjwatson at debian.org]
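The ordering described above, reporting readiness between listen and accept, can be sketched as follows. This is an illustration in Python and not the Debian patch; accept_one and its notify callback are hypothetical names, with the callback standing in for a call such as sd_notify(0, "READY=1").

```python
import socket
from typing import Callable, Tuple

Address = Tuple[str, int]


def accept_one(notify: Callable[[Address], None],
               host: str = "127.0.0.1", port: int = 0) -> Address:
    """Bind, listen, report readiness, then accept a single connection.

    Returns the peer address of the accepted connection.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, port))
        listener.listen(8)
        # Readiness is reported only after listen(): from this point the
        # kernel queues incoming connections, so anything started in
        # response to the notification can connect without racing us.
        notify(listener.getsockname())
        conn, peer = listener.accept()
        conn.close()
        return peer
```

Because the notification is sent after listen, a dependent service that connects the moment it is told the daemon is ready will find its connection queued rather than refused, which removes the need for arbitrary startup delays.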