Jakub Jelen
2018-Aug-22 15:45 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
On Wed, 2018-08-22 at 09:02 -0500, kevin martin wrote:
> Simple seems to have fixed it. I was also trying with "forking" as
> the type and that was failing as well.

That is not as simple as that -- we lived with "simple" for a long time, but it was not covering some corner cases, so we ended up using sd_notify, since that was the only reliable way for systemd to know the service is working.

For others interested in this topic, there was a long discussion in bug #2641, unfortunately without an upstream solution:

https://bugzilla.mindrot.org/show_bug.cgi?id=2641

Regards,
--
Jakub Jelen
Software Engineer
Security Technologies
Red Hat, Inc.
kevin martin
2018-Aug-22 15:53 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
yep, that race condition is exactly what i was experiencing. I'm not sure why having the systemd notify code in openssh as a configure time option would be such a bad thing.

---
Regards,
Kevin Martin

On Wed, Aug 22, 2018 at 10:45 AM Jakub Jelen <jjelen at redhat.com> wrote:
> On Wed, 2018-08-22 at 09:02 -0500, kevin martin wrote:
> > Simple seems to have fixed it. I was also trying with "forking" as
> > the type and that was failing as well.
>
> That is not as simple as that -- we lived with "simple" for long time,
> but it was not covering some corner cases so we ended up using the
> sd_notify, since that was the only reliable way for systemd to know the
> service is working.
>
> For others interested in this topic, there was a long discussion in bug
> #2641, unfortunately without upstream solution:
>
> https://bugzilla.mindrot.org/show_bug.cgi?id=2641
>
> Regards,
> --
> Jakub Jelen
> Software Engineer
> Security Technologies
> Red Hat, Inc.
Peter Stuge
2018-Aug-22 21:32 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
kevin martin wrote:
> not sure why having the systemd notify code in openssh as a
> configure time option would be such a bad thing.

At the very least it introduces a dependency on libsystemd into sshd, which is undesirable for reasons of security and convenience. The principle of "you are done when you can not remove any more" confirms that it is unwise to add dependencies without very careful consideration.

I've read through the Debian and Red Hat bug reports. There are two different but related problems here:

1. For systemctl [re]start, when a .service file has Type=simple, systemd assumes that service startup can never fail, and immediately considers the service successfully started once the exec() of sshd has succeeded. That's debatable design within systemd, but it's hard for systemd to know when a given service has actually started successfully, and services which fit that assumption do exist.

   So when sshd detects an error on startup and exits with an error code shortly after being started, systemd considers the service to first have started successfully and then to have exited with an error, so it restarts the service. Repeat. When the service limits are exhausted, the service ends up in a failed state.

   Meanwhile, the systemctl [re]start command doesn't report any error to the administrator, because systemd considers the service to have [re]started successfully once. This is "error messages are lost".

2. For systemctl reload, systemd can and arguably should send SIGHUP to sshd. More uncertainty and assumptions within systemd follow: sshd re-execs, meaning that the PID stays the same, so systemd doesn't receive SIGCHLD. So even if 1. is fixed, systemd will not understand that an error during startup of the new sshd is to be considered a failed reload. I.e. the above problems apply here again. The systemctl reload sshd command is always immediately successful, even if the re-exec'ed sshd detects an error in the config file.
In both of these cases, systemctl reports no error, while sshd isn't running.

So what to do? A workaround for [re]start is to add sshd -t linting in ExecStartPre, but that doesn't help at all with reload.

It would be good to have sshd integrate with systemd here, but we need to avoid the libsystemd dependency.

Fortunately, sd_notify() doesn't need to do all that much; almost everything it needs is already used in the OpenSSH codebase, so it's easy enough to add local code for it. It's a sendmsg() with SCM_CREDENTIALS to the AF_UNIX SOCK_DGRAM socket named in $NOTIFY_SOCKET. The file descriptor passing code in monitor_fdpass.c already sends other messages with ancillary data.

Damien, how do you feel about adding the notification without the dependency, maybe conditioned on a configure.ac check for (Linux-only) SCM_CREDENTIALS?

I think the minimum viable product would be to emit READY=1 once startup is complete and RELOADING=1 on SIGHUP receipt. STOPPING=1 would also make sense in sshd exit paths if something could end up blocking along the way, but at least the SIGTERM case in server_accept_loop() doesn't seem to need that. STATUS= and ERRNO= could be nice-to-haves for error messages.

So I wrote a simple sd_notify() and am attaching it here, but the address part and a connect() may need to be outside the function with privilege separation.

Thoughts on this idea?

//Peter

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sd_notify.c
Type: text/x-c
Size: 2545 bytes
Desc: not available
URL: <http://lists.mindrot.org/pipermail/openssh-unix-dev/attachments/20180822/00bbee94/attachment.c>
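
[Editorial sketch] The attachment is not preserved in this archive, so the following is only a rough illustration of the approach described in the message: a single sendmsg() carrying SCM_CREDENTIALS to the AF_UNIX SOCK_DGRAM socket named in $NOTIFY_SOCKET, with no libsystemd involved. It is not the scrubbed sd_notify.c. The function name notify_systemd() and its return convention are hypothetical, it assumes Linux (struct ucred), and it ignores the privilege-separation concern raised above by resolving the address and opening the socket inside the function.

```c
#define _GNU_SOURCE		/* struct ucred and SCM_CREDENTIALS on Linux */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and return values are assumptions).
 * Sends one state string such as "READY=1" to $NOTIFY_SOCKET.
 * Returns 1 if sent, 0 if $NOTIFY_SOCKET is unset, -1 on error.
 */
static int
notify_systemd(const char *state)
{
	const char *path = getenv("NOTIFY_SOCKET");
	struct sockaddr_un sun;
	struct msghdr msg;
	struct iovec iov;
	struct ucred cred;
	struct cmsghdr *cmsg;
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(struct ucred))];
	} cbuf;
	socklen_t sunlen;
	size_t pathlen;
	ssize_t n;
	int fd, saved_errno;

	assert(state != NULL);
	if (path == NULL || *path == '\0')
		return 0;	/* not started by a notify-aware manager */

	/* Only absolute paths and "@abstract" names are accepted here. */
	pathlen = strlen(path);
	if ((path[0] != '/' && path[0] != '@') ||
	    pathlen >= sizeof(sun.sun_path)) {
		errno = EINVAL;
		return -1;
	}
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	memcpy(sun.sun_path, path, pathlen);
	sunlen = offsetof(struct sockaddr_un, sun_path) + pathlen;
	if (path[0] == '@')
		sun.sun_path[0] = '\0';	/* abstract namespace: leading NUL */
	else
		sunlen++;		/* pathname: include trailing NUL */

	if ((fd = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1)
		return -1;

	iov.iov_base = (void *)state;
	iov.iov_len = strlen(state);

	memset(&msg, 0, sizeof(msg));
	memset(&cbuf, 0, sizeof(cbuf));
	msg.msg_name = &sun;
	msg.msg_namelen = sunlen;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf.buf;
	msg.msg_controllen = sizeof(cbuf.buf);

	/* Attach our own pid/uid/gid as ancillary data. */
	cred.pid = getpid();
	cred.uid = getuid();
	cred.gid = getgid();
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_CREDENTIALS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(cred));
	memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));

	n = sendmsg(fd, &msg, MSG_NOSIGNAL);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return n == -1 ? -1 : 1;
}
```

Under these assumptions, the minimum proposed above would amount to calling notify_systemd("READY=1") once startup is complete and notify_systemd("RELOADING=1") when SIGHUP is received, with the unit presumably switched from Type=simple to Type=notify so that systemd waits for the message. A return value of 0 lets the same binary run unchanged when it is not started by systemd.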