thr3ads.net - openssh unix dev - openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd [Aug 2018]

If this information is useful, please help other people find it:
Share via:

Peter Stuge

2018-Aug-22 21:32 UTC

openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

kevin martin wrote:> not sure why having the systemd notify code in openssh as a
> configure time option would be such a bad thing.
At the very least it introduces a dependency on libsystemd into sshd,
which is undesirable for reasons of security and convenience. The
principle of "you are done when you can not remove any more" confirms
that it is unwise to add dependencies without very careful consideration.

I've read through the debian and Red Hat bug reports.

There are two different but related problems here:

1. For systemctl [re]start, when a .service file has Type=simple,
systemd assumes that service startup can never fail, and immediately
considers this service successfully started when the exec() of sshd
has succeeded.

That's debatable design within systemd, but it's hard for systemd to
know when a given service has actually started successfully, and
services which fit that assumption do exist.

So when sshd detects an error on startup and exits with an error code
shortly after being started, systemd considers the service to first
have started successfully and then to have exited with an error, so
it then restarts the service. Repeat.

When service limits are exhausted the service ends up in a failed state.

Meanwhile, the systemctl [re]start command doesn't report any error
to the administrator, because systemd considers the service to have
[re]started successfully once. This is "error messages are lost".

2. For systemctl reload, systemd can and arguably should send SIGHUP
to sshd. More uncertainty and assumptions within systemd follows;
sshd re-exec:s, meaning that the PID stays the same, so systemd
doesn't receive SIGCHLD and so even if 1. is fixed, here systemd will
not understand that there an error during startup of the new sshd is
to be considered a failed reload. Ie. the above problems apply here
again. The systemctl reload sshd command is always immediately
successful, even if re-exec:ed sshd detects an error in the config
file.

In both these cases, systemctl reports no error, while sshd isn't running.

So what to do?

A workaround for [re]start is to add sshd -t ExecStartPre linting,
but that doesn't help at all with reload.

It would be good to have sshd integrate with systemd here, but we
need to avoid the libsystemd dependency.

Fortunately, sd_notify() doesn't need to do all too much; almost
everything is used before in the OpenSSH codebase, so it's easy
enough to add local code for it. It's a sendmsg() with SCM_CREDENTIALS
to the AF_UNIX SOCK_DGRAM named in $NOTIFY_SOCKET.

The file descriptor passing code in monitor_fdpass.c sends other
messages with ancillary data.

Damien, how do you feel about adding the notification without the
dependency, maybe conditioned on a configure.ac check for (Linux-only)
SCM_CREDENTIALS?

I think the minimum viable product would be to emit READY=1 once
startup is complete and RELOADING=1 on SIGHUP receipt.

STOPPING=1 would also make sense in sshd exit paths if something
could end up blocking along the way, but at least the SIGTERM case
in server_accept_loop() doesn't seem to need that.

STATUS= and ERRNO= could be nice-to-haves for error messages.

So I wrote a simple sd_notify() and am attaching it here, but the
address part and a connect() may need to be outside the function
with privilege separation. Thoughts on this idea?

//Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sd_notify.c
Type: text/x-c
Size: 2545 bytes
Desc: not available
URL:
<http://lists.mindrot.org/pipermail/openssh-unix-dev/attachments/20180822/00bbee94/attachment.c>

Damien Miller

2018-Aug-23 04:01 UTC

head link

openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

On Wed, 22 Aug 2018, Peter Stuge wrote:
> kevin martin wrote:
> > not sure why having the systemd notify code in openssh as a
> > configure time option would be such a bad thing.
> 
> At the very least it introduces a dependency on libsystemd into sshd,
> which is undesirable for reasons of security and convenience. The
> principle of "you are done when you can not remove any more"
confirms
> that it is unwise to add dependencies without very careful consideration.
> 
> 
> I've read through the debian and Red Hat bug reports.
> 
> There are two different but related problems here:
> 
> 1. For systemctl [re]start, when a .service file has Type=simple,
> systemd assumes that service startup can never fail, and immediately
> considers this service successfully started when the exec() of sshd
> has succeeded.
> 
> That's debatable design within systemd, but it's hard for systemd
to
> know when a given service has actually started successfully, and
> services which fit that assumption do exist.
> 
> So when sshd detects an error on startup and exits with an error code
> shortly after being started, systemd considers the service to first
> have started successfully and then to have exited with an error, so
> it then restarts the service. Repeat.
> 
> When service limits are exhausted the service ends up in a failed state.
> 
> Meanwhile, the systemctl [re]start command doesn't report any error
> to the administrator, because systemd considers the service to have
> [re]started successfully once. This is "error messages are lost".
> 
> 
> 2. For systemctl reload, systemd can and arguably should send SIGHUP
> to sshd. More uncertainty and assumptions within systemd follows;
> sshd re-exec:s, meaning that the PID stays the same, so systemd
> doesn't receive SIGCHLD and so even if 1. is fixed, here systemd will
> not understand that there an error during startup of the new sshd is
> to be considered a failed reload. Ie. the above problems apply here
> again. The systemctl reload sshd command is always immediately
> successful, even if re-exec:ed sshd detects an error in the config
> file.
Thanks for the detailed write up, Peter.

I agree: what is happening here seems to be mostly bad assumptions and
inflexibility inside systemd.

I'm surprised that systemd made these design decisions, because sshd is
not doing anything historically unique with regards to startup or reload
behaviour and "works with existing daemons" seems to be requirement #0
if you're writing an init system.

Maybe the other daemon vendors didn't push back against this, but I'm
willing to.

-d

kevin martin

2018-Aug-23 14:53 UTC

head link

openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

I'm not sure I agree with Peter in respect to his comment about
"building a
dependency to systemd".  The only time a "dependency" would be
created is
when the end-user would configure it to be there with a configure time flag
of --with-systemd.  Just having the code available and dormant without that
flag being provided builds in no dependency whatsoever and gives the
end-user their option to choose.

---


Regards,

Kevin Martin


On Wed, Aug 22, 2018 at 11:02 PM Damien Miller <djm at mindrot.org> wrote:
> On Wed, 22 Aug 2018, Peter Stuge wrote:
>
> > kevin martin wrote:
> > > not sure why having the systemd notify code in openssh as a
> > > configure time option would be such a bad thing.
> >
> > At the very least it introduces a dependency on libsystemd into sshd,
> > which is undesirable for reasons of security and convenience. The
> > principle of "you are done when you can not remove any more"
confirms
> > that it is unwise to add dependencies without very careful
consideration.
> >
> >
> > I've read through the debian and Red Hat bug reports.
> >
> > There are two different but related problems here:
> >
> > 1. For systemctl [re]start, when a .service file has Type=simple,
> > systemd assumes that service startup can never fail, and immediately
> > considers this service successfully started when the exec() of sshd
> > has succeeded.
> >
> > That's debatable design within systemd, but it's hard for
systemd to
> > know when a given service has actually started successfully, and
> > services which fit that assumption do exist.
> >
> > So when sshd detects an error on startup and exits with an error code
> > shortly after being started, systemd considers the service to first
> > have started successfully and then to have exited with an error, so
> > it then restarts the service. Repeat.
> >
> > When service limits are exhausted the service ends up in a failed
state.
> >
> > Meanwhile, the systemctl [re]start command doesn't report any
error
> > to the administrator, because systemd considers the service to have
> > [re]started successfully once. This is "error messages are
lost".
> >
> >
> > 2. For systemctl reload, systemd can and arguably should send SIGHUP
> > to sshd. More uncertainty and assumptions within systemd follows;
> > sshd re-exec:s, meaning that the PID stays the same, so systemd
> > doesn't receive SIGCHLD and so even if 1. is fixed, here systemd
will
> > not understand that there an error during startup of the new sshd is
> > to be considered a failed reload. Ie. the above problems apply here
> > again. The systemctl reload sshd command is always immediately
> > successful, even if re-exec:ed sshd detects an error in the config
> > file.
>
> Thanks for the detailed write up, Peter.
>
> I agree: what is happening here seems to be mostly bad assumptions and
> inflexibility inside systemd.
>
> I'm surprised that systemd made these design decisions, because sshd is
> not doing anything historically unique with regards to startup or reload
> behaviour and "works with existing daemons" seems to be
requirement #0
> if you're writing an init system.
>
> Maybe the other daemon vendors didn't push back against this, but
I'm
> willing to.
>
> -d
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev at mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
>

Peter Stuge

2018-Aug-23 17:49 UTC

head link

openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

Damien Miller wrote:> I agree: what is happening here seems to be mostly bad assumptions and
> inflexibility inside systemd.
I didn't say that, and I don't agree with that, to me it's welcome
ambition rather than bad assumptions.

Consider this:

How could systemd determine whether startup of a foreground daemon
completed successfully or failed?

Other than explicit notification (like a AF_UNIX message) systemd
could only use time; it could wait for the daemon to exit(EXIT_FAILURE)
after exec() - but how long is long enough? Every answer is incorrect.

Since systemd can't know when sshd has successfully started I find it
really reasonable to assume "immediately" in the Type=simple case.

> I'm surprised that systemd made these design decisions, because sshd is
> not doing anything historically unique with regards to startup or reload
> behaviour and "works with existing daemons" seems to be
requirement #0
> if you're writing an init system.
That's not fair.

systemd works with sshd just as well as if I would add sshd to my inittab
on a SysV init system, but that's not so useful.

systemd works well with sshd using Type=forking, but if the config
file breaks and a reload is issued (and sshd exits, because bad config)
then systemd detects that sshd exited, but it can't know why, so it
can't output a status message.

systemd is indeed more ambitious than e.g. SysV init, and for service
management I consider that a leap in the right direction. (For many other
things which systemd wants to do not so much - I don't use those.)

> Maybe the other daemon vendors didn't push back against this, but
I'm
> willing to.
Please don't push back just for the sake of it.

Did you look at the code I sent?

Would you take a patch with essentially that code, without any
libsystemd dependency, to make sshd work as a Type=notify service,
enabling maximum usability with systemd?

//Peter

openssh unix dev - Aug 2018 - openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd

openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd