kevin martin
2018-Aug-23 14:53 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
I'm not sure I agree with Peter's comment about "building a dependency to systemd". A "dependency" would only be created when the end user configures it in with the configure-time flag --with-systemd. Having the code available but dormant when that flag is not given builds in no dependency whatsoever and leaves the choice to the end user.

---
Regards,
Kevin Martin

On Wed, Aug 22, 2018 at 11:02 PM Damien Miller <djm at mindrot.org> wrote:
> On Wed, 22 Aug 2018, Peter Stuge wrote:
>
> > kevin martin wrote:
> > > not sure why having the systemd notify code in openssh as a
> > > configure time option would be such a bad thing.
> >
> > At the very least it introduces a dependency on libsystemd into sshd,
> > which is undesirable for reasons of security and convenience. The
> > principle of "you are done when you can not remove any more" confirms
> > that it is unwise to add dependencies without very careful consideration.
> >
> >
> > I've read through the debian and Red Hat bug reports.
> >
> > There are two different but related problems here:
> >
> > 1. For systemctl [re]start, when a .service file has Type=simple,
> > systemd assumes that service startup can never fail, and immediately
> > considers this service successfully started when the exec() of sshd
> > has succeeded.
> >
> > That's debatable design within systemd, but it's hard for systemd to
> > know when a given service has actually started successfully, and
> > services which fit that assumption do exist.
> >
> > So when sshd detects an error on startup and exits with an error code
> > shortly after being started, systemd considers the service to first
> > have started successfully and then to have exited with an error, so
> > it then restarts the service. Repeat.
> >
> > When service limits are exhausted the service ends up in a failed state.
> >
> > Meanwhile, the systemctl [re]start command doesn't report any error
> > to the administrator, because systemd considers the service to have
> > [re]started successfully once. This is "error messages are lost".
> >
> >
> > 2. For systemctl reload, systemd can and arguably should send SIGHUP
> > to sshd. More uncertainty and assumptions within systemd follow;
> > sshd re-exec:s, meaning that the PID stays the same, so systemd
> > doesn't receive SIGCHLD and so even if 1. is fixed, here systemd will
> > not understand that an error during startup of the new sshd is
> > to be considered a failed reload. I.e. the above problems apply here
> > again. The systemctl reload sshd command is always immediately
> > successful, even if re-exec:ed sshd detects an error in the config
> > file.
>
> Thanks for the detailed write up, Peter.
>
> I agree: what is happening here seems to be mostly bad assumptions and
> inflexibility inside systemd.
>
> I'm surprised that systemd made these design decisions, because sshd is
> not doing anything historically unique with regards to startup or reload
> behaviour and "works with existing daemons" seems to be requirement #0
> if you're writing an init system.
>
> Maybe the other daemon vendors didn't push back against this, but I'm
> willing to.
>
> -d
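A rough illustration of problem 1 from Peter's write-up, for readers following along. It does not involve systemd at all: it only mimics the described Type=simple assumption, where "the exec succeeded" is treated as "the service started". The names FAILING_DAEMON and start_like_type_simple are made up for this sketch, and the child process is a stand-in for a daemon that rejects its configuration shortly after starting.

```python
import subprocess
import sys

# Hypothetical stand-in for a daemon that detects a startup error (for
# example a bad config file) and exits non-zero shortly after exec.
FAILING_DAEMON = [
    sys.executable,
    "-c",
    "import sys, time; time.sleep(0.2); sys.exit(1)",
]


def start_like_type_simple(argv):
    """Report 'started' as soon as the exec succeeds.

    Popen raises OSError only if the program cannot be executed at all,
    so reaching the return statement says nothing about whether the
    daemon finished its own startup checks.
    """
    proc = subprocess.Popen(argv)
    started = True  # decided before the daemon has validated anything
    return started, proc


started, proc = start_like_type_simple(FAILING_DAEMON)
exit_code = proc.wait()  # the failure only becomes visible later
print(f"reported started={started}, later exit code={exit_code}")
```

The caller that asked for the start has already been told "success" by the time the exit code is known, which is the "error messages are lost" situation described in the quoted text.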
Emmanuel Deloget
2018-Aug-23 16:21 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
Hello,

On Thu, Aug 23, 2018 at 4:53 PM kevin martin <ktmdms at gmail.com> wrote:
>
> I'm not sure I agree with Peter in respect to his comment about "building a
> dependency to systemd". The only time a "dependency" would be created is
> when the end-user would configure it to be there with a configure time flag
> of --with-systemd. Just having the code available and dormant without that
> flag being provided builds in no dependency whatsoever and gives the
> end-user their option to choose.

Not sure I should step in, but the code to deal with the user selection and to notify systemd is a dependency - even if it's compiled out. The fact is that you still have to maintain it and to test it regularly.

The problem looks like a systemd configuration error. systemd allows you to start a non-systemd-aware daemon. You need to look at [Service] / Type (notify is used for systemd-aware daemons).

BR,

-- Emmanuel Deloget
kevin martin
2018-Aug-23 16:48 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
While I appreciate the need to code it and test it regularly, Peter wrote a bit of notify code and provided it to Damien that essentially does what the systemd API code already does, which seems like remaking the wheel to me, and would still require ongoing maintenance and testing. The systemd API is developed and maintained externally to openssh and exists specifically to make it easier for apps that want to become daemons to be used effectively in a systemd environment.

I hated the fact that most flavors of Linux moved from the init system to systemd, but it's what we, the end users (companies with hundreds of thousands of Linux instances running), have to live with. Having Red Hat (and other vendors that don't necessarily take their codebase from Red Hat) make changes to *your* code to include systemd enhancements would, I think, lead to ongoing issues like this one. If *you as the developers included the API access as a configurable option then *we the consumer could move to your newer codebase products sooner and get the enhancements that you folks work so diligently to make in your application which is a win-win for all of us.

---
Regards,
Kevin Martin

On Thu, Aug 23, 2018 at 11:21 AM Emmanuel Deloget <logout at free.fr> wrote:
> Hello,
>
> On Thu, Aug 23, 2018 at 4:53 PM kevin martin <ktmdms at gmail.com> wrote:
> >
> > I'm not sure I agree with Peter in respect to his comment about
> > "building a dependency to systemd". The only time a "dependency" would
> > be created is when the end-user would configure it to be there with a
> > configure time flag of --with-systemd. Just having the code available
> > and dormant without that flag being provided builds in no dependency
> > whatsoever and gives the end-user their option to choose.
>
> Not sure I should step in, but the code to deal with the user
> selection and to notify systemd is a dependency - even if it's
> compiled out. The fact is that you still have to maintain it and to
> test it regularly.
>
> The problem looks like a systemd configuration error. systemd allows
> you to start a non-systemd-aware daemon. You need to look at [Service]
> / Type (notify is used for systemd-aware daemons).
>
> BR,
>
> -- Emmanuel Deloget
Damien Miller
2018-Aug-23 23:50 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
On Thu, 23 Aug 2018, kevin martin wrote:
> I'm not sure I agree with Peter in respect to his comment about "building a
> dependency to systemd". The only time a "dependency" would be created is
> when the end-user would configure it to be there with a configure time flag
> of --with-systemd. Just having the code available and dormant without that
> flag being provided builds in no dependency whatsoever and gives the
> end-user their option to choose.

If it's in the code that we maintain then it's a dependency. I don't think any other definition makes sense.
Peter Stuge
2018-Aug-24 18:06 UTC
openssh 7.6 and 7.7 on Oracle Linux 7 (compiled from source) doesn't start correctly with systemd
Damien Miller wrote:
> If it's in the code that we maintain then it's a dependency. I don't
> think any other definition makes sense.

I define a dependency as an external library or component, required at compile- and runtime.

Emmanuel Deloget wrote:
> Not sure I should step in, but the code to deal with the user selection

What user selection?

> and to notify systemd is a dependency - even if it's compiled out. The
> fact is that you still have to maintain it and to test it regularly.

Did you read the code I sent? It does six system calls in about 85 lines, three of which can't fail. I wrote it exactly because something so short is sufficient.

The existing code in OpenSSH which sends messages with ancillary data was last touched in 2010, 8 years ago. This isn't complicated, making a compile- and runtime dependency on libsystemd doubly undesirable.

> The problem looks like a systemd configuration error. systemd allows
> you to start a non-systemd-aware daemon. You need to look at [Service]
> / Type (notify is used for systemd-aware daemons).

The discussion isn't about the basic functionality known from sysvinit, using init scripts or even inittab directly.

sshd of course already runs fine on many systemd systems, and it also runs fine without debian's patch to depend on libsystemd, when only considering the bare minimum of a service manager.

But that ignores the additional functionality offered by systemd to its users. I think it makes sense for sshd to support that functionality if the cost of doing so is low.

I suggest that my proposed code is low cost. The implemented API (AF_UNIX message with ancillary data) is documented and there's no technical reason for it to change, so the maintenance burden will likely be similar to monitor_fdpass.c - little change in 8 years.

kevin martin wrote:
> seems like remaking the wheel to me

It is, but I think avoiding the library dependency is a good reason to do so, especially considering how little code is needed.

> would still require ongoing maintenance and testing

Did you read the code?

> If *you as the developers included the API access as a configurable option
> then *we the consumer could move to your newer codebase products sooner and
> get the enhancements that you folks work so diligently to make in your
> application which is a win-win for all of us.

You imply an obligation for developers to enable consumers to "move to newer codebase products" in their systems - please re-read the software license, and please remember that no such obligation exists.

Over the years I've learned that open source software only works if you take responsibility for it yourself. If you fail to do so then you will inevitably have a bad experience.

One way is to get a support contract (like Red Hat in this case) and then they are of course obliged to honor that. You'll get something (a dependency on libsystemd), but since they too optimize for cost you can be pretty sure that it's not the best solution - that would come from upstream.

David Newall wrote:
> I'm old school and think systemd is an overly complicated abomination.

I agree completely that the systemd implementation and overall ambition is overly complicated. But the unit data model is very very good, and I think systemd is a great improvement in service management compared to anything on Linux before. While we argue, Windows has had proper service management since the 1990s.

> The more projects that do support it, the more legitimacy it is lent.

That ship has sailed. Please think more about the role of Red Hat in the Linux ecosystem.

Jochen Bern wrote:
> wrapper to translate as needed would seem a pragmatic solution

I consider that wrapper to be Type=forking in the service file, but anything short of explicit notification from daemon to service manager leaves the service manager without complete information about the state of the daemon. That's not great.

> (*) PID file, lookup in the process table, check for a LISTEN, pattern
> match in a logfile, running a dedicated *client* executable / Nagios
> plugin / ${DAEMON}ctl tool for a test, throwing the daemon a
> SIGAREYOUWELL/shmem/semaphore/... request, you name it

None of those are both explicit and generic (across services).

I think the notify socket is a good, simple solution, and one that is not tied to systemd. I think it is worth supporting.

//Peter
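To give a feel for how small the notify-socket idea is, here is a minimal sketch in Python. It is not Peter's patch (that code is C and is not reproduced in this thread), and it sends only the plain state datagram, with no ancillary data. It assumes the documented readiness protocol: the service manager passes a socket address in the NOTIFY_SOCKET environment variable, and the daemon sends the datagram READY=1 to it over AF_UNIX. The function name notify_ready is made up for this example.

```python
import os
import socket


def notify_ready(state: bytes = b"READY=1") -> bool:
    """Send a state datagram to the service manager's notify socket.

    Returns False when NOTIFY_SOCKET is unset, i.e. when the process was
    not started by a notify-aware service manager, and True once the
    datagram has been sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False

    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket, which
        # is addressed with a leading NUL byte instead.
        addr = b"\0" + path[1:].encode()
    else:
        addr = os.fsencode(path)

    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state, addr)
    return True
```

A daemon would call something like this once its configuration has been parsed and its listening sockets are open, with the unit file set to Type=notify. When the variable is absent the call is a no-op, so the same binary still runs unchanged under other init systems.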