Hi folks,

I have successfully compiled and run OpenSSH 4.1p1 on NCR MPRAS:

$ uname -a
UNIX_SV support1 4.0 3.0 3446 Pentium Pro(TM)-EISA/PCI
$

However, I have found one pretty critical problem, arising from the way
that MPRAS handles changes to the IP stack.

Background:

To update any of the IP or TCP configuration options, system
administrators should use the program "tcpconfig". This prompts the
admin for a bunch of options, and then, once they have been confirmed,
it resets the ENTIRE IP stack, and applies the new settings. Clearly,
this is not something that should be done remotely!

However, the net result of this is that OpenSSH generates thousands of
error messages in the "accept" loop, filling up the (prehistoric) syslog
facility that has no concept of "previous message repeated 122342234 times"!

This continues even after the IP stack has completed reloading.

Question:

Would it be unreasonable to add some kind of loop counter that tracks
the number of consecutive accept errors, and if it reaches a certain
threshold, restarts the daemon from a suitable point?

Maybe it could only apply to MPRAS, if this is not a good option for
other systems?

Many thanks for your consideration.

Rogan

P.S. I did look at the diffs between 4.1p1 and current, and there does
not seem to have been any activity in this regard.

P.P.S. Please copy me on responses, I am not subscribed to this list.

On Wed, Nov 16, 2005 at 01:21:19PM +0200, Rogan Dawes wrote:
> To update any of the IP or TCP configuration options, system
> administrators should use the program "tcpconfig". This prompts the
> admin for a bunch of options, and then, once they have been confirmed,
> it resets the ENTIRE IP stack, and applies the new settings. Clearly,
> this is not something that should be done remotely!
>
> However, the net result of this is that OpenSSH generates thousands of
> error messages in the "accept" loop, filling up the (prehistoric) syslog
> facility that has no concept of "previous message repeated 122342234 times"!
>
> This continues even after the IP stack has completed reloading.
>
> Question:
>
> Would it be unreasonable to add some kind of loop counter that tracks
> the number of consecutive accept errors, and if it reaches a certain
> threshold, restarts the daemon from a suitable point?

It would not be hard, but it seems like an awful hack. How do the
native utilities behave under those circumstances?

The following patch ought to do it (against 4.2p1). You will need
to either rebuild configure with autoconf or add "-DBROKEN_ACCEPT"
to your CFLAGS.

Index: configure.ac
===================================================================
RCS file: /usr/local/src/security/openssh/cvs/openssh_cvs/configure.ac,v
retrieving revision 1.292
diff -u -p -r1.292 configure.ac
--- configure.ac	31 Aug 2005 16:59:49 -0000	1.292
+++ configure.ac	16 Nov 2005 12:44:59 -0000
@@ -418,6 +418,7 @@ mips-sony-bsd|mips-sony-newsos4)
 	AC_DEFINE(SETEUID_BREAKS_SETUID)
 	AC_DEFINE(BROKEN_SETREUID)
 	AC_DEFINE(BROKEN_SETREGID)
+	AC_DEFINE(BROKEN_ACCEPT, 1, [broken accept])
 	;;
 *-sni-sysv*)
 	# /usr/ucblib MUST NOT be searched on ReliantUNIX
Index: sshd.c
===================================================================
RCS file: /usr/local/src/security/openssh/cvs/openssh_cvs/sshd.c,v
retrieving revision 1.313
diff -u -p -r1.313 sshd.c
--- sshd.c	26 Jul 2005 11:54:56 -0000	1.313
+++ sshd.c	16 Nov 2005 12:44:25 -0000
@@ -1434,6 +1434,14 @@ main(int ac, char **av)
 					error("accept: %.100s", strerror(errno));
 				continue;
 			}
+#ifdef BROKEN_ACCEPT
+			if (errno == ENXIO) {
+				static int enxio_count = 0;
+
+				if (enxio_count++ > 10000)
+					received_sighup = 1;
+			}
+#endif
 			if (unset_nonblock(newsock) == -1) {
 				close(newsock);
 				continue;

-- 
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.

Darren Tucker wrote:
> On Wed, Nov 16, 2005 at 01:21:19PM +0200, Rogan Dawes wrote:
>
>> To update any of the IP or TCP configuration options, system
>> administrators should use the program "tcpconfig". This prompts the
>> admin for a bunch of options, and then, once they have been confirmed,
>> it resets the ENTIRE IP stack, and applies the new settings. Clearly,
>> this is not something that should be done remotely!
>>
>> However, the net result of this is that OpenSSH generates thousands of
>> error messages in the "accept" loop, filling up the (prehistoric) syslog
>> facility that has no concept of "previous message repeated 122342234 times"!
>>
>> This continues even after the IP stack has completed reloading.
>>
>> Question:
>>
>> Would it be unreasonable to add some kind of loop counter that tracks
>> the number of consecutive accept errors, and if it reaches a certain
>> threshold, restarts the daemon from a suitable point?
>
> It would not be hard, but it seems like an awful hack. How do the
> native utilities behave under those circumstances?
>

They seem to handle it well enough, apparently.

The alternative is to run OpenSSH from inetd, if I can find out how to
make my changes persistent! It may be cleaner in the long run.

Thanks for the speedy patch. I thought that the received_sighup would
have a role to play somehow.

Rogan

On Wed, Nov 16, 2005 at 11:49:43PM +1100, Darren Tucker wrote:
> The following patch ought to do it (against 4.2p1). You will need
> to either rebuild configure with autoconf or add "-DBROKEN_ACCEPT"
> to your CFLAGS.

And now a patch that has a chance of working....

Index: configure.ac
===================================================================
RCS file: /usr/local/src/security/openssh/cvs/openssh_cvs/configure.ac,v
retrieving revision 1.292
diff -u -p -r1.292 configure.ac
--- configure.ac	31 Aug 2005 16:59:49 -0000	1.292
+++ configure.ac	16 Nov 2005 12:44:59 -0000
@@ -418,6 +418,7 @@ mips-sony-bsd|mips-sony-newsos4)
 	AC_DEFINE(SETEUID_BREAKS_SETUID)
 	AC_DEFINE(BROKEN_SETREUID)
 	AC_DEFINE(BROKEN_SETREGID)
+	AC_DEFINE(BROKEN_ACCEPT, 1, [broken accept])
 	;;
 *-sni-sysv*)
 	# /usr/ucblib MUST NOT be searched on ReliantUNIX
Index: sshd.c
===================================================================
RCS file: /usr/local/src/security/openssh/cvs/openssh_cvs/sshd.c,v
retrieving revision 1.313
diff -u -p -r1.313 sshd.c
--- sshd.c	26 Jul 2005 11:54:56 -0000	1.313
+++ sshd.c	16 Nov 2005 12:57:14 -0000
@@ -1429,6 +1429,14 @@ main(int ac, char **av)
 				fromlen = sizeof(from);
 				newsock = accept(listen_socks[i], (struct sockaddr *)&from,
 				    &fromlen);
+#ifdef BROKEN_ACCEPT
+				if (errno == ENXIO) {
+					static int enxio_count = 0;
+
+					if (enxio_count++ > 10000)
+						received_sighup = 1;
+				}
+#endif
 				if (newsock < 0) {
 					if (errno != EINTR && errno != EWOULDBLOCK)
 						error("accept: %.100s", strerror(errno));

-- 
Darren Tucker (dtucker at zip.com.au)
GPG key 8FF4FA69 / D9A3 86E9 7EEE AF4B B2D4 37C9 C982 80C7 8FF4 FA69
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
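
For comparison with the request for a counter of *consecutive* accept
errors: as written, the patch above tests errno whatever accept()
returned, and enxio_count is never reset. The following is only a
minimal standalone sketch of a counter that resets on success. It is
not OpenSSH code, and struct accept_guard, accept_guard_init() and
accept_guard_update() are hypothetical names made up for illustration.
Which errno values should count as transient on MPRAS is also an
assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper, not part of OpenSSH. */
struct accept_guard {
	int consecutive;	/* failures since the last successful accept() */
	int threshold;		/* failures tolerated before asking for a restart */
};

static void
accept_guard_init(struct accept_guard *g, int threshold)
{
	assert(threshold > 0);
	g->consecutive = 0;
	g->threshold = threshold;
}

/*
 * Call once per accept() return, passing its result and the errno value
 * saved immediately after the call. Returns 1 when the caller should
 * restart (in sshd that could mean setting received_sighup), else 0.
 */
static int
accept_guard_update(struct accept_guard *g, int newsock, int saved_errno)
{
	if (newsock >= 0) {
		/* A successful accept() ends any run of failures. */
		g->consecutive = 0;
		return 0;
	}
	/* Assumed transient: neither counted nor treated as success. */
	if (saved_errno == EINTR || saved_errno == EWOULDBLOCK ||
	    saved_errno == EAGAIN)
		return 0;
	if (++g->consecutive >= g->threshold) {
		/* Start a fresh run after signalling the restart. */
		g->consecutive = 0;
		return 1;
	}
	return 0;
}
```

In an accept loop, the caller would initialise one guard before the
loop (for example with a threshold of 10000, the figure used in the
patch) and feed it the return value of every accept() call. Looking at
errno only when accept() has actually failed avoids counting a stale
value left behind by some earlier call.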