Bladt Norbert
2000-May-19 13:23 UTC
Solved: on Solaris, "couldn't wait for child '...' completion: No child processes"
> John Horne [SMTP:J.Horne at plymouth.ac.uk] wrote:
>
> Emanuel Borsboom <emanuel at heatdeath.org> wrote:
>> Trying to install the portable OpenSSH on Solaris 2.6. Compiling from
>> openssh-2.1.0.tar.gz using gcc. Compiles and installs fine. sshd
>> starts fine. First connection from another system works. Child sshd is
>> forked, but the parent dies and logs:
>>
>> May 16 11:40:56 qtrade-dev sshd[6510]: error: Couldn't wait for child
>> '/usr/bin/ls -alni' completion: No child processes
>> May 16 11:40:56 qtrade-dev last message repeated 3 times
>> May 16 11:40:56 qtrade-dev sshd[6510]: error: -1 Command '/usr/bin/ls
>> -alni': select() failed: Interrupted system call
>> May 16 11:40:56 qtrade-dev sshd[6510]: error: Couldn't wait for child
>> '/usr/bin/ls -alni' completion: No child processes
>>
> [rest snipped]
>
> I too get this on a Sun Ultra 10, Solaris 8 using SSL 0.9.5a; SSH 2.1.0 and
> gcc version 2.95.2. I'll take a look, but don't expect anything since I'm
> not really a C programmer! (sorry)

Me too on Solaris 7.
However, I am a C programmer and I was able to fix it.

The timeout ("Interrupted system call" message above) occurs because the
timeout for the entropy commands is too small (100 msec).
I raised it to 2000 msec (500 msec was too small, too) and now it works
without these error messages.
The message "No child processes" is a consequence of the interrupted
system call message.

The location to fix is in config.h:

    /* Builtin PRNG command timeout */
    #define ENTROPY_TIMEOUT_MSEC 100

I changed the original 100 to 2000, did a "make sshd" and that's it.

Hope this helps,

Norbert.

P.S. The real fix for the next release would be to either ask for the
timeout value, determine it automagically in some way, or change the
hard-coded value of 100 in the "configure" script to something more
reasonable.

--
Norbert Bladt
ATAG debis Informatik, TZ1 - Z364
Industriestrasse 1, CH 3052-Zollikofen
E-Mail: norbert.bladt at adi.ch
Tel.: +41 31 915 3964 Fax: +41 31 915 3640
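For readers following the select() failure in the log above, here is a minimal sketch of waiting on a descriptor with a millisecond timeout. It is not OpenSSH's actual code: `wait_readable`, `now_ms` and `demo_wait_readable` are hypothetical names, and only `ENTROPY_TIMEOUT_MSEC` with its value of 100 comes from the thread. The point it illustrates is that select() returns 0 when the timeout expires and fails with EINTR ("Interrupted system call") only when a signal arrives, so a caller that wants to honour the full timeout has to retry on EINTR.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Value quoted in the thread; everything else here is illustrative. */
#define ENTROPY_TIMEOUT_MSEC 100

/* Monotonic clock in milliseconds (hypothetical helper). */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

/*
 * Wait until fd is readable or timeout_ms has elapsed.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set).
 * EINTR is not treated as an error: the wait is resumed with the
 * time that remains until the original deadline.
 */
int wait_readable(int fd, int timeout_ms)
{
    long long deadline;

    assert(fd >= 0 && fd < FD_SETSIZE && timeout_ms >= 0);
    deadline = now_ms() + timeout_ms;

    for (;;) {
        struct timeval tv;
        fd_set rfds;
        long long left = deadline - now_ms();
        int r;

        if (left < 0)
            left = 0;
        tv.tv_sec = (time_t)(left / 1000);
        tv.tv_usec = (suseconds_t)((left % 1000) * 1000);

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        r = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (r > 0)
            return 1;           /* data (or EOF) is available */
        if (r == 0)
            return 0;           /* the timeout really expired */
        if (errno != EINTR)
            return -1;          /* genuine failure */
        /* EINTR: a signal interrupted select(); loop and retry. */
    }
}

/* Usage demo: returns 0 if an empty pipe times out and a pipe
 * holding one byte is reported readable. */
int demo_wait_readable(void)
{
    int p[2], empty, ready;

    if (pipe(p) == -1)
        return -1;
    empty = wait_readable(p[0], ENTROPY_TIMEOUT_MSEC);
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    ready = wait_readable(p[0], ENTROPY_TIMEOUT_MSEC);
    close(p[0]);
    close(p[1]);
    return (empty == 0 && ready == 1) ? 0 : 1;
}
```

Recomputing the remaining time from a deadline avoids relying on whether select() updates its timeout argument, which differs between systems.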
Andre Lucas
2000-May-19 14:20 UTC
Solved: on Solaris, "couldn't wait for child '...' completion: No child processes"
No! The error message is because I used 'error()' instead of 'debug()'.
That's changed in the 2.1.0p1 release, I believe. (I don't have access
to it right now.) This was discussed last week for 2.1.0 - just change
error() to debug() for those two messages.

If you're not using 2.1.0p1, please check it out - other bugs are fixed
there too. I'll post a cure for the 'missing fixprogs' problem later
today, just install ssh_prng_cmds manually until then.

Don't set the timeout so high. It's too much of a delay even with the
delay at 100ms. 2.1.0p1 calculates the timeout differently, too.

Work is under way to reduce the builtin PRNG delay to something that you
won't notice. Expect a patch soon. Those error messages are just noise.

Ta,

-Andre'
Aran Cox
2000-May-19 14:41 UTC
SCO OS 5.0.5 issues (was Re: Solved: on Solaris, "couldn't wait for child '...' completion: No child processes")
I am seeing these same errors when using the built-in RNG. I raised the
delay as suggested and it didn't change anything on my system.

I am trying to get 2.1.0 to function on SCO OS 5.0.5 using the SCO
development environment. Before I get into my troubles with the
"couldn't wait for child" errors, I'll lay out what I did to get
ssh-2.1.0 to run on SCO OS:

Had to define MAXPATHLEN in defines.h. I defined it as 1024. I couldn't
figure out where this is defined in SCO OS, but I think I found
MAXPATHLEN to be defined in /udk/usr/include/sys/param.h as 1024, so I
added it to defines.h by hand.

If HAVE_DEV_PTMX is defined, code in pty.c (function pty_alloc) is used
that seems to be designed for Solaris 2.x. The header above the code is:

    /*
     * This code is used e.g. on Solaris 2.x. (Note that Solaris 2.3
     * also has bsd-style ptys, but they simply do not work.)
     */

It tries to use device names like /dev/pts000 and causes the code in
pty_make_controlling_tty to fail. Specifically, this code fails:

    /* Verify that we now have a controlling tty. */
    fd = open("/dev/tty", O_WRONLY);
    if (fd < 0)
        error("open /dev/tty failed - could not set controlling tty: %.100s",
              strerror(errno));
    else {
        close(fd);
    }

This causes the following message to be generated by sshd when run with
the -d option:

    error: open /dev/tty failed - could not set controlling tty: No such device or address

This doesn't stop openssh from functioning, but I can't issue the resize
command and that's a problem. If I alter the config.h line that defines
HAVE_DEV_PTMX to:

    #undef HAVE_DEV_PTMX

then it compiles with code that seems to work exactly as expected,
choosing tty device names like /dev/ttyp8.

I don't know what to think about the /dev/pts* problem. Is it possible
that /dev/pts* aren't ttys? Or are SCO OpenServer's /dev/pts* devices
broken, just as the comment states that Solaris 2.3's bsd-style ptys
are? Or is it simply that there is another method for releasing/setting
controlling ttys under SCO?
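The MAXPATHLEN workaround described above can be sketched as a guarded fallback rather than an unconditional define. This is only an illustration of the idea, assuming `<limits.h>` may or may not provide PATH_MAX on a given system; the 1024 fallback is the value reported in the message, and nothing here is taken from the actual OpenSSH defines.h.

```c
#include <limits.h>

/* Only supply MAXPATHLEN when the system headers did not. */
#ifndef MAXPATHLEN
# ifdef PATH_MAX
#  define MAXPATHLEN PATH_MAX   /* prefer the system's own limit */
# else
#  define MAXPATHLEN 1024       /* fallback value from the message above */
# endif
#endif
```

Guarding the definition keeps the same header usable on systems where `<sys/param.h>` already defines MAXPATHLEN.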
The "Couldn't wait for child" error message is generated after a failed
call to waitpid. Now, in the initial sshd process the commands issued to
gather entropy exit and become zombies. As a consequence the waitpid
call returns as expected. In the forked sshd processes spawned to handle
an incoming connection, the processes do not become zombies; they just
exit, causing the subsequent call to waitpid to fail. At least, that's
been my experience under SCO OS 5.0.5. This behaviour is also visible
under Linux if you use the built-in entropy generation code.

Now, SCO OS 5.0.5 also fills the log with another error message which
Linux doesn't show. The sshd child (again not the master daemon, just
the daemons spawned to handle connections) generates these error
messages:

    May 19 09:32:15 ohare sshd[16872]: error: Command '/bin/df -i': select() failed: Interrupted system call

These errors immediately precede the "couldn't wait for child" messages,
and I assume they are being caused by the same thing.

I realize I didn't supply any patches to fix the first two issues
(MAXPATHLEN, PTY stuff). I'm a bit unfamiliar with autoconf just yet and
I only have access to SCO machines while I'm at work (where I have a
long list of things the boss actually wants me to be working on).
However, I will be looking into what is up with the failed waitpid calls
(under Linux) and can hopefully figure it out this weekend.

--
Aran Cox
Engineering
Telegroup Coralville - Coral Center
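The failed waitpid() calls described above can be reproduced outside sshd. The sketch below shows one way a process ends up with children that never become zombies: on POSIX.1-2001 systems, setting SIGCHLD to SIG_IGN makes the kernel discard exited children, so a later waitpid() fails with ECHILD ("No child processes"). Whether sshd 2.1.0 gets into this state through an ignored SIGCHLD, a signal handler that reaps children first, or something else is not established in this thread; `wait_errno_for_child` is a hypothetical name used only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once, then waitpid() for it.
 * Returns 0 if the child was reaped normally, otherwise the errno
 * reported by the failing call. If ignore_sigchld is non-zero,
 * SIGCHLD is set to SIG_IGN first; otherwise it is set to SIG_DFL.
 * The previous SIGCHLD disposition is restored before returning.
 */
int wait_errno_for_child(int ignore_sigchld)
{
    struct sigaction sa, old;
    int status = 0, result = 0;
    pid_t pid;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = ignore_sigchld ? SIG_IGN : SIG_DFL;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old) == -1)
        return errno;

    pid = fork();
    if (pid == -1) {
        result = errno;
    } else if (pid == 0) {
        _exit(0);                       /* child: terminate immediately */
    } else {
        pid_t r;

        /* Retry if a signal interrupts the wait. */
        do {
            r = waitpid(pid, &status, 0);
        } while (r == -1 && errno == EINTR);

        if (r == -1)
            result = errno;             /* ECHILD when the child left no zombie */
        else
            assert(r == pid);           /* normal case: the zombie was reaped */
    }

    sigaction(SIGCHLD, &old, NULL);
    return result;
}
```

Calling `wait_errno_for_child(0)` exercises the normal path where the exited child stays a zombie until it is waited for, while `wait_errno_for_child(1)` exercises the path where the child is discarded and waitpid() reports ECHILD.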