Three possibly related bugs to report. N.B. The test machines in question
are in peak form (with the exception of different kernel versions) and were
working 100% under the old ssh 1.2.x. The two clients we tested from are
machines running 2.2.13 & 2.2.14preX Linux kernels. The server where the
problems appeared is running 2.2.12.

1. sshd dies periodically.

The crash occurred just after a connect immediately followed by a hang-up.
It is unclear whether that is relevant; it could be coincidental. I have
only examined the logs for this one failure (there have been 2-3 others but
we just restarted the daemon). This failure was seen after BB (Big Brother)
had been probing ssh for several days. BB probes sshd to see if it responds
and, when it does, promptly hangs up without negotiating a connection. In
response to this rude hang-up sshd usually logs a warning and goes back to
waiting... For some reason every couple of days it decides to die. I grabbed
the log excerpt below at the last crash. /dev/urandom is in use by other
things on the system without difficulties (to my knowledge anyways...).

Nov 21 20:59:20 aserver sshd[4059]: Connection from x.x.x.170 port 2222
Nov 21 20:59:20 aserver sshd[4059]: fatal: Bad protocol version identification: quit
Nov 21 20:59:56 aserver sshd[4047]: Closing connection to x.x.x.18
Nov 21 21:04:28 aserver sshd[4092]: Connection from x.x.x.170 port 2258
Nov 21 21:04:28 aserver sshd[4092]: fatal: Bad protocol version identification: quit
Nov 21 21:04:28 aserver sshd[24736]: fatal: Couldn't read from random pool "/dev/urandom":Interrupted system call
^^
After this we get a page from BB indicating ssh has given up the ghost...

2. sshd will sometimes hang when disconnecting from a server.
  - ssh host
  - we do some work
  - we hit CTRL-D to disconnect
  - we logout on remote system
  - ssh does not disconnect from remote system and will stay hung indefinitely
    (a ps -axuww shows an sshd process still running on the pty.)

3. For no rhyme or reason, we occasionally get a warning message just
before we get a shell prompt when connecting to some of our servers
through openssh. All our test servers are running the same software build
(distribution) and the same version of openssh, yet only some of them
occasionally see the problem. This is the message we get:

chan_shutdown_read failed for #0/fd4: Transport endpoint is not connected

It is not clear what relation the warning message may have to the other 2
bugs. The warning message does not seem to indicate that the shell will
either hang or kill the parent sshd.

I am willing to test various things to try and help isolate the problem(s).
I'm open to suggestions...

Regards,
Rob
-- 
----------------"Linux the choice of a GNU Generation!"-----------------
Robert Hardy                                     C.E.O. Webcon Inc.
rhardy at webcon.net   PGP Key available by finger   (613) 276-6206
I neglected to mention in my previous email that all machines are running
our own distribution, which is basically a customized version of Red Hat 6.0
with a lot of updates & patches.

Regards,
Rob
-- 
----------------"Linux the choice of a GNU Generation!"-----------------
Robert Hardy                                     C.E.O. Webcon Inc.
rhardy at webcon.net   PGP Key available by finger   (613) 276-6206
On Sun, 21 Nov 1999, Robert Hardy wrote:

> Three possibly related bugs to report. N.B. The test machines in question
> are in peak form (with the exception of different kernel versions) and were
> working 100% under the old ssh 1.2.x. The two clients we tested from are
> machines running 2.2.13 & 2.2.14preX Linux kernels. The server where the
> problems appeared is running 2.2.12.
>
> 1. sshd dies periodically. The crash occurred just after a connect

Can you try the following patch and tell me if it makes a difference?

Index: helper.c
==================================================================
RCS file: /var/cvs/openssh/helper.c,v
retrieving revision 1.6
diff -u -r1.6 helper.c
--- helper.c	1999/11/22 02:55:36	1.6
+++ helper.c	1999/11/22 03:59:24
@@ -130,9 +129,12 @@
 #endif /* HAVE_EGD */
 
-	c = read(random_pool, buf, len);
-	if (c == -1)
-		fatal("Couldn't read from random pool \"%s\": %s", RANDOM_POOL, strerror(errno));
+	do {
+		c = read(random_pool, buf, len);
+
+		if ((c == -1) && (errno != EINTR))
+			fatal("Couldn't read from random pool \"%s\": %s", RANDOM_POOL, strerror(errno));
+	} while (c == -1);
 
 	if (c != len)
 		fatal("Short read from random pool \"%s\"", RANDOM_POOL);

> 2. sshd will sometimes hang when disconnecting from a server.
> -ssh host
> -we do some work
> -we hit CTRL-D to disconnect
> -we logout on remote system
> -ssh does not disconnect from remote system and will stay hung indefinitely
> (an ps -axuww shows an sshd process still running on the pty.)

Any ideas on how to trigger the hang?

> I am willing to test various things to try and help isolate the problem(s).
> I'm open to suggestions...

If you can be bothered, a gdb trace of problem #2 from the client and
server would be a godsend.

Regards,
Damien

--
| "Bombay is 250ms from New York in the new world order" - Alan Cox
| Damien Miller - http://www.mindrot.org/
| Email: djm at mindrot.org (home) -or- djm at ibs.com.au (work)
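For anyone wanting to try the idea outside of sshd, the retry loop in the
patch above can be sketched as a standalone helper. This is only an
illustration under assumptions: read_fully() is a hypothetical name and is
not a function in helper.c. Unlike the patch, it also keeps reading after a
short read instead of treating it as fatal, which is a different design
choice from the one the patch makes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_fully() is a hypothetical helper, not part of OpenSSH.
 *
 * Tries to read exactly len bytes from fd into buf.
 *  - If read() fails with EINTR (a signal arrived before any data was
 *    transferred), the call is simply restarted.
 *  - If read() returns fewer bytes than requested, the loop continues
 *    and asks for the remainder.
 *
 * Returns the number of bytes read (less than len only if end-of-file
 * was reached), or -1 on any error other than EINTR, with errno left
 * as set by read().
 */
static ssize_t read_fully(int fd, void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t got = 0;

	while (got < len) {
		ssize_t c = read(fd, p + got, len - got);

		if (c == -1) {
			if (errno == EINTR)
				continue;	/* interrupted: retry the read */
			return -1;		/* real error: let the caller report it */
		}
		if (c == 0)
			break;			/* end-of-file: nothing more will arrive */
		got += (size_t)c;
	}
	return (ssize_t)got;
}
```

A caller in the style of the patch would compare the return value against
len and call its own error routine if they differ.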
hi,

who says this? client? server?
can you provide debugging output from 'ssh -v' and/or 'sshd -d'?

these messages are related to port/agent/x11-forwarding,
please provide more info.

On Sun, Nov 21, 1999 at 10:33:47PM -0500, Robert Hardy wrote:
> 3. For no rhyme or reason, we occasionally get an warning message just
> before we get a shell prompt when connecting to some of our servers
> through openssh. All our test servers are running the same software build
> (distribution) and the same version of openssh yet only some of them
> occasionally see the problem. This is the message we get:
> chan_shutdown_read failed for #0/fd4: Transport endpoint is not connected
>
> It is not clear what relation the warning message may have to the other 2
> bugs. The warning message does not seem to indicate that shell will
> either hang or kill the parent sshd.