bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-17 03:02 UTC
[Bug 1363] New: sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 Summary: sshd gets stuck: select() in packet_read_seqnr waits indefinitely Product: Portable OpenSSH Version: 4.2p1 Platform: All URL: http://marc.info/?t=117394251600035 OS/Version: All Status: NEW Keywords: patch Severity: major Priority: P2 Component: sshd AssignedTo: bitbucket at mindrot.org ReportedBy: openssh at fjarlq.com Created an attachment (id=1348) --> (http://bugzilla.mindrot.org/attachment.cgi?id=1348) latest version of fix -- this has been tested This bug was discussed on openssh-unix-dev in March 2007: http://marc.info/?t=117394251600035 During the discussion, Darren Tucker created a fix for the problem and I (Matt Day) revised and tested it. The latest version of the patch is attached. Original problem report: I'm having a problem where sshd login sessions are occasionally (as often as once a day) getting stuck indefinitely. I enabled debug messages and got a backtrace of a stuck sshd, and I think I've found the bug. sshd version: OpenSSH_4.2p1 FreeBSD-20050903, OpenSSL 0.9.7e-p1 25 Oct 2004 Uncommented lines (ie. nondefault settings) in sshd_config: LogLevel DEBUG ClientAliveInterval 90 Subsystem sftp /usr/libexec/sftp-server SSH client: PuTTY version 0.58, default settings OS/HW: FreeBSD 6.1-RELEASE running on 64-bit x86 ("amd64" platform) Executive summary: The select() in packet_read_seqnr() waits indefinitely, resulting in stuck SSH sessions when networking problems interfere with key exchange. Would like to be able to set a timeout there, or send SSH keepalives during key exchange. Periodically (every 60 minutes) the SSH client initiates rekeying via key exchange. Here's an example of a successful rekeying: Mar 11 19:02:35 SSH2_MSG_KEXINIT received Mar 11 19:02:35 SSH2_MSG_KEXINIT sent Mar 11 19:02:35 kex: client->server aes256-ctr hmac-sha1 none Mar 11 19:02:35 kex: server->client aes256-ctr hmac-sha1 none Mar 11 19:02:35 SSH2_MSG_KEX_DH_GEX_REQUEST_OLD received Mar 11 19:02:35 SSH2_MSG_KEX_DH_GEX_GROUP sent Mar 11 19:02:35 expecting SSH2_MSG_KEX_DH_GEX_INIT Mar 11 19:02:38 SSH2_MSG_KEX_DH_GEX_REPLY sent Mar 11 19:02:38 set_newkeys: rekeying Mar 11 19:02:38 SSH2_MSG_NEWKEYS sent Mar 11 19:02:38 expecting SSH2_MSG_NEWKEYS Mar 11 19:02:38 set_newkeys: rekeying Mar 11 19:02:38 SSH2_MSG_NEWKEYS received In the failure case, sshd gets stuck during key exchange. The SSH session had been going fine for many hours, and then these were the last messages it logged: Mar 11 20:02:38 SSH2_MSG_KEXINIT received Mar 11 20:02:38 SSH2_MSG_KEXINIT sent Mar 11 20:02:38 kex: client->server aes256-ctr hmac-sha1 none Mar 11 20:02:38 kex: server->client aes256-ctr hmac-sha1 none Mar 11 20:02:38 SSH2_MSG_KEX_DH_GEX_REQUEST_OLD received Mar 11 20:02:38 SSH2_MSG_KEX_DH_GEX_GROUP sent Mar 11 20:02:38 expecting SSH2_MSG_KEX_DH_GEX_INIT The user was idle when this happened, but had a program running that was generating output. That program became tty-blocked after about 30 minutes, presumably because sshd wasn't draining its output, and that's when I noticed the user's sshd was stuck and got a backtrace: (gdb) where #0 0x.. in select () from /lib/libc.so.6 #1 0x.. in packet_read_seqnr () from /usr/lib/libssh.so.3 #2 0x.. in packet_read () from /usr/lib/libssh.so.3 #3 0x.. in packet_read_expect () from /usr/lib/libssh.so.3 #4 0x.. in kexgex_server (kex=0x538900) at kexgexs.c:99 #5 0x.. in kex_setup () from /usr/lib/libssh.so.3 #6 0x.. in kex_input_kexinit () from /usr/lib/libssh.so.3 #7 0x.. in dispatch_run () from /usr/lib/libssh.so.3 #8 0x.. in process_buffered_input_packets () at serverloop.c:475 #9 0x.. in server_loop2 (authctxt=0x4) at serverloop.c:760 #10 0x.. in do_authenticated2 (authctxt=0x4) at session.c:2456 #11 0x.. in do_authenticated (authctxt=0x53a400) at session.c:227 #12 0x.. in main at sshd.c:1749 This backtrace agrees with the debug messages: it's in kexgex_server(), calling packet_read_expect(SSH2_MSG_KEX_DH_GEX_INIT), which ultimately calls select() from packet_read_seqnr(). The select call in packet_read_seqnr passes NULL for a timeout, meaning it will wait forever. That explains why the comment above it says "Note that no other data is processed until this returns, so this function should not be used during the interactive session." But, this was an interactive session. I've set ClientAliveInterval in sshd_config so that SSH sessions die in a timely manner when networking problems arise, but the keepalive is apparently not sent during key exchange. The default TCP keepalive on FreeBSD is unhelpful here; it only kicks in after 2 hours, and I need stuck SSH sessions to die a lot sooner. I want to keep the FreeBSD TCP keepalive defaults. Would it be possible for the select() in packet_read_seqnr to use an optional timeout? Similarly, I believe the select() in packet_write_wait has the same problem. Upon timeout, it would be fine with me if the session died with an error logged. Alternatively, if SSH keepalives were sent during key exchange, that would suffice. -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-17 08:37 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |djm at mindrot.org, | |dtucker at zip.com.au -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-17 08:54 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 --- Comment #1 from Damien Miller <djm at mindrot.org> 2007-09-17 18:53:59 --- (From update of attachment 1348)>+ while ((ret = select(connection_in + 1, setp, NULL, NULL, >+ packet_wait_tvp)) == -1 &&I think you need to reset packet_wait_tv before each select call. Linux at least will modify the timeout parameter to the contain the amount of time remaining, and apparently this behaviour is permitted by POSIX. -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-17 12:17 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 Tomas Mraz <t8m at centrum.cz> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |t8m at centrum.cz -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-17 23:30 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 --- Comment #2 from Matt Day <openssh at fjarlq.com> 2007-09-18 09:30:37 --- (In reply to comment #1)> I think you need to reset packet_wait_tv before each select call. Linux > at least will modify the timeout parameter to the contain the amount of > time remaining, and apparently this behaviour is permitted by POSIX.Good observation -- I agree. In addition to the patch, this sort of select-loop with timeout appears in the following OpenSSH 4.7 places: * conloop() (ssh-keyscan.c) * timeout_connect (sshconnect.c) I'm not familiar with that code, but must not be portable for the same reason. In the patch, the purpose of the select-loop is to wait for the socket to become ready. In OpenSSH 4.7, the packet.c routines wait forever. So the patch introduces a timeout which is enabled on systems using the SSH keepalive. So, I think the patch is actually *correct* on Linux. On systems like FreeBSD that do not change the select-timeval, the select-loop would start the timer over from the beginning each time EAGAIN or EINTR occurred. So if signals kept going off, sshd could still get stuck indefinitely in that loop. So I believe the correct fix should keep track of its own time and pass the correct amount of remaining time period into select(). -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-17 23:41 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 --- Comment #3 from Damien Miller <djm at mindrot.org> 2007-09-18 09:41:14 ---> In addition to the patch, this sort of select-loop with timeout appears > in the following OpenSSH 4.7 places: > * conloop() (ssh-keyscan.c)I'm not so fussed about this one.> * timeout_connect (sshconnect.c)This is fixed in CVS -current. See: http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/sshconnect.c.diff?r1=1.201&r2=1.202> So, I think the patch is actually *correct* on Linux. On systems like > FreeBSD that do not change the select-timeval, the select-loop would > start the timer over from the beginning each time EAGAIN or EINTR > occurred. So if signals kept going off, sshd could still get stuck > indefinitely in that loop.Actually, the patch is incorrect everywhere and possibly badly broken on Linux. On Linux, the timeout will get smaller and smaller and will eventually timeout an active connection. On other platforms the behavior is undefined. The logic that needs to be implemented is: 1. Decrease the timeout on EINTR or EAGAIN and retry the select 2. Reset the timeout on a successful read -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-17 23:54 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 --- Comment #4 from Matt Day <openssh at fjarlq.com> 2007-09-18 09:54:03 --- (In reply to comment #3)> Actually, the patch is incorrect everywhere and possibly badly broken > on Linux. On Linux, the timeout will get smaller and smaller and will > eventually timeout an active connection. On other platforms the > behavior is undefined.Ah, you're right of course. I forgot the timeout value is being stored in a global by packet_set_timeout, which is only being called at session establishment time. -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-18 00:50 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED AssignedTo|bitbucket at mindrot.org |djm at mindrot.org Attachment #1348 is|0 |1 obsolete| | Attachment #1350| |ok? Flag| | --- Comment #5 from Damien Miller <djm at mindrot.org> 2007-09-18 10:50:15 --- Created an attachment (id=1350) --> (http://bugzilla.mindrot.org/attachment.cgi?id=1350) revised patch This implements the logic I described in comment #3 -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching the assignee of the bug. You are watching someone on the CC list of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-18 06:41 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 --- Comment #6 from Tomas Mraz <t8m at centrum.cz> 2007-09-18 16:41:46 --- Shouldn't the ms_to_timeval(&timeout, ms_remain); be called inside the for(;;) loop? As you said the contents of the timeout is undefined on some platforms after returning from the select(). -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Sep-18 06:53 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
http://bugzilla.mindrot.org/show_bug.cgi?id=1363 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1350 is|0 |1 obsolete| | Attachment #1351| |ok? Flag| | Attachment #1350|ok? | Flag| | --- Comment #7 from Damien Miller <djm at mindrot.org> 2007-09-18 16:53:24 --- Created an attachment (id=1351) --> (http://bugzilla.mindrot.org/attachment.cgi?id=1351) Set remain_ms correctly You are right: that diff only reset the timeout on successful reads, but not on interrupted select()s. One concern though: does this impose a timelimit of (ClientAliveInterval * ClientAliveCountMax) on userauth? I'm not sure whether that is desirable, because we already have LoginGraceTime for that. I'm not sure what the solution is there, perhaps set the timeout to MAX(LoginGraceTime, ClientAliveInterval * ClientAliveCountMax), turn the timeout off after KEX, or never turn it on in the server to begin with? -- Configure bugmail: http://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Nov-03 00:43 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 --- Comment #8 from Damien Miller <djm at mindrot.org> 2007-11-03 11:43:00 --- (In reply to comment #7) I think the solution is not to enable the timeout in the server at all - it already has LoginGraceTime that kicks off right after accept(). So, just ignore the sshd.c hunk of this patch. -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Nov-03 01:06 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 --- Comment #9 from Darren Tucker <dtucker at zip.com.au> 2007-11-03 12:06:20 --- (In reply to comment #8)> (In reply to comment #7) > I think the solution is not to enable the timeout in the server at all > - it already has LoginGraceTime that kicks off right after accept(). > > So, just ignore the sshd.c hunk of this patch.The problem also exists in the session (eg during rekey) and LoginGraceTime doesn't help there. We could set the timeout to MAX(ClientAliveInterval * ClientAliveCountMax, LoginGraceTime) before auth, and ClientAliveInterval * ClientAliveCountMax afterward. -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2007-Dec-31 11:59 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 Darren Tucker <dtucker at zip.com.au> changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1353 -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2008-Jan-20 01:17 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1351 is|0 |1 obsolete| | Attachment #1446| |ok? Flag| | Attachment #1351|ok? | Flag| | --- Comment #10 from Damien Miller <djm at mindrot.org> 2008-01-20 12:17:53 --- Created an attachment (id=1446) --> (http://bugzilla.mindrot.org/attachment.cgi?id=1446) activate server-side timeout after auth completes Almost identical patch, but activate the sshd timeout right after the LoginGraceTime alarm is cancelled. I.e. LoginGraceTime will cover initial KEX and auth, and this new timeout will cover the rest of the running protocol (including re-KEX). client behaviour is unchanged -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2008-Mar-31 04:09 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks|1353 | -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2008-Mar-31 04:11 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1452 -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2008-Jun-12 19:23 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 Darren Tucker <dtucker at zip.com.au> changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1446| |ok+ Flag| | -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2008-Jun-12 20:45 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 Darren Tucker <dtucker at zip.com.au> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED --- Comment #11 from Darren Tucker <dtucker at zip.com.au> 2008-06-13 06:45:17 --- Thanks all. This has been applied and will be in the 5.1 release as well as tomorrow's snapshop. -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.
bugzilla-daemon at bugzilla.mindrot.org
2008-Jul-22 02:19 UTC
[Bug 1363] sshd gets stuck: select() in packet_read_seqnr waits indefinitely
https://bugzilla.mindrot.org/show_bug.cgi?id=1363 Damien Miller <djm at mindrot.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED --- Comment #12 from Damien Miller <djm at mindrot.org> 2008-07-22 12:19:44 --- Mass update RESOLVED->CLOSED after release of openssh-5.1 -- Configure bugmail: https://bugzilla.mindrot.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are watching someone on the CC list of the bug. You are watching the assignee of the bug.