Guy Helmer
2000-May-15 21:15 UTC
OpenSSH (1.2.3) sshd hanging when using rsync over ssh (retry)
Now that the list is said to be open again, I'm resending this. I've merged my changes into OpenSSH 2.1.0 as Kris imported it into FreeBSD over the weekend.

---------- Forwarded message ----------
Date: Thu, 4 May 2000 08:40:22 -0500 (CDT)
From: Guy Helmer <ghelmer at cs.iastate.edu>
To: openssh-unix-dev at mindrot.org
Subject: OpenSSH (1.2.3) sshd hanging when using rsync over ssh

I have debugged a problem with OpenSSH's sshd (as found in FreeBSD, based on OpenSSH 1.2.3) that has been bugging me ever since I switched from ssh-1.2.27.

I use rsync (FreeBSD port ports/net/rsync) over ssh to synchronize and backup my main home directory and development directories to other systems. rsync always worked great with ssh-1.2.2[67]. Since I switched my machines to run OpenSSH's sshd, rsync over ssh would randomly hang (although the hangs were very persistent when synchronizing large files).

I noticed from netstat that the connection to ssh on the sshd server machine showed waiting data in the Recv-Q, but no waiting data in the Send-Q, so I decided to look into sshd. I grabbed a core from sshd when this hang happened, and gdb showed this stack trace:

#0  0x281e20c4 in write () from /usr/lib/libc.so.4
#1  0x804fb18 in process_output (writeset=0xbfbfed04)
    at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/serverloop.c:366
#2  0x8050029 in server_loop (pid=43486, fdin_arg=9, fdout_arg=9, fderr_arg=11)
    at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/serverloop.c:563
#3  0x8053b60 in do_exec_no_pty (
    command=0x80750c0 "rsync --server --sender -vlgtpr --delete . /home/ghelmer/ ",
    pw=0xbfbfef80, display=0x806c0a0 "mocha.cs.iastate.edu:10.0",
    auth_proto=0x806c100 "MIT-MAGIC-COOKIE-1",
    auth_data=0x8075000 "cdf4b6cb730310be3d51a8abf77303fc")
    at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/sshd.c:2211
#4  0x805386c in do_authenticated (pw=0xbfbfef80)
    at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/sshd.c:2037
#5  0x80527b4 in do_authentication ()
    at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/sshd.c:1408
#6  0x8051b43 in main (ac=1, av=0xbfbff624)
    at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/sshd.c:970
#7  0x804aae1 in _start ()

The code around frame #1 was

361  {
362          int len;
363
364          /* Write buffered data to program stdin. */
365          if (fdin != -1 && FD_ISSET(fdin, writeset)) {
366                  len = write(fdin, buffer_ptr(&stdin_buffer),
367                      buffer_len(&stdin_buffer));
368                  if (len <= 0) {
369  #ifdef USE_PIPES
370                          close(fdin);

and stdin_buffer contains

$2 = {buf = 0x80b1000 "?\004\212D\204?c?", alloc = 45056, offset = 0, end = 8192}

So, it appears sshd was stuck in a write() that wouldn't complete. (Even when I kill the ssh client, sshd hangs around and never notices that the connection has gone away.)

I figured this was probably something that was fixed in ssh-1.2.27, and sure enough, fdin was set to be nonblocking and errno was checked for the value EWOULDBLOCK in process_output. I added similar code to serverloop.c, and now rsync over ssh works great.

I'm worried that my code is tainted, though, since I looked at the ssh-1.2.27 sources. If you don't think it is a problem, and if you are interested, I can send you my diffs... I don't have ties to OpenBSD, so I'm not sure who in particular I should contact about this.

Thanks,
Guy

Guy Helmer, Ph.D. Candidate, Iowa State University Dept. of Computer Science
Research Assistant, Dept. of Computer Science --- ghelmer at cs.iastate.edu
http://www.cs.iastate.edu/~ghelmer
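
For illustration only: the kind of change described above (making fdin non-blocking and treating EWOULDBLOCK as "try again later" rather than as a failure) might look roughly like the sketch below. This is not the diff mentioned in the message, and it is not taken from OpenSSH or ssh-1.2.27; the helper names set_nonblocking and write_some are hypothetical, and the sketch assumes a POSIX system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: attempt one write that never blocks.
 * Returns the number of bytes written (> 0), 0 if the descriptor
 * cannot accept data right now (caller keeps the data buffered and
 * retries after the next select()), or -1 on a real error. */
static ssize_t write_some(int fd, const char *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL && len > 0);

    n = write(fd, buf, len);
    if (n >= 0)
        return n;
    /* Not ready or interrupted: not fatal, so do not close the fd. */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return 0;
    return -1;
}
```

With a helper like this, a caller would close the descriptor only when the return value is -1, and would consume from its output buffer only the number of bytes actually written.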