Some new data on my rsync hangs:

I run about 1500 rsync sessions over ssh daily. Over the last 8 days that adds up to about 12k rsync sessions. Of those 12k sessions, 10 are sitting in a hung state right now. The rsync process on the destination has exited, but both rsync processes on the source are still running/waiting/hung. I use a timeout of 3600, but it doesn't seem to work for this failure mode.

It is interesting to note that the 10 hung rsyncs all have Solaris 8 destinations, yet about 1/2 of my Solaris destinations work just fine.

Source is Linux 2.4.18, rsync 2.5.5 + generator/timeout patch
(http://lists.samba.org/pipermail/rsync/2002-April/006976.html)

All Suns are running Solaris 8, OpenSSH_2.9p2.

Suns that seem to fail are rsync 2.5.5 (no generator patch) and NFS destination for files.
Suns that work are rsync 2.5.3 and local file destination.

Is the NFS difference a clue? It seems like some delay/glitch/issue with NFS on the destination might be causing occasional/random trouble for my rsync processes. It seems this NFS factor is something that people are bringing up more and more lately.

Ideas? I'll try 2.5.5 with the generator patch on the destinations.

SendQ and RecvQ are 0 on the source sockets.
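Since the rsync timeout does not fire for this failure mode, one possible stopgap is an external watchdog that enforces a hard wall-clock limit on each session. The following is only a minimal sketch under assumptions, not something in use here: the function name `run_with_hard_limit` and the limit value are made up for illustration, and it assumes a POSIX system where the session can be placed in its own process group.

```python
import os
import signal
import subprocess


def run_with_hard_limit(cmd, limit_s):
    """Run cmd; kill its whole process group if it outlives limit_s.

    Returns the exit status, or None if the watchdog had to kill it.
    (Hypothetical helper, for illustration only.)
    """
    # start_new_session=True gives the command its own process group,
    # so helpers it spawns (for rsync, the ssh transport) die with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=limit_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

A call might look like `run_with_hard_limit(["rsync", "-a", "--timeout=3600", "-e", "ssh", "src/", "host:dest/"], 2 * 3600)`, where the paths, the host, and the two-hour limit are placeholders. This only cleans up after a hang; it does not explain it.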
strace shows the parent rsync process on source is stuck in this endless loop:

gettimeofday({1022796482, 605543}, NULL) = 0
wait4(8783, 0xbffffc48, WNOHANG, NULL) = 0
gettimeofday({1022796482, 605602}, NULL) = 0
gettimeofday({1022796482, 605626}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 20000}) = 0 (Timeout)
gettimeofday({1022796482, 625224}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
gettimeofday({1022796482, 635262}, NULL) = 0
wait4(8783, 0xbffffc48, WNOHANG, NULL) = 0
gettimeofday({1022796482, 635316}, NULL) = 0
gettimeofday({1022796482, 635350}, NULL) = 0

strace on the child rsync process triggers both to exit:

windriver:/home/tisadmin/bin # strace -p 8783
select(7, [3 4], [], NULL, NULL) = 1 (in [4])
read(4, "", 16384) = 0
close(4) = 0
select(7, [3], [3], NULL, NULL) = 1 (out [3])
write(3, "\200\300G\223K\355\2322#\220~Y5\0\210x\206~1e\240M\250"..., 32) = 32
select(7, [3], [], NULL, NULL) = 1 (in [3])
read(3, "\226f\301\271\220\200\t6\\\177\"%\3477\336^\255\n\255I"..., 8192) = 96
brk(0x809d000) = 0x809d000
close(6) = 0
select(7, [3], [3], NULL, NULL) = 1 (out [3])
write(3, "\204\355\2058\330\360\242<\313\233QMc\311\307?\322\351"..., 32) = 32
ioctl(0, TCGETS, 0xbffffa18) = -1 EINVAL (Invalid argument)
fcntl64(0, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(0, F_SETFL, O_RDWR) = 0
ioctl(1, TCGETS, 0xbffffa18) = -1 EINVAL (Invalid argument)
fcntl64(1, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(1, F_SETFL, O_RDWR) = 0
ioctl(2, TCGETS, 0xbffffa18) = -1 ENOTTY (Inappropriate ioctl for device)
fcntl64(2, F_GETFL) = 0x8801 (flags O_WRONLY|O_NONBLOCK|O_LARGEFILE)
fcntl64(2, F_SETFL, O_WRONLY|O_LARGEFILE) = 0
gettimeofday({1022796581, 661010}, NULL) = 0
shutdown(3, 2 /* send and receive */) = 0
close(3) = 0
_exit(0) = ?
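For readers less used to strace output: the parent's trace has the shape of a non-blocking wait loop. wait4() with WNOHANG returns 0 while the child is still alive, the select() calls with no descriptors act as short sleeps (20 ms and 1 ms), and then the cycle repeats. Below is a rough Python sketch of that pattern, for illustration only. It is not rsync's code; the name `poll_child` and the `deadline_s` parameter are assumptions added to show where a bound on the loop could go.

```python
import os
import time


def poll_child(pid, poll_s=0.02, deadline_s=None):
    """Poll a child with WNOHANG, sleeping poll_s between checks.

    Returns the child's exit code (or the negated signal number if a
    signal killed it), or None if deadline_s passes first. With
    deadline_s=None the loop runs for as long as the child lives,
    which is the behaviour seen in the trace.
    """
    start = time.monotonic()
    while True:
        # Equivalent of wait4(pid, &status, WNOHANG, NULL): a first
        # element of 0 means the child has not changed state yet.
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return -os.WTERMSIG(status)
        if deadline_s is not None and time.monotonic() - start >= deadline_s:
            return None  # child still running; the caller decides what next
        time.sleep(poll_s)  # stands in for select(0, NULL, NULL, NULL, {0, 20000})
```

Nothing in a loop of this shape looks at the network, so as long as the child stays blocked in its own select() the parent keeps spinning.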