Dear all,

This is just a placeholder really - I've spent quite a lot of time trying
to track down the cause of a problem I'm seeing here, and have eventually
got an answer. I thought it might be useful to have the solution
googleable. Obviously, if someone has better ideas, I'd welcome comments.
Similarly, if anyone wants detailed logs, tcpdump or strace output, I'd be
happy to oblige. I'm pretty sure it's not rsync that's to blame, though.

The problem: I have a bunch of Linux servers, one of which is to act as a
backup repository for the others. All are running Gentoo Linux with 2.6.x
kernels. The server is running 2.6.12-gentoo-r6 and my test client is
running 2.6.11-gentoo-r9. I'm not really in a position to reboot any of
these machines to test other kernel versions. The backup server is an old
450MHz Pentium with a 100Mbit Intel EEPro card, while the clients are all
of much higher spec (typically Xeons with Intel E1000 Gigabit Ethernet).
I'm running rsync version 2.6.0 at both ends (though I have tested 2.6.6
with the same results).

I'm using rsync in daemon mode on the backup server (no ssh involved), and
rsync tends to die with an error of the form:

  rsync: writefd_unbuffered failed to write 4096 bytes: phase "unknown": Connection reset by peer
  rsync error: error in rsync protocol data stream (code 12) at io.c(666)

Both server and client logs report "Connection reset by peer". The backup
always seems to break when handling a large directory (in terms of number
of files - one particular directory with 26,000 smallish files reliably
triggers the problem). If I run the rsync to a local disk, the error
doesn't occur.

When I look at tcpdump's output, I see the window size dropping to zero,
indicating that the backup server is receiving data faster than it can
handle it. Presumably the sending machine should then back off, but what
actually appears to happen is that the connection gets dropped, hence the
rsync errors at both ends. I've tried messing with --bwlimit, but the
problem occurs even when I drop it down to 80K/second, which is patently
ridiculous (the backup server can handle 100x that amount of network
I/O). All my investigations point to a TCP stack problem, but that's
about as far as I can get.

I've found a solution that works for me - turn off TCP window scaling on
the backup server. This can be done with:

  sysctl -w net.ipv4.tcp_window_scaling=0

There is some discussion of TCP window scaling problems at:

  http://lwn.net/Articles/92727/

but that discusses broken routers. My server and client are on the same
subnet, so it's not exactly the same issue. Turning off window scaling is
probably not a good solution in general, but it's fine for me because the
backup server doesn't have any other high-throughput requirements. Even
with window scaling disabled, I'm still getting 7MBytes/second from
rsync, which is pretty close to saturating the server's card.

Cheers,
Alun.
--
Alun Jones                                       auj@aber.ac.uk
Systems Support,                                 (01970) 62 2494
Information Services, University of Wales, Aberystwyth
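
P.S. For anyone wanting to make the workaround survive a reboot, the usual
approach (assuming your distro reads /etc/sysctl.conf at boot, as Gentoo
should) is something along these lines - a sketch rather than a definitive
recipe:

  # /etc/sysctl.conf - disable TCP window scaling on the backup server only
  net.ipv4.tcp_window_scaling = 0

and then "sysctl -p" to apply it without rebooting. If you want to check
whether you're seeing the same zero-window behaviour, watching the rsync
daemon port with something like "tcpdump -nn 'tcp port 873'" and looking
for "win 0" in the output should show it.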