I'm not sure keepalive is working the way I think it's supposed to. Here's my test. I've got a large file (approx 1 gig) on the rsync server, and I append a character to the end to make it slightly different. I fire up rsync on the client side and everything works fine until I hit the timeout value. I have this problem with any timeout below 90 seconds. I've tried 30- and 60-second timeout values, and still get this problem. After I increase the timeout to 90 seconds or above, it works fine.

Now my understanding was that the keepalive feature was supposed to prevent this from ever happening. Am I doing something wrong? Is 60 seconds just way too short a timeout value for very large files? Or do I misunderstand what the keepalive function is supposed to do? I've tried 2.6.4 and 2.6.5pre1 on both the client and server.
On Thu, May 12, 2005 at 04:43:55PM -0500, Steve Sether wrote:
> Now my understanding was that the keepalive feature was supposed to
> prevent this from ever happening.

Yes, that's certainly what it's supposed to prevent.

> Am I doing something wrong?

Maybe. You must be using at least 2.6.4 on both ends of the connection and not overriding the protocol version below 29. (You can see what protocol was negotiated by specifying at least four -v options.) If the remote system has multiple versions of rsync installed, perhaps it is running an older one unbeknownst to you.

Even with keep-alive, if you run something that takes up too much time, rsync could conceivably still time out. For instance, using --checksum with really large files might not get to the keep-alive check often enough to make a difference. Or using --fuzzy with a lot of missing files into a large directory could be pretty slow too. (You don't mention what options you're using, so I'm going to stop guessing at what might be slowing things down.)

In any case, it would be good to know at what point in the transfer it was timing out. You might try setting larger levels of verbosity, and if it still times out, let me know what was going on at the time of the failure (perhaps attach strace to the generator process too -- it's the first (which usually means lowest) PID of the two processes on the receiving side).

One potential "fix" you could try is to change the initialization of the "lull_mod" value in generator.c to 1. That would make it call maybe_send_keepalive() after every file.

..wayne..
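To make the "lull_mod" suggestion a bit more concrete, here is a rough Python model of the idea (not rsync's actual C code). It assumes the keep-alive check runs once after every lull_mod-th file and reports the longest stretch of time with no check at all. The function name and the per-file durations are hypothetical, made up only for illustration.

```python
def longest_gap_between_checks(file_durations, lull_mod):
    """Return the longest stretch of seconds with no keep-alive check.

    file_durations: seconds spent on each file (hypothetical numbers).
    lull_mod: a check runs after every lull_mod-th file (simplified model,
              an assumption about how the real loop behaves).
    """
    if lull_mod < 1:
        raise ValueError("lull_mod must be at least 1")
    longest = 0.0
    gap = 0.0
    for count, seconds in enumerate(file_durations, start=1):
        gap += seconds
        if count % lull_mod == 0:
            # A check happens here, so the current quiet stretch ends.
            longest = max(longest, gap)
            gap = 0.0
    # Files left over after the last check still count as a quiet stretch.
    return max(longest, gap)
```

In this model, six files of 10 seconds each give a 30-second gap with lull_mod set to 3 and a 10-second gap with lull_mod set to 1. A single file that takes 75 seconds gives a 75-second gap either way, which matches the point above that one slow operation can still outlast the timeout.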
That patch works perfectly. Thanks!

On Wed, May 18, 2005 at 04:36:55PM -0700, Wayne Davison wrote:
> On Wed, May 18, 2005 at 11:23:38AM -0500, Steve Sether wrote:
> > But it looks to me like last_io is reset on the sender side every
> > time it receives a packet (as well as when it sends a packet).
>
> Ah yes -- how silly of me. Yes, that will prevent my patch from working
> right. I was fooled into thinking I had fixed something when I kluged
> up a test using sleep calls to simulate a slow connection -- this must
> have fortuitously caused the last_io value to age and allow the
> keep-alive messages to happen (because I verified that the messages
> were sent and received, and they did actually fix a timeout in that
> kluged test-case).
>
> CVS now has an updated keepalive.diff in the patches dir that keeps a
> separate time for the sending and receiving of data (timeouts are based
> on a lack of received data, and keep-alives on a lack of sent data).
> This should hopefully make things work for you:
>
> http://rsync.samba.org/ftp/unpacked/rsync/patches/keepalive.diff
>
> It did fix a timeout in a (different) test case I tried out.
>
> ..wayne..
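The difference between one shared last_io value and separate send/receive times can be sketched in a few lines of Python. This is only a model of the idea described in the quoted message, not the contents of keepalive.diff: the class names, method names, and the choice to send a keep-alive after half the timeout has passed are all assumptions made for illustration.

```python
class SharedTimestamp:
    """Model of the problem: one last_io value reset on both send and receive."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_io = 0.0

    def on_send(self, now):
        self.last_io = now

    def on_receive(self, now):
        self.last_io = now

    def should_send_keepalive(self, now):
        # Assumed threshold: half the timeout without any I/O.
        return now - self.last_io >= self.timeout / 2


class SplitTimestamps:
    """Model of the fix: separate times for sent and received data."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_sent = 0.0
        self.last_received = 0.0

    def on_send(self, now):
        self.last_sent = now

    def on_receive(self, now):
        self.last_received = now

    def should_send_keepalive(self, now):
        # Keep-alives are based on a lack of sent data (assumed half-timeout).
        return now - self.last_sent >= self.timeout / 2

    def timed_out(self, now):
        # Timeouts are based on a lack of received data.
        return now - self.last_received >= self.timeout
```

If a side keeps receiving packets every 10 seconds but sends nothing, the shared value never ages, so the model never asks for a keep-alive and the other end hears nothing. With split values, the send time keeps aging while the receive time stays fresh, so a keep-alive goes out and no local timeout is reported.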