I've been getting frequent io errors trying to synchronise a local CPAN mirror with the master on ftp.funet.fi, the symptoms being the dreaded

    rsync: connection unexpectedly closed (0 bytes read so far)
    rsync error: error in rsync protocol data stream (code 12) at io.c(165)

message at the client end. I've replicated this when mirroring from a local CPAN mirror, and the issue seems to be that the server is timing out after it has sent the file list to the client but before the client has started transferring files.

Despite what the documentation says about the default IO timeout being infinite (0), inspection of the code would seem to indicate otherwise:

    [io.c]
    /** If no timeout is specified then use a 60 second select timeout */
    #define SELECT_TIMEOUT 60
    :
    tv.tv_sec = io_timeout?io_timeout:SELECT_TIMEOUT;
    tv.tv_usec = 0;

I haven't crawled through the initialisation code to find out exactly how io_timeout gets set, but examination of rsync in daemon mode with a debugger reveals that it is using a timeout of 60 seconds when no --timeout is specified by the client and there is no timeout value in rsyncd.conf.

The consequence of this is that if the client doesn't respond within 60 seconds (and as CPAN contains >34,000 files it often doesn't), the server process exits, and the client then gets an unexpected EOF. I've checked with the admin of ftp.funet.fi, and he doesn't have a timeout set in rsyncd.conf, so it seems that the actual value being used is 60 seconds, hence the failures.

Closer examination of the select code reveals other breakage even if the 60 second default problem is fixed. The manpage for select says (Solaris):

    If the timeout argument is not a null pointer, it points to an object of type struct timeval that specifies a maximum interval to wait for the selection to complete. If the timeout argument points to an object of type struct timeval whose members are 0, select() does not block. If the timeout argument is a null pointer, select() blocks until an event causes one of the masks to be returned with a valid (non-zero) value. If the time limit expires before any event occurs that would cause one of the masks to be set to a non-zero value, select() completes successfully and returns 0.

so if an infinite timeout *is* required, the struct timeval * argument to select should be NULL when io_timeout == 0, and I see no code in place to do that.

I'm also not clear exactly how the client and server timeout values interact. The rsyncd.conf entry says:

    The "timeout" option allows you to override the clients choice for IO timeout for this module,

which implies that the client timeout value (if specified) is passed across the wire and is used by the server. Is this really what is supposed to happen? If so, experimentation suggests that it might be broken as well.

I'm happy to fix these problems if someone can confirm that I'm on the right track and my understanding is correct. I'm currently completely unable to use rsync to reliably mirror CPAN to the inside of our corporate firewall, so I have a strong vested interest in fixing these issues.

Once again, please reply direct as I'm not on the list.

-- 
Alan Burlison
--
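A minimal sketch of the change Alan is proposing, assuming an io_timeout measured in whole seconds where 0 means "no timeout": when the value is 0, pass a NULL pointer as select()'s timeout argument so that, per the manpage text quoted above, select() blocks until the descriptor is ready. The functions timeout_arg() and wait_readable() are hypothetical illustrations, not code from rsync's io.c.

```c
#include <assert.h>      /* assert() */
#include <stddef.h>      /* NULL */
#include <sys/select.h>  /* select(), fd_set, FD_SETSIZE, struct timeval */

/* Hypothetical helper: translate an io_timeout in seconds (0 = none)
 * into select()'s timeout argument. A NULL pointer makes select()
 * block until a descriptor is ready; a struct whose members are both
 * zero would instead make it return immediately (a poll). */
struct timeval *timeout_arg(int io_timeout, struct timeval *tv)
{
    if (io_timeout <= 0)
        return NULL;
    tv->tv_sec = io_timeout;
    tv->tv_usec = 0;
    return tv;
}

/* Hypothetical wrapper: wait for fd to become readable.
 * Returns >0 if ready, 0 if io_timeout expired, -1 on error (errno set). */
int wait_readable(int fd, int io_timeout)
{
    fd_set rfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);  /* FD_SET() requires this */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    /* tv is rebuilt on every call because some systems modify it. */
    return select(fd + 1, &rfds, NULL, NULL, timeout_arg(io_timeout, &tv));
}
```

With this approach an io_timeout of 0 waits forever inside select() itself. As J.W. Schultz's reply in this thread explains, rsync takes a different route: it wakes up every SELECT_TIMEOUT seconds and decides separately whether to give up.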
On Wed, Jul 30, 2003 at 11:23:17PM +0100, Alan Burlison wrote:
> I've been getting frequent io errors trying to synchronise a local CPAN
> mirror with the master on ftp.funet.fi, the symptoms being the dreaded
>
> rsync: connection unexpectedly closed (0 bytes read so far)
> rsync error: error in rsync protocol data stream (code 12) at io.c(165)
>
> message at the client end. I've replicated this when mirroring from a
> local CPAN mirror, and the issue seems to be that the server is timing out
> after it has sent the file list to the client but before the client has
> started transferring files.
>
> Despite what the documentation says about the default IO timeout being
> infinite (0), inspection of the code would seem to indicate otherwise:
>
> [io.c]
> /** If no timeout is specified then use a 60 second select timeout */
> #define SELECT_TIMEOUT 60
> :
> tv.tv_sec = io_timeout?io_timeout:SELECT_TIMEOUT;
> tv.tv_usec = 0;

Look further.

> I haven't crawled through the initialisation code to find out exactly how
> io_timeout gets set, but examination of rsync in daemon mode with a
> debugger reveals that it is using a timeout of 60 seconds when no --timeout
> is specified by the client and there is no timeout value in rsyncd.conf.
>
> The consequence of this is that if the client doesn't respond within 60
> seconds (and as CPAN contains >34,000 files it often doesn't), the server
> process exits, and the client then gets an unexpected EOF. I've checked
> with the admin of ftp.funet.fi, and he doesn't have a timeout set in
> rsyncd.conf, so it seems that the actual value being used is 60 seconds,
> hence the failures.
>

The 60 second timeout is only on select. The internal
io_timeout is only evaluated for connection termination in
check_timeout() which does not use the SELECT_TIMEOUT.
[snip]
>
> I'm also not clear exactly how the client and server timeout values
> interact, the rsyncd.conf entry says:
>
> The "timeout" option allows you to override the clients choice for IO
> timeout for this module,
>
> which implies that the client timeout value (if specified) is passed across
> the wire and is used by the server - is this really what is supposed to
> happen? If so, experimentation suggests that it might be broken as well.

That is what really happens. The client specified timeout
is passed over the wire for use by the server but if the
server has a value specified in rsyncd.conf that value will
override the client.

> I'm happy to fix these problems if someone can confirm that I'm on the
> right track and my understanding is correct. I'm currently completely
> unable to use rsync to reliably mirror CPAN to the inside of our corporate
> firewall, so I have a strong vested interest in fixing these issues.
>
> Once again, please reply direct as I'm not on the list.

I'd suggest you join the list. There isn't that much
traffic on it and you can un-subscribe easily.

I'd also suggest the admin of that site set a timeout value.
An unlimited timeout invites DOS attacks.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt
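The precedence rule Schultz describes (the client's value travels to the daemon, but a module's rsyncd.conf setting wins) can be modelled as a one-line function. This is a sketch that assumes an unset timeout, on either side, is represented as 0; effective_io_timeout() is a hypothetical name, not an rsync function.

```c
#include <assert.h>

/* Hypothetical model: client_timeout is the value the client sends
 * (from --timeout); module_timeout is the module's "timeout" setting
 * in rsyncd.conf. 0 is assumed to mean "not set". A module setting,
 * when present, overrides whatever the client asked for. */
int effective_io_timeout(int client_timeout, int module_timeout)
{
    assert(client_timeout >= 0 && module_timeout >= 0);
    return module_timeout ? module_timeout : client_timeout;
}
```

Under this model a client asking for --timeout=300 against a module configured with a timeout of 600 ends up with 600, while the same client against a module with no timeout set keeps its own 300.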
> The 60 second timeout is only on select. The internal
> io_timeout is only evaluated for connection termination in
> check_timeout() which does not use the SELECT_TIMEOUT.

Ah! That makes sense - thanks. It appears that io_timeout is set to the same value as lp_timeout(i) [clientserver.c:465], so the value that is passed to select is that specified via --timeout or rsyncd.conf, or 60 if no timeout value is specified.

>> I'm also not clear exactly how the client and server timeout values
>> interact, the rsyncd.conf entry says:
>>
>> The "timeout" option allows you to override the clients choice for IO
>> timeout for this module,
>>
>> which implies that the client timeout value (if specified) is passed across
>> the wire and is used by the server - is this really what is supposed to
>> happen? If so, experimentation suggests that it might be broken as well.
>
> That is what really happens. The client specified timeout
> is passed over the wire for use by the server but if the
> server has a value specified in rsyncd.conf that value will
> override the client.

OK, right - thanks for the clarification. That still leaves me with the puzzle of why my rsyncs are timing out, though... More investigation is obviously needed.

> I'd suggest you join the list. There isn't that much
> traffic on it and you can un-subscribe easily.

Done.

> I'd also suggest the admin of that site set a timeout value.
> An unlimited timeout invites DOS attacks.

I've already done that (and he has set one), but thanks for the suggestion anyway.

-- 
Alan Burlison
--
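The distinction drawn above, between the 60-second select() wake-up and the io_timeout that actually ends a connection, can be illustrated with two small functions. This is a sketch, not rsync's code: select_wait_secs() mirrors the io.c expression quoted at the start of the thread, and timed_out() is a hypothetical stand-in for the decision check_timeout() makes, assuming it compares the time since the last I/O against io_timeout and does nothing when io_timeout is 0.

```c
#include <assert.h>
#include <time.h>

/* select() wake-up interval when no io_timeout is configured,
 * as in the io.c excerpt. */
#define SELECT_TIMEOUT 60

/* Seconds to wait in each select() call: io_timeout if set, else 60. */
int select_wait_secs(int io_timeout)
{
    assert(io_timeout >= 0);
    return io_timeout ? io_timeout : SELECT_TIMEOUT;
}

/* Hypothetical stand-in for check_timeout(): should the connection be
 * dropped? With io_timeout == 0 this never fires, however often
 * select() has returned 0, so the caller simply waits again. */
int timed_out(time_t last_io, time_t now, int io_timeout)
{
    assert(io_timeout >= 0);
    if (io_timeout == 0)
        return 0;
    return difftime(now, last_io) >= io_timeout;
}
```

With no timeout configured, a client that stays silent for 90 seconds while it works through a large file list makes select() return 0 once, after 60 seconds, but timed_out(last_io, last_io + 90, 0) is still 0, so the server keeps waiting. With io_timeout set to 60, the same pause would end the connection.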