FYI, we've been using the 2.4.7pre1 release for several days now, with nary a hang problem. We haven't seen the EOF bug at all, which was what we upgraded for. This is with transfers of as much as 50GB to set up an initial mirror.

The only thing we did was set timeout=0 -- which I guess is unnecessary. The semantics of this flag are a bit unclear. We thought it was 'time since response from a server', but it seems to be the total runtime of rsync. So if it's set at 2 hours and a sync takes longer than that, rsync will crap out, even if the sync is proceeding normally. Is this all by design? If so, the documentation could be a tad clearer.

Thanks, though, to everyone involved in the latest fixes.

Regards,
Carey Jung
Carey Jung [carey@itfreedom.com] writes:

> The only thing we did was set timeout=0 -- which I guess is unnecessary.
> The semantics of this flag are a bit unclear. We thought it was 'time since
> response from a server', but it seems to be total runtime of rsync.

Yes, it's supposed to represent an I/O timeout (that is, a lack of communication for that long), not the overall runtime of rsync. There was an old bug, though, that effectively turned it into an overall process timeout. In receive mode, it's really the child process that does the I/O and therefore needs to check the timeout. But the parent process used read_int() to wait for the child, and read_int() applied the same timeout, so it effectively became a timeout on the execution of the whole process.

A small patch to main.c was proposed by Neil Schellenberger [nschelle@crosskeys.com] on this list back in June of 2000; perhaps it never actually made it into the development tree. Or perhaps it was resolved some other way, although from your comments I'm guessing not. I've been running with it applied locally ever since then without a problem (and we definitely run with timeouts on all of our uses).

I don't have Neil's original mail handy at the moment, but I have enclosed a context diff from when I applied it to my 2.4.3-based tree. It still applies cleanly to the latest CVS sources.

-- David

/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l@fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT 06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/

- - - - - - - - - - - - - - - - - - - - - - - - -

RCS file: e:/binaries/cvs/ni/bin/rsync/main.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -c -r1.2 -r1.3
*** main.c	2000/05/30 18:34:55	1.2
--- main.c	2000/06/30 19:10:32	1.3
***************
*** 279,289 ****
--- 279,291 ----
      int status=0;
      int recv_pipe[2];
      int error_pipe[2];
+     int io_timeout_save = -1;
      extern int preserve_hard_links;
      extern int delete_after;
      extern int recurse;
      extern int delete_mode;
      extern int remote_version;
+     extern int io_timeout;
  
      if (preserve_hard_links)
          init_hard_links(flist);
***************
*** 339,344 ****
--- 341,349 ----
  
      io_set_error_fd(error_pipe[0]);
  
+     io_timeout_save = io_timeout;
+     io_timeout = 0;   /* child is managing timeouts */
+ 
      generate_files(f_out,flist,local_name,recv_pipe[0]);
  
      read_int(recv_pipe[0]);
***************
*** 348,353 ****
--- 353,360 ----
          write_int(f_out, -1);
      }
      io_flush();
+ 
+     io_timeout = io_timeout_save;
  
      kill(pid, SIGUSR2);
      wait_process(pid, &status);
Thanks. I don't feel so alone, now. :-)

> -----Original Message-----
> From: David Bolen [mailto:db3l@fitlinxx.com]
> Sent: Wednesday, September 05, 2001 9:54 PM
> To: 'Carey Jung'
> Cc: rsync list
> Subject: RE: Feedback on 2.4.7pre1