samba-bugs@samba.org
2005-Apr-27 09:22 UTC
[Bug 2654] timeout is always triggered with 2.6.4
https://bugzilla.samba.org/show_bug.cgi?id=2654 wayned@samba.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Additional Comments From wayned@samba.org 2005-04-27 02:21 ------- The generator's setting of the ignore_timeout variable is what allows the I/O code to continue updating last_io (the old method of clearing io_timeout in the generator used to leave last_io unchanged, which is no longer desireable). How short is your timeout? Are you using a delete option? I tried some tests, and couldn't duplicat the problem, so I'll need more details. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2005-Apr-27 10:01 UTC
[Bug 2654] timeout is always triggered with 2.6.4
https://bugzilla.samba.org/show_bug.cgi?id=2654 ------- Additional Comments From f+samba@congenio.de 2005-04-27 02:58 ------- (In reply to comment #1) Oh, and one more thing: I am not longer sure if the bug did not exist in pre-2.6.4 versions. I did not see it before, but I had a mass-change with the side effect of the data being appended to my files. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2005-Apr-27 10:01 UTC
[Bug 2654] timeout is always triggered with 2.6.4
https://bugzilla.samba.org/show_bug.cgi?id=2654 ------- Additional Comments From f+samba@congenio.de 2005-04-27 02:54 ------- (In reply to comment #1)> The generator's setting of the ignore_timeout variable is what allows the I/O > code to continue updating last_io (the old method of clearing io_timeout in the > generator used to leave last_io unchanged, which is no longer desireable). >Yup, I was wrong with my first guess. I tried modifying the source to reset last_io when ignore_timeout is set to 1 and the error did not go away. last_io DOES get modified, but there is more than one last_io, as I found out.> How short is your timeout? Are you using a delete option? I tried some tests, > and couldn't duplicat the problem, so I'll need more details.The timeout was big enough (3600), there was no delete, but a fuzzy option specified. The files transferred were huge (800 MByte over a 512 KBit/s link) and had been changed only in the last portion (data was appended). The timeout did not matter, BTW, when I used 300 (and 1200) seconds, the timeout occured shortly after that time, too. To isolate the problem, I added rprintf() to the statements were last_io is being modified. Since there are always two rsync client processes, which hold an instance of last_io each, it looks that there were long sequences of read_timeout() activity which were not interspersed by writefd_unbuffered() with my kind of files. I could see that last_io was set in read_timeout() over and over again and that when check_timeout() was getting called, its instance of last_io (which definitely had been set by writefd_unbuffered, as I could glean from the value) was outdated. I think this has to do with these statements, which I don't completely understand: read_timeout(): if (io_timeout && fd == sock_f_in) last_io = time(NULL); writefd_unbuffered(): if (fd == sock_f_out) { if (io_timeout) last_io = time(NULL); sleep_for_bwlimit(ret); } Each "side" (i.e. process) seems to watch only one direction of the data flow? It seems since both sides are separated, too long pauses can arise in special situations (such as mine) because data flows unidirectionally into the "unwatched" direction for the specific process side and its last_io instance. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2005-Apr-27 15:41 UTC
[Bug 2654] timeout is always triggered with 2.6.4
https://bugzilla.samba.org/show_bug.cgi?id=2654 ------- Additional Comments From wayned@samba.org 2005-04-27 08:39 ------- You mentioned using --fuzzy: are there a lot of files in your destination directories? One thing I've noticed with --fuzzy is that if rsync puts a lot of new files into a directory with thousands of old files, the name-scoring algorithm can chew up a lot of CPU. It might be possible for the connection to timeout if this slowed rsync down enough. (The --fuzzy performance into large dirs should be improved at some point.) As for who counts what, the generator only sends data down the socket, so it only notes that data in the last_io time. The receiver only reads data from the socket, so it only notes that data in the last_io time. The sender both reads and writes data from/to the socket, so it notes both. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2005-Apr-27 16:41 UTC
[Bug 2654] timeout is always triggered with 2.6.4
https://bugzilla.samba.org/show_bug.cgi?id=2654 ------- Additional Comments From f+samba@congenio.de 2005-04-27 09:26 ------- (In reply to comment #4) About --fuzzy: I doubt that this was the case. There are only up to 50 files in each directory (and from what I understood, the fuzzy matching takes place only in the file's own directory). Also, there was ongoing socket activity a few seconds before the actual timeout occured. I saw that last_io was modified every second or so in read_timeout() while the received file got bigger. Then at some point, the check_timeout failed with an older last_io time. That timestamp matched that of the last modification in write_unbuffered().> > As for who counts what, the generator only sends data down the socket, so it > only notes that data in the last_io time. The receiver only reads data from the > socket, so it only notes that data in the last_io time. The sender both reads > and writes data from/to the socket, so it notes both.Yes. The sender was an rsync daemon in my case, whereas the client was separated into a generator and a receiver. Could that lead to a problem when only binary data (i.e. no deltas) are being sent (or the opposite, where just copying on the receiving side was neccessary)? Is there a possibility that the receiver is busy (modifying its last_io all the time) but the generator sits and waits for the next file to get processed or a select() timeout occurs and then the generator runs into the timeout in check_timeout() because the last time that write_unbuffered() modified its last_io was too long ago? This would mean that effectively, --timeout ought to large enough to accommodate not only pauses arising from directory scans, checksum calculations or fuzzy matching, but even data transfer itself if no (or maybe only) deltas are present. So, --timeout must in fact be longer than the expected transfer time of the largest file over a given connection in the worst case. I would expect --timeout to be a kind of "inactivity" counter to recognize stalled IP connections. But it seems that by separating the generator and receiver, there can be LONG pauses in one direction which trigger the timeout, although the connection is alive and well in the other direction (but that fact goes unnoticed). I guess in order to enable such bahaviour, the condition "io_timeout && fd =sock_f_in" respectively "io_timeout && fd == sock_f_out" should read something like "io_timeout && (fd == sock_f_in || fd == sock_f_out)" in order to catch "ACK" packets? If the receiver and generator see only one direction of the same socket, however, only a kind of shared-memory variable "last_io" could do that trick. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2005-Apr-27 18:01 UTC
[Bug 2654] timeout is always triggered with 2.6.4
https://bugzilla.samba.org/show_bug.cgi?id=2654 ------- Additional Comments From wayned@samba.org 2005-04-27 10:47 ------- What may have happened is that the outgoing socket from the generator may have filled up completely, and the timeout on outputting new data then got exceeded waiting for the other two processes to clear out enough space to write more. I think my decision to partially enable timeouts in the generator now that we're generating ACK packets was not the right choice. Try changing this code: if (protocol_version < 29) ignore_timeout = 1; to remove the "if" and I think the timeout will go away. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2005-Apr-27 21:20 UTC
[Bug 2654] timeout is always triggered with 2.6.4
https://bugzilla.samba.org/show_bug.cgi?id=2654 ------- Additional Comments From f+samba@congenio.de 2005-04-27 14:08 ------- (In reply to comment #6)> generating ACK packets was not the right choice. Try changing this code: > > if (protocol_version < 29) > ignore_timeout = 1; > > to remove the "if" and I think the timeout will go away.That did it for me. No timeouts any more. So the bug was present only since 2.6.4 in fact. I think this one may have gone unnoticed by now since few people synchronize that much data over semi-permanent connections (i.e. dynamic IPs), so that line drops have to be detected by the --timeout option. Also, the up- and downstreams differ greatly (ADSL) in my setup, so your assumptions about queues seem logical. Thanks for the prompt fixing! -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2005-Apr-27 23:02 UTC
[Bug 2654] timeout is always triggered with 2.6.4
https://bugzilla.samba.org/show_bug.cgi?id=2654 wayned@samba.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Additional Comments From wayned@samba.org 2005-04-27 15:59 ------- Excellent! I've checked in a fix for this (and a big comment about why we like turning off the timeouts in the generator). -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.