samba-bugs at samba.org
2010-Oct-27 05:22 UTC
DO NOT REPLY [Bug 7757] New: with big file, rsync times out out when it should not; the sender is still responsive
https://bugzilla.samba.org/show_bug.cgi?id=7757 Summary: with big file, rsync times out out when it should not; the sender is still responsive Product: rsync Version: 3.0.7 Platform: Other OS/Version: Linux Status: NEW Severity: normal Priority: P3 Component: core AssignedTo: wayned at samba.org ReportedBy: tim.liim at alcatel-lucent.com QAContact: rsync-qa at samba.org Description of problem: When sending big file (eg. > 500MB) With --timeout=xx option, the server side of rsync timeouts out when it should not. The components of rsync (generator or receiver or server) should exchange periodic keepalive msg, so they will not timeout when the connection is perfectly healthy. Version-Release number of selected component (if applicable): rsync-3.0.7-3.fc13.i686 How reproducible: With --timeout=10: sometimes with 500MB file; almost always with 2GB file. Steps to Reproduce: 1. Assume timeout=10 sec. generate the source file, eg. i=/tmp/t55 time dd if=/dev/zero of=$i bs=1M count=1000 If it takes less than <timeout> x 3 (eg. 30 sec), increase the file size. Bigger file better illustrates this issue. 2. j=/tmp/t56 rsync --timeout=10 localhost:$i $j Actual results: Saw the following errors, depending on the mode of failure. #1 [sender] io timeout after 10 seconds -- exiting rsync error: timeout in data send/receive (code 30) at io.c(140) [sender=3.0.7] #2 rsync: writefd_unbuffered failed to write 6 bytes to socket [generator]: Broken pipe (32) rsync error: timeout in data send/receive (code 30) at io.c(1530) [generator=3.0.7] rsync error: received SIGUSR1 (code 19) at main.c(1288) [receiver=3.0.7] Expected results: The sync should complete without error; certainly no timeout. Additional info: 1. The issue is more prounced with two hosts of different speed, eg. h1 (dest) is much slower than h2 (src) (eg. h1 could be an slower machine (especially I/O bound) with LUKS; this could be simulated by strace on h1:receiver). i=/tmp/t55 j=/tmp/t56 h2: dd if=/dev/zero of=$i bs=1M count=1000 h1: rsync --timeout 10 h2:$i $j 2. From strace, the scenario with error msg in #2: - generator <g> read <dest> and sends checksum to sender <s> (--server in this case). - <g> spawns receiver <r> - <r> opens <dest> and <tmp> file, start copying from <dest> to <tmp> - <s> reads <src> file and sends msgs to <r> - when <s> finishes, it starts to wait, for (timeout/2) sec. - Assume <r> is slower, <s> timeouts out after <timeout> sec. - <g> probably needs to send keepalive msg to <s>. 3. strace for #1 is similar to #2: - after <s> finishes, <r> takes more than 10 sec to finish copying <dest> to <tmp> and then rename <tmp> <dest>, so <s> times out. 4. The issue is somewhat reproducible on one single host. But with strace on both <s> and <r>, it becomes harder to reproduce, probably because strace slows both <s> and <r> down. I need to use a much bigger file (2.5GB), and strace <r> from beginning, but strace <s> only torwards the end, to show this error with strace. 5. There are several existing bugs that may be related to this unexpected timeout issue. bug2783 Random high loads during syncs (server side) / client stream errors rsync: connection unexpectedly closed (2761332 bytes received so far) [generator] rsync error: error in rsync protocol data stream (code 12) at io.c(365) bug5478 rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32) rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32) io timeout after 30 seconds -- exiting rsync error: timeout in data send/receive (code 30) at io.c(239) [sender=3.0.2] bug5695 improve keep-alive code to handle long-running directory scans ./io.c:void maybe_send_keepalive(void) bug6175 write last transfer status when timeout or other error happens rsync: writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32) rsync: connection unexpectedly closed (99113 bytes received so far) [sender] rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.5] bug7195 timeout reached while sending checksums for very large files -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs at samba.org
2010-Oct-27 05:24 UTC
DO NOT REPLY [Bug 7757] with big file, rsync times out out when it should not; the sender is still responsive
https://bugzilla.samba.org/show_bug.cgi?id=7757 ------- Comment #1 from tim.liim at alcatel-lucent.com 2010-10-27 00:24 CST ------- This issues was originally reported in Fedora 13 https://bugzilla.redhat.com/show_bug.cgi?id=645236 -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs at samba.org
2012-Oct-23 11:26 UTC
[Bug 7757] with big file, rsync times out out when it should not; the sender is still responsive
https://bugzilla.samba.org/show_bug.cgi?id=7757 --- Comment #2 from Lo?c Gomez <samba-bugz at kyoshiro.org> 2012-10-23 11:26:55 UTC --- I ran into a similar issue recently while transferring large files (>40GB). After a few tests, it seems - in my case at least - to be related to the delta-xfer algorithm. The bug does not happen anymore with the -W option. I don't know if this will resolve your issue, but you can also try looking into these options : --no-checksum --no-compress --blocking-io. These were not the source of my problems, but the functions they're related to might rise a network timeout. I hope it helps, anyways, good luck solving your issue. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2012-Oct-31 01:32 UTC
[Bug 7757] with big file, rsync times out out when it should not; the sender is still responsive
https://bugzilla.samba.org/show_bug.cgi?id=7757 --- Comment #3 from Justin Patrin <papercrane at reversefold.com> 2012-10-31 01:32:18 UTC --- FWIW, this problem went away for me by just removing -z and adding --no-compress. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2014-Dec-26 09:18 UTC
[Bug 7757] with big file, rsync times out out when it should not; the sender is still responsive
https://bugzilla.samba.org/show_bug.cgi?id=7757 --- Comment #4 from Dominic Raferd <dominic at timedicer.co.uk> --- I can confirm that this bug still exists when using compression (-z) with rsync 3.1.0 (server and dest), and that removing compression is a good workaround. -- You are receiving this mail because: You are the QA Contact for the bug.
samba-bugs at samba.org
2016-Sep-03 06:34 UTC
[Bug 7757] with big file, rsync times out out when it should not; the sender is still responsive
https://bugzilla.samba.org/show_bug.cgi?id=7757 Christopher Meng <i at cicku.me> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |i at cicku.me --- Comment #5 from Christopher Meng <i at cicku.me> --- Bug still exists in rsync 3.1.2, comfirmed with -z added. -- You are receiving this mail because: You are the QA Contact for the bug.
samba-bugs at samba.org
2017-Nov-14 19:15 UTC
[Bug 7757] with big file, rsync times out out when it should not; the sender is still responsive
https://bugzilla.samba.org/show_bug.cgi?id=7757 --- Comment #6 from roland <devzero at web.de> --- i can confirm this still exists. needed to adjust the timeout as it times out on our database backup file (90gb) when there is higher load on the database server. -- You are receiving this mail because: You are the QA Contact for the bug.
Reasonably Related Threads
- Instruction selection problems due to SelectionDAGBuilder
- Node deletion during DAG Combination ?
- [SelectionDAG] Assertion due to MachineMemOperand flags difference.
- DO NOT REPLY [Bug 2783] Random high loads during syncs (server side) / client stream errors
- DO NOT REPLY [Bug 6175] New: write last transfer status when timeout or other error happens