samba-bugs@samba.org
2008-May-21 08:21 UTC
DO NOT REPLY [Bug 5478] New: rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 Summary: rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32) Product: rsync Version: 3.0.3 Platform: Other URL: https://bugzilla.samba.org/show_bug.cgi?id=1959 OS/Version: Linux Status: NEW Severity: normal Priority: P3 Component: core AssignedTo: wayned@samba.org ReportedBy: development--bugzilla.samba.org@medium.net QAContact: rsync-qa@samba.org After years, bug 1959 still exists in rsync 3.0.2 :-( rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32) io timeout after 30 seconds -- exiting rsync error: timeout in data send/receive (code 30) at io.c(239) [sender=3.0.2] My rsync options are "--one-file-system --archive --xattrs --delete-before --update --timeout=0". The job is to backup an entire filesystem onto a USB drive. Generating the directories-list takes nearly an hour. And some time after that, I receive this error. My working hypothesis is that the receiver-rsync cannot cope with a write() call to need more than 30 seconds, but I'm not sure. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2008-May-21 14:14 UTC
DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 wayned@samba.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #1 from wayned@samba.org 2008-05-21 09:14 CST ------- Bug 1959 has nothing whatsoever to do with an "io timeout". That is a value specified either by the client or by the daemon. Since the --timeout=0 value on the client didn't turn off timeouts, I have to imagine that you're using a daemon, and the daemon has a 30-second timeout specified (which the client is not allowed to override). If so, you must specify the same (or lower, but not 0) timeout to the client for the rsync client to be able to help out with keep-alive activites. A timeout of 30 seconds is also rather short, so I'd suggest increasing it in the daemon's config. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2008-May-21 14:25 UTC
DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 development--bugzilla.samba.org@medium.net changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|INVALID | ------- Comment #2 from development--bugzilla.samba.org@medium.net 2008-05-21 09:24 CST ------- (In reply to comment #1)> Bug 1959 has nothing whatsoever to do with an "io timeout". That is a value > specified either by the client or by the daemon. Since the --timeout=0 value > on the client didn't turn off timeouts, I have to imagine that you're using a > daemon,No, I'm not using a daemon. I'm using a foreground rsync command, where both the source and the destination are directories in the local filesystem.> and the daemon has a 30-second timeout specified (which the client is > not allowed to override). If so, you must specify the same (or lower, but not > 0) timeout to the client for the rsync client to be able to help out with > keep-alive activites. A timeout of 30 seconds is also rather short, so I'd > suggest increasing it in the daemon's config. >-- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2008-May-21 14:30 UTC
DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 ------- Comment #3 from development--bugzilla.samba.org@medium.net 2008-05-21 09:30 CST ------- My intuition says that the bug more lies in "io.c" ( http://git.samba.org/?p=rsync.git;a=blob;f=io.c;hb=73cb6738b33846130c21f2903b2200fa3f1903ab ): #define SELECT_TIMEOUT 30 ... void set_io_timeout(int secs) { io_timeout = secs; if (!io_timeout || io_timeout > SELECT_TIMEOUT) select_timeout = SELECT_TIMEOUT; else select_timeout = io_timeout; allowed_lull = read_batch ? 0 : (io_timeout + 1) / 2; } "secs" has only an effect if (secs!=0)&&(secs<=30) Thus, supplying a timeout greater 30 or a timeout of 0 actually does not have any effect. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2008-May-21 14:55 UTC
DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 ------- Comment #4 from wayned@samba.org 2008-05-21 09:55 CST ------- Interesting. The 30-second timeout is therefore the one turned on in the error handling to try to get a message back from the receiving side. Usually the receiving side vanishes before the end of the 30 seconds, so it is unusual to see that error in this context. You will need to diagnose why one of the receiving processes (presumably the receiver, not the generator) is vanishing. I'd suggest attaching strace (strace -o FILENAME -p PID) and/or enabling core dumps (ulimit -c unlimited). -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs@samba.org
2008-Jul-29 01:43 UTC
DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 ------- Comment #5 from wayned@samba.org 2008-07-28 20:43 CST ------- Any more info on this? You could try the latest 3.1.0dev version and try using the --msgs2stderr option to see if it reveals a remote error that was getting lost when things closed down. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs at samba.org
2010-Oct-27 05:28 UTC
DO NOT REPLY [Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 tim.liim at alcatel-lucent.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |tim.liim at alcatel-lucent.com ------- Comment #6 from tim.liim at alcatel-lucent.com 2010-10-27 00:28 CST ------- These bugs seem to be related (having similar error messages: connection unexpectedly closed, broken pipe, timeout). bug7757 with big file, rsync times out out when it should not; the sender is still responsive bug2783 Random high loads during syncs (server side) / client stream errors rsync: connection unexpectedly closed (2761332 bytes received so far) [generator] rsync error: error in rsync protocol data stream (code 12) at io.c(365) bug5478 rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32) rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32) io timeout after 30 seconds -- exiting rsync error: timeout in data send/receive (code 30) at io.c(239) [sender=3.0.2] bug5695 improve keep-alive code to handle long-running directory scans ./io.c:void maybe_send_keepalive(void) bug6175 write last transfer status when timeout or other error happens rsync: writefd_unbuffered failed to write 4 bytes [sender]: Broken pipe (32) rsync: connection unexpectedly closed (99113 bytes received so far) [sender] rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.5] bug7195 timeout reached while sending checksums for very large files -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug, or are watching the QA contact.
samba-bugs at samba.org
2011-Sep-20 01:16 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 Eric Shubert <ejs at shubes.net> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ejs at shubes.net --- Comment #7 from Eric Shubert <ejs at shubes.net> 2011-09-20 01:16:05 UTC --- I've been running 3.0.7 setup for months with no problem, until June 2011 when this bug showed up. I subsequently upgraded to 3.0.8, but the error persists. Seems to be triggered by certain file content. The error happened most frequently with files that were individual email messages (Maildir format) containing attached pictures. When I remove the offending source file and restart, more files are processed ok until another offending file is reached. If I restart w/out removing the offending file, it always gets stuck on the same file until it's removed from the list to be processed. Then it happened with a quickbooks backup (.qbb) file as well. My guess is that there's something in the data stream that's causing something i/o wise to seize up. I have kept examples of files that cause this. I set up the rsync-debug script on the target server (not running daemon mode, just ssh), and the strace around the error at the end is such: 31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 31031 read(0, "H\10\26}\23565\6\300RV\331\23\262M\357Q\210ngTQ\373\3523\326\6\36 2\246\"\1\375"..., 4092) = 4092 31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 31031 read(0, "\374\17\0\7", 4) = 4 31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 31031 read(0, "+R~\344\374I\204\344_<\311\235\307\340iT|q\32z=\373(\243\320\332\ 32527x\336b"..., 4092) = 4092 31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 31031 read(0, "\374\17\0\7", 4) = 4 31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 31031 read(0, "\206/\227-W\370+\337\367\251\\h\247\346\330\233\356s\353\254\276\356\365a\206%wS\357\234:\377"..., 4092) = 4092 31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 31031 read(0, "\374\17\0\7", 4) = 4 31031 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 31031 read(0, "\262\231\222\225%\370\374\262.\262\21A\222V\300\336\0226\262\246\325=*\314\234\314\204Z[(\351w"..., 4092) = 4092 31031 select(1, [0], [], NULL, {60, 0} <unfinished ...> 31030 <... select resumed> ) = 0 (Timeout) 31030 select(4, [3], [], NULL, {60, 0} <unfinished ...> 31031 <... select resumed> ) = 0 (Timeout) 31031 select(1, [0], [], NULL, {60, 0} <unfinished ...> 31030 <... select resumed> ) = 0 (Timeout) 31030 select(4, [3], [], NULL, {60, 0} <unfinished ...> 31031 <... select resumed> ) = 0 (Timeout) 31031 select(1, [0], [], NULL, {60, 0} <unfinished ...> 31030 <... select resumed> ) = 0 (Timeout) 31030 select(4, [3], [], NULL, {60, 0} <unfinished ...> 31031 <... select resumed> ) = 0 (Timeout) etc. If there's anything I can do to help further to get this bug fixed, please let me know. Thanks. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-20 02:12 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #8 from Tim Taiwanese Liim <tim.liim at alcatel-lucent.com> 2011-09-20 02:12:09 UTC --- Re: Comment #7> The error happened most frequently with files ... containing > attached pictures.Eric, By any chance do you consider these files "big" files? I am asking because I saw similar timeout when sending big files (bug7757, bug7195). Also in your next strace output, could you use "-tt" to show timestamps? Thanks. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-20 14:42 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #9 from Eric Shubert <ejs at shubes.net> 2011-09-20 14:42:37 UTC --- (In reply to comment #8)> By any chance do you consider these files "big" files? I am asking > because I saw similar timeout when sending big files (bug7757, > bug7195).Not really. Most problem files appear to be 20-30MB, but a few are <10M and there's one which is only 25K. I'll keep this in mind though.> Also in your next strace output, could you use "-tt" to > show timestamps? Thanks.Here's one with timestamps: 1636 07:15:20.138388 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:15:20.138646 read(0, "\374\17\0\7", 4) = 4 1636 07:15:20.138883 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:15:20.139121 read(0, "\211\313Wq\361\315ogBR\371\346\351\327\335\r@\20\373\276\303\372<F\373\222\333\340\353\272\261\3"..., 4092) = 4092 1636 07:15:20.139466 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:15:20.139714 read(0, "\374\17\0\7", 4) = 4 1636 07:15:20.139956 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:15:20.140195 read(0, "\307\371e\245(K\17\351\201O\357\2139\221\322%\345)\201.\252G\364\221\326\256\306u(\2353\346"..., 4092) = 4092 1636 07:15:20.140475 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:15:20.140716 read(0, "\374\17\0\7", 4) = 4 1636 07:15:20.140943 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:15:20.141180 read(0, "\221\367\307\26yg0\31\t\314\250Gr\366e\301\231]\211~\253F-\320\240T\177T.\201\206\341"..., 4092) = 1223 1636 07:15:20.141502 select(1, [0], [], NULL, {60, 0} <unfinished ...> 1635 07:15:51.190750 <... select resumed> ) = 0 (Timeout) 1635 07:15:51.190975 select(4, [3], [], NULL, {60, 0} <unfinished ...> 1636 07:16:20.139240 <... select resumed> ) = 0 (Timeout) 1636 07:16:20.139459 select(1, [0], [], NULL, {60, 0} <unfinished ...> 1635 07:16:51.194224 <... select resumed> ) = 0 (Timeout) 1635 07:16:51.194447 select(4, [3], [], NULL, {60, 0} <unfinished ...> Thanks for having a look at this Tim. I'm going to put a problem/test file on the target machine's hdd and continue testing with a local2local transfer. I should have mentioned that the target in this case is a pair of USB2 HDDs in a raid-1 configuration. I seem to recall another posting of this error involved writing to a USB drive as well. I expect at this point that USB might be a contributing factor. I'm stumped though as to why it has worked fine for over year before coming across this problem. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-20 15:39 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #10 from Eric Shubert <ejs at shubes.net> 2011-09-20 15:39:39 UTC --- (In reply to comment #9)> > I'm going to put a problem/test file on the target machine's hdd and continue > testing with a local2local transfer.I'm holding off on the local test for the moment. After closer look at the strace, the problem appears to be with the transport/ssh mechanism (input side on the target). Early on in the strace, after a series (10x) of 4092-byte reads, I see this: 1636 07:12:20.682841 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:12:20.683211 read(0, "\374\17\0\7", 4) = 4 1636 07:12:20.683964 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:12:20.684226 read(0, "\222E\23\272R\224<\256\0I`\365\177\247\204\207Jf\370\277=\\q\24q\357\202\"z\307\207H"..., 4092) = 4092 1636 07:12:20.684653 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:12:20.685065 read(0, "\374\17\0\7", 4) = 4 1636 07:12:20.685291 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 1636 07:12:20.685627 read(0, "v5z\260\341\346\260\335w\22\311P\37|%:2N\2177\304\0264\215#\344\274P^\307u\313"..., 4092) = 1223 1636 07:12:20.685984 select(1, [0], [], NULL, {60, 0} <unfinished ...> 1635 07:12:20.686201 <... getdents64 resumed> /* 2 entries */, 32768) = 48 1635 07:12:20.687710 getdents64(0, /* 0 entries */, 32768) = 0 1635 07:12:20.687953 close(0) = 0 This looks like the same situation as the timeout, except that there is "getdents64 resumed" instead of "select resumed" after the select statement. Don't know if it's related or not. I'll keep looking to see what I can tell from the strace. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-22 16:54 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #11 from Eric Shubert <ejs at shubes.net> 2011-09-22 16:54:14 UTC --- (In reply to comment #9)> (In reply to comment #8) > > By any chance do you consider these files "big" files? I am asking > > because I saw similar timeout when sending big files (bug7757, > > bug7195). > > Not really. Most problem files appear to be 20-30MB, but a few are <10M and > there's one which is only 25K. I'll keep this in mind though. >I need to change my answer here. The files that cause it to choke are larger than most, so I would have to say that as a general observation the files that fail are large(r). Not huge, but large. I also need to correct my earlier statement/thinking. While the process did fail on the same file on consecutive runs, leading me to believe that the content of the file was causing the problem, I've determined that this is not always the case. Subsequent runs have transferred these files successfully, so it's not the content of the data that's a problem. I've been making many (10s of) runs with strace on the target, and have noticed a pattern, which I'll attempt to describe (not having looked at the source code). When transmitting these large files, the maximum size read at once is 4092 bytes. In most cases of failure, there are 7 of these 4092 byte chunks, followed by 2 smaller chunks, which vary in size. In the sample below, they are 3699, then 393. Other value pairs I've seen are 3697 and 395, 2514 and 1578, 2758 and 1334, 1039 and 3053, 1085 and 3007, 4087 and 5, etc (all pair sums are 4092). These are followed by another 7 4092 byte chunks, then another 3699 and 393, and so on. When the process hangs, it always times out waiting for the first of the two smaller chunks (after 7 reads of 4092 bytes). On one failure, all reads were 4092 bytes, with no smaller chunks. In this case, the failure occurred after 7 reads of 4092 bytes which followed a write. I hope that this gives a clue as to where the problem might be. Please let me know if there's something else I can provide. I'll continue to investigate. Thanks. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-22 18:35 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #12 from Eric Shubert <ejs at shubes.net> 2011-09-22 18:35:06 UTC --- (In reply to comment #11)> When the process hangs, it always times out waiting for the > first of the two smaller chunks (after 7 reads of 4092 bytes). >Correction. It times out on the select function following the successful read of the first of the two smaller chunks: 13018 10:38:47.708267 read(0, "\374\17\0\7", 4) = 4 13018 10:38:47.708522 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13018 10:38:47.708757 read(0, "\6B#\377a\357\223\360\237\251J\2477q\216+\252s\\\363X\223\264\244pC\241y\350\321\240R"..., 4092) = 4092 13018 10:38:47.709034 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13018 10:38:47.709264 read(0, "\374\17\0\7", 4) = 4 13018 10:38:47.709619 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13018 10:38:47.709862 read(0, "`\337\24f\220&\0379\223\"y\340\10\247\240\3702\333\360\311e/\362$\333\215\216r\24sw\361"..., 4092) = 3699 13018 10:38:47.710137 select(1, [0], [], NULL, {60, 0} <unfinished ...> 13017 10:39:22.695271 <... select resumed> ) = 0 (Timeout) 13017 10:39:22.695484 select(4, [3], [], NULL, {60, 0} <unfinished ...> 13018 10:39:47.701076 <... select resumed> ) = 0 (Timeout) 13018 10:39:47.701266 select(1, [0], [], NULL, {60, 0} <unfinished ...> 13017 10:40:22.696550 <... select resumed> ) = 0 (Timeout) This is what happens in the normal flow: 13018 10:38:47.498236 read(0, "\374\17\0\7", 4) = 4 13018 10:38:47.498460 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13018 10:38:47.498710 read(0, "\266\222\6\301\257q\224\0)\302\363R\273\rJ\264g\235\"0\230\311sq\277\243\260\277\322\32\313\213"..., 4092) = 4092 13018 10:38:47.499589 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13018 10:38:47.499852 read(0, "\374\17\0\7", 4) = 4 13018 10:38:47.500054 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13018 10:38:47.500278 read(0, "z\33\0\276\304\177o\246\274\225gH\30\271|\317\365(h\234\314)\223\306\260?\370\274\355\272@\21"..., 4092) = 3699 13018 10:38:47.500599 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 810000}) 13018 10:38:47.698206 read(0, "\274&x\365\1\247\252Y\177\7e\17\227\325\207BL\177{\354_\317\2655\352~\20\303&\240\226\303"..., 393) = 393 1 I'm presently attempting to capture a corresponding strace on the source side to see what's going on there. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-22 19:25 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #13 from Eric Shubert <ejs at shubes.net> 2011-09-22 19:25:19 UTC --- (In reply to comment #12)> I'm presently attempting to capture a corresponding strace on the source side > to see what's going on there.Here are matching straces from the sending and receiving sides. First the receiving (time on this host is slightly ahead): 13265 12:01:56.418051 read(0, "\374\17\0\7", 4) = 4 13265 12:01:56.418251 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13265 12:01:56.418544 read(0, "\341\2373}P\226ZL\316n\273\220\344\246\r\n\215r\245I\241Z\351\225\273t\224\302p\375\375\316"..., 4092) = 4092 13265 12:01:56.418855 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13265 12:01:56.419119 read(0, "\374\17\0\7", 4) = 4 13265 12:01:56.419319 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 13265 12:01:56.419564 read(0, "vmb/\216L\360\227\343\31&\240x\306[T\376Dx-\354\360v0\263\f0\266#\305b\363"..., 4092) = 4092 13265 12:01:56.421406 select(1, [0], [], NULL, {60, 0} <unfinished ...> 13264 12:02:55.255790 <... select resumed> ) = 0 (Timeout) 13264 12:02:55.256043 select(4, [3], [], NULL, {60, 0} <unfinished ...> 13265 12:02:56.421901 <... select resumed> ) = 0 (Timeout) 13265 12:02:56.422114 select(1, [0], [], NULL, {60, 0} <unfinished ...> 13264 12:03:55.257239 <... select resumed> ) = 0 (Timeout) 13264 12:03:55.257426 select(4, [3], [], NULL, {60, 0} <unfinished ...> 13265 12:03:56.420390 <... select resumed> ) = 0 (Timeout) Now the sending: 2611 12:01:55.348752 write(4, "\374\17\0\7", 4) = 4 2611 12:01:55.348862 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 2611 12:01:55.348983 write(4, "\17m\304\365\313M\344:\26\253Ogu\311\200Q\2\303\254\373\233T<;\221\351\375\207\264\305\365z"..., 4092) = 4092 2611 12:01:55.349115 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 2611 12:01:55.349236 write(4, "\374\17\0\7", 4) = 4 2611 12:01:55.349344 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 2611 12:01:55.349465 write(4, "\10\2074\tC\36\336r\364\233\0_\336]\311\17\"N\352\22\241O\2450\250\271\305\204Q\202\316\344"..., 4092) = 4092 2611 12:01:55.353308 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 2611 12:01:55.353471 write(4, "\374\17\0\7", 4) = 4 2611 12:01:55.353586 select(5, NULL, [4], [4], {60, 0} <unfinished ...> 2612 12:01:56.089897 <... select resumed> ) = 1 (out [3]) 2612 12:01:56.090021 write(3, "\326\n\324q\327dR\6\215\201?\227\234\217K:\221c\336\204|@\273\2431f9.\17KTr"..., 158192) = 97920 2612 12:01:56.091620 select(7, [3], [3], NULL, NULL <unfinished ...> 2611 12:02:55.349271 <... select resumed> ) = 0 (Timeout) 2611 12:02:55.349381 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout) 2611 12:03:55.347803 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout) 2611 12:04:55.346412 select(5, NULL, [4], [4], {60, 0} <unfinished ...> 2612 12:05:06.185778 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted) 2611 12:05:06.185871 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted) 2612 12:05:06.185926 --- SIGINT (Interrupt) @ 0 (0) --- 2611 12:05:06.186867 --- SIGINT (Interrupt) @ 0 (0) --- 2612 12:05:06.187011 sigreturn( <unfinished ...> I don't know what's supposed to be happening here, so I'll be grateful for any pointers. I suspect at this point that the problem is with the sending side, but that's not saying much. Thanks. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-23 02:19 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #14 from Wayne Davison <wayned at samba.org> 2011-09-23 02:19:04 UTC --- It's a bit unclear what pid corresponds to what function, especially with both sides having 2 pids in the trace output. What is the command-line used? So, looking at your trace, it looks like the sender is waiting to write more data (its select lists the write and error FDs). The receiving side's 2 pids are waiting to read more data. So, something in between has either failed to deliver the data, or blocked, or something (it would appear). Are there other processes in between the 2 sides (e.g. ssh) that could have blocked? Or perhaps a failed pipe (or unix domain socket)? -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-23 15:55 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #15 from Eric Shubert <ejs at shubes.net> 2011-09-23 15:55:27 UTC --- (In reply to comment #14)> It's a bit unclear what pid corresponds to what function, especially with both > sides having 2 pids in the trace output. What is the command-line used?strace -tt -f -o /tmp/rsync-$$.out rsync \ --verbose \ --exclude "stor/hosts" \ --exclude "var/run" \ --exclude "var/spool" \ --rsh="ssh -p $backoff_port" \ --rsync-path="sudo /usr/local/bin/rsync-debug" \ --recursive \ --links \ --perms \ --times \ --owner \ --group \ --numeric-ids \ --xattrs \ --delete \ --compress \ --human-readable \ --stats \ --exclude='udat/backup*/*' \ --exclude='udat/migrate/*' \ /mnt/stor \ $userid@$backoff_url:$destdir 2>&1 As you can see, it's pushing from the local host via ssh to the remote host.> So, looking at your trace, it looks like the sender is waiting to write more > data (its select lists the write and error FDs). The receiving side's 2 pids > are waiting to read more data. So, something in between has either failed to > deliver the data, or blocked, or something (it would appear). Are there other > processes in between the 2 sides (e.g. ssh) that could have blocked? Or > perhaps a failed pipe (or unix domain socket)?Here is another run, with processes. Both sides are rsync-3.0.8-1.el5.rfx. First the sending side processes: root 4921 4914 5 07:38 tty1 00:00:13 strace -tt -f -o /tmp/rsync-4914.out rsync --verbose --exclude stor/hosts --exclude var/run --exclude var/spool --rsh=ssh -p 17113 --rsync-path=sudo /usr/local/bin/rsync-debug --recursive --links --perms --times --owner --group --numeric-ids --xattrs --delete --compress --human-readable --stats --exclude=udat/backup*/* --exclude=udat/migrate/* /mnt/stor r3i at backoff.shubes.net:/home/r3i/backup/tacs-udat root 4923 4921 7 07:38 tty1 00:00:17 rsync --verbose --exclude stor/hosts --exclude var/run --exclude var/spool --rsh=ssh -p 17113 --rsync-path=sudo /usr/local/bin/rsync-debug --recursive --links --perms --times --owner --group --numeric-ids --xattrs --delete --compress --human-readable --stats --exclude=udat/backup*/* --exclude=udat/migrate/* /mnt/stor r3i at backoff.shubes.net:/home/r3i/backup/tacs-udat root 4924 4923 0 07:38 tty1 00:00:01 ssh -p 17113 -l r3i backoff.shubes.net sudo /usr/local/bin/rsync-debug --server -vlogtpXrze.isf --delete --numeric-ids . /home/r3i/backup/tacs-udat The sending side strace: 4923 07:47:21.380396 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 4923 07:47:21.380537 write(4, "4\25\t\20\342Q\353\177A\0039rC\272\301\372\335\364\17?\313g\241\215\313\370%\222s\367F|"..., 4092) = 4092 4923 07:47:21.380683 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 4923 07:47:21.380806 write(4, "\374\17\0\7", 4) = 4 4923 07:47:21.380914 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 4923 07:47:21.381034 write(4, "\351\4n\317\346;f\321Pn>~[\333\337\200\214#\33\233\365\3\5\322P\1\221\364\255\230<\16"..., 4092) = 4092 4923 07:47:21.382583 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 4923 07:47:21.382777 write(4, "\374\17\0\7", 4) = 4 4923 07:47:21.382897 select(5, NULL, [4], [4], {60, 0} <unfinished ...> 4924 07:47:21.499524 <... select resumed> ) = 1 (out [3]) 4924 07:47:21.499685 write(3, "D\3143?I\322+\362S\231\311Xe\302\264\1\235|R\231Wyw\214\302\211\206\365^\337;\24"..., 131552) = 66240 4924 07:47:21.500711 select(7, [3], [3], NULL, NULL) = 1 (out [3]) 4924 07:47:22.518796 write(3, "\350l\250\255\361YB\241`__\302oM\333\346\247\223\tdcM\203O\217aA\260\t\377gm"..., 98128) = 54720 4924 07:47:22.519874 select(7, [3], [3], NULL, NULL <unfinished ...> 4923 07:48:21.377865 <... select resumed> ) = 0 (Timeout) 4923 07:48:21.377989 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout) 4923 07:49:21.379500 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout) 4923 07:50:21.378390 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout) The receiving side processes: root 16837 16836 0 07:39 ? 00:00:00 /bin/sh /usr/local/bin/rsync-debug --server -vlogtpXrze.i --delete --numeric-ids . /home/r3i/backup/tacs-udat root 16848 16837 18 07:39 ? 00:01:03 strace -tt -f -o /tmp/rsync-16837.out rsync --server -vlogtpXrze.i --delete --numeric-ids . /home/r3i/backup/tacs-udat root 16849 16848 15 07:39 ? 00:00:51 rsync --server -vlogtpXrze.i --delete --numeric-ids . /home/r3i/backup/tacs-udat root 16850 16849 1 07:39 ? 00:00:04 rsync --server -vlogtpXrze.i --delete --numeric-ids . /home/r3i/backup/tacs-udat The receiving side strace: 16850 07:47:22.332291 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 16850 07:47:22.332556 read(0, "\374\17\0\7", 4) = 4 16850 07:47:22.332808 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 16850 07:47:22.333036 read(0, "&\f]c\254\306\20\326mK\0\r\365\36k\236IQ\365l\23\337Ne\322p\361\207\237\320'\257"..., 4092) = 4092 16850 07:47:22.333328 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 16850 07:47:22.333664 read(0, "\374\17\0\7", 4) = 4 16850 07:47:22.333867 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 16850 07:47:22.334091 read(0, "\212N@\347\31\261\\\4\271g\216\nQ;\331C\34)\365\321\205\363CW\306\313;\322\0226\242\342"..., 4092) = 4092 16850 07:47:22.335051 select(1, [0], [], NULL, {60, 0} <unfinished ...> 16849 07:48:11.629178 <... select resumed> ) = 0 (Timeout) 16849 07:48:11.629374 select(4, [3], [], NULL, {60, 0} <unfinished ...> 16850 07:48:22.335875 <... select resumed> ) = 0 (Timeout) 16850 07:48:22.336060 select(1, [0], [], NULL, {60, 0} <unfinished ...> Thanks for looking at this. Please let me know what I can do next to help nail this down. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-28 20:40 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #16 from Eric Shubert <ejs at shubes.net> 2011-09-28 20:40:15 UTC --- I've done some further testing and would like to report my findings. I tried the rsync to a different target host on a different IP address via a different ISP, and lo and behold, it worked. I conclude from this that the problem is outside of the source host, either in the network transport somewhere or in the the target host. I have a quickbooks backup (.qbb) file that fails pretty regularly, so I took a closer look at the strace on the target host for this file. On one failure, I get: 3060 19:43:19.737510 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 3060 19:43:19.737768 read(0, "\252\220\346\2\f\216\t3\\\256(\t\202\3\222\211\16\0246\310H\1\342\202\203\2\374\346\0273l\317"..., 4092) = 4092 3060 19:43:19.738449 select(1, [0], [], NULL, {60, 0} <unfinished ...> 3059 19:43:33.847680 <... select resumed> ) = 0 (Timeout) 3059 19:43:33.847918 select(4, [3], [], NULL, {60, 0} <unfinished ...> 3060 19:44:19.730067 <... select resumed> ) = 0 (Timeout) On a subsequent run a few minutes later, I also get a failure, but approximately 3 seconds later. Curiously, I see this in the 2nd run at the point of failure in the first run (notice the identical data): 3170 20:08:03.378981 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 3170 20:08:03.379237 read(0, "\252\220\346\2\f\216\t3\\\256(\t\202\3\222\211\16\0246\310H\1\342\202\203\2\374\346\0273l\317"..., 4092) = 4092 3170 20:08:03.379744 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {59, 640000}) 3170 20:08:03.728257 read(0, "\374\17\0\7", 4) = 4 The 2nd run continues for 3 seconds, then times out like the previous run: 3170 20:08:07.624052 select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {60, 0}) 3170 20:08:07.624424 read(0, "\3,\234\37J\341\374\240\v\347\227W\30\277\220\342\377\222\v\347\27W\30\277\214\342\377\202\v\347\247\366"..., 4092) = 4092 3170 20:08:07.625000 select(1, [0], [], NULL, {60, 0} <unfinished ...> 3169 20:08:48.205338 <... select resumed> ) = 0 (Timeout) 3169 20:08:48.205547 select(4, [3], [], NULL, {60, 0} <unfinished ...> 3170 20:09:07.627769 <... select resumed> ) = 0 (Timeout) I thought perhaps the "left {59, 640000}" was a clue given that it happened at the exact same place as a previous failure, but I see the same sort of thing after every 8 reads, so I don't expect it's indicative of a problem. So now I'm beginning to think that the problem is (indeed) outside of the rsync code. I've examined the 2 target hosts (one that worked and one that doesn't), and tried to determine "what's different?". These hosts are all VMs that are running under VMware Server2. The host that worked is running CentOS 5.6, while the one that's failing is running CentOS 5.4 (450 days uptime). I'm going to upgrade the failing host OS to CentOS 5.7 and see if that changes anything. I know that some versions of glibc (> 2.5-34, < 2.5.58) had mysterious problems with VMware Server 2 (guests terminating for no apparent reason). I'm wondering if there's something else in glibc that's perhaps having a problem with VMware Server 2. (I realize that Server 2 is essentially dead and will be migrating to KVM in the near future, although Server2 has been quite stable for me.) I'll report my findings when the upgrade is complete. Any pointers toward how to troubleshoot this thing would be more than welcome. TIA. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-29 17:16 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #17 from Eric Shubert <ejs at shubes.net> 2011-09-29 17:16:38 UTC --- (In reply to comment #16)> > So now I'm beginning to think that the problem is (indeed) outside of the rsync > code. I've examined the 2 target hosts (one that worked and one that doesn't), > and tried to determine "what's different?". These hosts are all VMs that are > running under VMware Server2. The host that worked is running CentOS 5.6, while > the one that's failing is running CentOS 5.4 (450 days uptime). I'm going to > upgrade the failing host OS to CentOS 5.7 and see if that changes anything. I > know that some versions of glibc (> 2.5-34, < 2.5.58) had mysterious problems > with VMware Server 2 (guests terminating for no apparent reason). I'm wondering > if there's something else in glibc that's perhaps having a problem with VMware > Server 2. (I realize that Server 2 is essentially dead and will be migrating to > KVM in the near future, although Server2 has been quite stable for me.) >I've upgrade the target host, and still no dice. Same failure. It seems odd to me that the transmission fails at different points in the process, anywhere from 3.5 minutes to 14.75 minutes. I'm presently thinking that the problem is not with rsync itself, but with some aspect of the network connection. I'm seeing no corresponding ethernet errors on either end though, and the network seems to be fine otherwise. I'd have to say at this point that the failures appear to be happening only with large files. I could sure use some pointers with identifying what the cause of this failure is. I have a test file that fails regularly (if not consistently), and should be able to tell easily when it's fixed. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Sep-29 20:39 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #18 from Eric Shubert <ejs at shubes.net> 2011-09-29 20:39:54 UTC --- (In reply to comment #17)> > I'm presently thinking that the problem is not with rsync itself, but with some > aspect of the network connection. I'm seeing no corresponding ethernet errors > on either end though, and the network seems to be fine otherwise. > > I'd have to say at this point that the failures appear to be happening only > with large files. >I've (re)read through all 5 bug reports that Tim identified. A general consensus is that this is happening with relatively large files, and relatively slow writes (ie loaded targets). Upon closer examination of a pair of straces, I see this on the target side: 6814 11:34:25.018774 read(0, "\3037sE\303{\264\16\377\333\357\375\7\274}1c\21\306\377\215\256\202\7\376\245\26\24\302{\16c"..., 4092) = 4092 6814 11:34:25.020201 select(1, [0], [], NULL, {60, 0} <unfinished ...> 6814 11:35:25.009953 <... select resumed> ) = 0 (Timeout) I found the corresponding block of data in the strace on the source side: 940 11:33:57.494543 write(4, "\3037sE\303{\264\16\377\333\357\375\7\274}1c\21\306\377\215\256\202\7\376\245\26\24\302{\16c"..., 4092 <unfinished ...> 940 11:33:57.494681 <... write resumed> ) = 4092 I had previously set both hosts to use ntp with the same time server, so their times should be very close. In the source side strace, I counted ~510 (successful) writes of data between the one listed above, and the point at which the source host timed out: 940 11:34:25.073394 select(5, NULL, [4], [4], {60, 0}) = 1 (out [4], left {60, 0}) 940 11:34:25.073514 write(4, "\374\17\0\7", 4) = 4 940 11:34:25.073620 select(5, NULL, [4], [4], {60, 0}) = 0 (Timeout) Notice, the last successful read on the target side happened .06 seconds before the last write on the source side, which is pretty much at the same time. At this point, there were ~510 x 4K blocks of data (or ~2MB) "in the pipe", that had been written, but not read. So I'm wondering, is it possible that "the pipe" (buffers) get filled up, and when they do, that the select function is unable to handle it? I don't (yet) know how these things work, so I could be totally off base. At the same time, this would seem to explain things. I'll continue to poke at this to see what I can come up with. In the meantime, any feedback from Wayne and Tim will be appreciated. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Oct-06 15:28 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #19 from Eric Shubert <ejs at shubes.net> 2011-10-06 15:28:32 UTC --- (In reply to comment #14)> So, looking at your trace, it looks like the sender is waiting to write more > data (its select lists the write and error FDs). The receiving side's 2 pids > are waiting to read more data. So, something in between has either failed to > deliver the data, or blocked, or something (it would appear). Are there other > processes in between the 2 sides (e.g. ssh) that could have blocked? Or > perhaps a failed pipe (or unix domain socket)?After further testing, I'm tending to agree that the problem lies outside of rsync. Somehow, the select() is getting stuck (not receiving a signal from the kernel?). I'm not yet certain which side gets stuck first, but I would guess that it's the target, given that in one test there was about 2MB of data "in the pipe", that had been written by the source but not read yet by the target. Working on the idea that the failure happens when the pipe is saturated, I decided to try the bwlimit option to slow things down a bit. The upload speed at the source is limited to about 100KBPS (DSL, shared with other users), so I tried bwlimit=64. This worked noticeably better, although it also failed after a few hours. There may have been usage spikes that maxed out the available bw at the time, but I can't tell if that happened or not. So I reduced it to bwlimit=32, and it's been running for 7 hours now without failure (hasn't finished yet, but I'm optimistic). So I'm inclined to think that this problem lies in the transport somewhere. It also appears to come to light when the transport is saturated. Does anyone have any ideas regarding how to go about tracking this down further, or where to go for further help? While this bug is definitely affecting rsync, I think the solution is going to end up being elsewhere, in glibc or the kernel perhaps. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2011-Oct-07 21:57 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #20 from Tim Taiwanese Liim <tim.liim at alcatel-lucent.com> 2011-10-07 21:57:54 UTC --- I agree with Wayne and Eric that Eric's issue is outside of rsync, somewhere in the transport. Eric, Have you tried to check the TCP buffers of the ssh process on both ends? For example, p=192.168.51.98:22 while true; do date; netstat -tn | grep $p; sleep 1; done Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 19328 192.168.51.98:22 192.168.51.51:53010 ESTABLISHED In this example, the sender has 19328 bytes in its TCP sending buffer. You can also use tcpdump and wireshark to graph how well the tcp pipe goes: # catch 100 bytes of each packet on eth0, write to t.pcap. We # need only first 100 bytes because we care only about the TCP # sequence number, but not the actual file content. # need root access to sniff packets. tcpdump -i eth0 -s 100 -w t.pcap then feed the trace to wireshark: wireshark t.pcap # then select menu "Statistics" --> TCP Stream Graph --> # Time-Sequence Graph (Stevens) With this you can visualize how the TCP flow goes (smooth or stalled or fluctuates or excessive retries). With tcpdump from both ends, you can also check for lost packets. (A few years ago I worked on a case of stalled ssh; turned out the NIC firmware was defective, causing excessive packet loss) Comparison between the two targets (good and bad one) may show the difference. Do your two targets run on the same host machine? Or host machines of the same configuration (same NIC etc)? Could one has bad NIC (eg. working but excessive packet loss in bursts)? As Wayne pointed out, there is pipe (or unix domain socket) between rsync and ssh as well. I don't know how to track the queue size in pipe yet, so let's track the known ones (TCP) first.> Notice, the last successful read on the target side happened .06 > seconds before the last write on the source side, which is pretty > much at the same time.This is an important clue, although I don't know what to make out of it yet; maybe a few lost tcp acks in a row? How did your "bwlimit=32" test go? BTW, I am not rsync developer; I do use rsync a lot and tried to help when possible, so please do not take my response as official. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2012-Jul-13 19:14 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #21 from Martin Scherbaum <samba at scherbaum.org> 2012-07-13 19:14:03 UTC --- Hello together, although the last posting has been already done a while ago, out of respect for the meticulous work of Eric (and Tim and Wayne) - which was a pleasure to read for me - I'd like to add an observation I made for the very same strange rsync behaviour. I tried to rsync some 225GB of image data (mainly 8 to 12Mpix JPG files) from my notebook (openSUSE 11.4) over a WLAN (ca. 120MBit raw) to a RaidSonic NAS (connected to my router with 100Mbit cable, has officially a 1GBit interfacw) which writes the data to a RAID1 configuration of 3.5" hard drives. So, no real large files. But a not so fast NAS. It hung up again and again with the above mentioned message "rsync: writefd_unbuffered failed to write 4092 bytes [sender..." for different files. After all the discussion of network matters that I saw from your thread I thought (as I'm working in my LAN) I could turn of openSUSE's firewall for a try. And: since that moment everything runs fine! And even fast. rsync doesn't stall and files arrive properly, independent from size (I also got some larger files). Although I'm not able to make a deep analysis as the writers before, I hope this observation helps one or the other who googles for the error message... -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2012-Oct-01 10:20 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #22 from Gaute Hope <eg at gaute.vetsj.com> 2012-10-01 10:20:42 UTC --- (In reply to comment #21)> After all the discussion of network matters that I saw from your thread I > thought (as I'm working in my LAN) I could turn of openSUSE's firewall for a > try. And: since that moment everything runs fine! And even fast. rsync doesn't > stall and files arrive properly, independent from size (I also got some larger > files).I experienced a plethora of errors which all seemed to be related to this one. The server is running on a router and when I turned off the firewall and addressed it on its LAN interface (assuming the outside address also resulted in the rsync connection, through ssh, passing through the firewall) it has been working nicely. If I can address it on LAN and turn the firewall back on everything should be fine. I noticed that when the error happens client side rsync hangs for the timeout and when checking the remote, before timeout, there is no rsync process running. Perhaps the connection is severed by the firewall. I assume rsync has no mechanism for 'retrying' the connection without restarting the entire process? - Gaute -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2012-Oct-22 16:23 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #23 from Chip Schweiss <chip at innovates.com> 2012-10-22 16:23:10 UTC --- I came here when I was experiencing this problem. I later found the problem. If you are running on Redhat/Centos 5/6, there is a long standing bug in the kernel distributed by Redhat that will cause these broken pipes when rsyncing over a latent network. There are ack packets that are getting rejected frequently, that should be allow via connection tracking. Insert a rule before the reject line on the input table: iptables -I {line} -p tcp --tcp-flags SYN,RST,ACK,FIN ACK -j ACCEPT do an 'iptables -L -n -v --line-numbers' to figure out which line to insert at. This affects all TCP connections not just rsync over ssh. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2012-Oct-23 11:24 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #24 from Lo?c Gomez <samba-bugz at kyoshiro.org> 2012-10-23 11:24:52 UTC --- I also had the same issue and it was(In reply to comment #11)> I need to change my answer here. The files that cause it to choke are larger > than most, so I would have to say that as a general observation the files that > fail are large(r). Not huge, but large. > > I also need to correct my earlier statement/thinking. While the process did > fail on the same file on consecutive runs, leading me to believe that the > content of the file was causing the problem, I've determined that this is not > always the case. Subsequent runs have transferred these files successfully, so > it's not the content of the data that's a problemI ran into a similar issue recently while transferring large files (>40GB). After a few tests, it seems - in my case at least - to be related to the delta-xfer algorithm. The bug does not happen anymore with the -W option. I don't know if this will resolve your issue, but you can also try looking into these options : --no-checksum --no-compress --blocking-io. These were not the source of my problems, but the functions they're related to might rise a network timeout. I hope it helps, anyways, good luck solving your issue. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2012-Oct-31 01:32 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #25 from Justin Patrin <papercrane at reversefold.com> 2012-10-31 01:32:36 UTC --- FWIW, this problem went away for me by just removing -z and adding --no-compress. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2012-Nov-01 19:40 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #26 from Justin Patrin <papercrane at reversefold.com> 2012-11-01 19:40:37 UTC --- Correction: this problem *mostly* went away by turning off compression. I still get this error, 100% reproducable right around the same point in certain large file transfers. Adding --inplace --partial allowed me to keep retrying the rsync and have it eventually finish. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2012-Nov-02 09:06 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 Sebastian <shop at open-t.co.uk> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |shop at open-t.co.uk --- Comment #27 from Sebastian <shop at open-t.co.uk> 2012-11-02 09:06:27 UTC --- I have a number of setups running backups using rsync to either usb memory sticks or usb external hard drives. I have only encountered this problem on the sites which use usb memory sticks. Would this be related to the sender not being able to cope properly with the slower speed of the receiver - governed by the slower write speed of usb memory sticks? Anyway, after trying all sorts of things, what has cured it for me is using "--bwlimit". So far, with a value of 1000 to 2000 (KBytes/second) it has stopped the errors completely. Of course, this slows down the backup dramatically - but I've got all night :-) I haven't tested to see how far I can push the "--bwlimit" before the errors show up again. I use the following options "--delete --recursive --timeout=60 --max-delete=200 --links --perms --owner --group" Hope the above helps somebody. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2013-May-22 03:43 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 Scott Wood <woodystrash at hotmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |woodystrash at hotmail.com --- Comment #28 from Scott Wood <woodystrash at hotmail.com> 2013-05-22 03:43:41 UTC --- I believe I am a recent victim of this bug. I am doing an rsync via ssh of large files (from 4GB, up to a couple hundred GB). I have the -W and have disabled compression on the file type that is failing most often. Here is a rough example of the rsync push command: rsync -avW --timeout=0 --rsync-path='/bin/rsync_3.0.9' --skip-compress=ext -e "ssh -c arcfour -i /home/user/.ssh/CUSTOM_id_rsa" /path/to/bigfile.ext remoteuser at theserver:/path/to/destination And the connection is terminated with an ACK,RST packet from the server resulting in the following on the client: Read from remote host theserver: Connection reset by peer rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32) rsync: connection unexpectedly closed (28 bytes received so far) [sender] rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9] I have reproduced the result with a multitude of versions, including 3.0.4, 3.0.7 and 3.0.9 on the client side, and 3.04 and 3.09 on the server side. I have the following questions: 1) Chip Scwheiss, re: comment 23. Do you have a RHEL bug reference so I can see if this is resolved, or pursue it further with them under our support contract? 2) What is the consensus on the security implications and effectiveness of Chip's iptables suggestion? 3) If it boils down to needing to --bwlimit, does anyone have any suggestions other than trial and error to calculate the maximum it should be able to handle? As we have many TB to move, this would not be the ideal solution. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2013-May-22 06:05 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #29 from Scott Wood <woodystrash at hotmail.com> 2013-05-22 06:05:43 UTC --- As a follow up to my second question, we added the firewall rules to allow SYN,RST,ACK,FIN and ACK traffic form the server in question and it did not solve the problem. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2013-May-22 07:02 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #30 from Lo?c Gomez <samba-bugz at kyoshiro.org> 2013-05-22 07:02:09 UTC --- Did you try with one of these options : --no-checksum --no-compress --blocking-io ? Just a check since you have a 4G limit : I suppose you're not rsync-ing to a FAT32 filesystem ? -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2013-May-23 02:21 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #31 from Scott Wood <woodystrash at hotmail.com> 2013-05-23 02:21:10 UTC --- (In reply to comment #30)> Did you try with one of these options : --no-checksum --no-compress > --blocking-io ? > > Just a check since you have a 4G limit : I suppose you're not rsync-ing to a > FAT32 filesystem ?Hey Lo?c, I haven't tried the --no-ckecksum or --blocking-io yet. I'll give them a whirl. I have tried with the --no-compress, though and that did not resolve the matter. I'll let you know how it works with combinations of --no-ckecksum and --blocking-io. Thanks! Regards, Scott -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2013-May-23 03:50 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #32 from Scott Wood <woodystrash at hotmail.com> 2013-05-23 03:50:20 UTC --- I tried the --no-ckecksum or --blocking-io, both to no avail. I now have an admin at the other end of the connection onboard for troubleshooting. If/when we find the cause, I'll update the list. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2013-Jul-10 19:26 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #33 from michaeljoya <login at michaeljoya.com> 2013-07-10 19:26:56 UTC --- I got this error while creating a backup copy between two drives on the same machine. It seems that most people on this list are wrongly attributing the problem to differing rsync versions on client and server, but it happens in the context of a single machine. In my own case, it turned out that the destination filesystem was simply full. I had used an absolute path instead of a relative path in the destination, and I was working in a systemrescue environment whose root fs was completely in memory. So, when the destination fs decides that it can't accept any more writes, it will also produce this error. When I fixed the path to write to the correct path with a greater capacity, the problem immediately went away. -- Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug.
samba-bugs at samba.org
2018-Apr-17 16:47 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
https://bugzilla.samba.org/show_bug.cgi?id=5478 --- Comment #34 from David Nelson <oertls98 at gmail.com> --- FYI. Although when rsync ver. 3.0.9 is called in Ubuntu 12.04LTS to "push" files to the server, e.g., rsync <pathtobackup>/ server::Backups/ it fails with the error: rsync error: error in rsync protocol data stream (code 12) at io.c(441) [sender=3.0.9 [NB server rsync version is 3.1.0 running in Ubuntu 14.04LTS] However, curiously, when issued as a "pull" from the server as: rsync client::Sharetobackup/<pathto>/Backups/ it succeeds. Furthermore, pushes from the client work when issued one-letter-as-a-time; e.g., for i in {A..Z}; do \ rsync <pathtobackup>/${i}* server::Backups/ \ done I discovered this while trying to isolate which, if any, of the files were causing it to choke. Turns out none of them were, but identifying them all with a wildcard * was. -- You are receiving this mail because: You are the QA Contact for the bug.
Dave Gordon
2018-May-16 18:45 UTC
[Bug 5478] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Broken pipe (32)
On 17/04/18 17:47, just subscribed for rsync-qa from bugzilla via rsync wrote:> https://bugzilla.samba.org/show_bug.cgi?id=5478 > > --- Comment #34 from David Nelson <oertls98 at gmail.com> --- > FYI. Although when rsync ver. 3.0.9 is called in Ubuntu 12.04LTS to "push" > files to the server, e.g., > > rsync <pathtobackup>/ server::Backups/ > > it fails with the error: > > rsync error: error in rsync protocol data stream (code 12) at io.c(441) > [sender=3.0.9 [NB server rsync version is 3.1.0 running in Ubuntu 14.04LTS] > > However, curiously, when issued as a "pull" from the server as: > > rsync client::Sharetobackup/<pathto>/Backups/ > > it succeeds. > > Furthermore, pushes from the client work when issued one-letter-as-a-time; > e.g., > > for i in {A..Z}; do \ > rsync <pathtobackup>/${i}* server::Backups/ \ > done > > I discovered this while trying to isolate which, if any, of the files were > causing it to choke. Turns out none of them were, but identifying them all with > a wildcard * was.Hi, did you mean to add this comment to Bug 5478, which had been closed for ~5 years? It looks like it should belong to some other (more recently active) bug - maybe 2957, 12732, or 13109? .Dave.