I am running the following rsync command to synchronize directories
between two servers:

    rsync -axvz --delete-after -e ssh /SRCDIR/ blabla@DEST:/DESTDIR

The transfer starts, and after some files have been transferred it
appears to hang. The process establishes a connection on both sides, so
I ran strace on the remote machine (using the rsync-debug script
described in the troubleshooting procedures). The strace output is here:

http://www.adrive.com/public/142cbf351c4b73a47c6e54ec3302b856041957d612ca68f440d16eccf950225c.html

rsync --version on both machines shows 'rsync version 2.6.9 protocol
version 29', and both machines are running Debian.

I would appreciate some ideas on where to look next.

Thanks in advance,
Dimitar Dimitrov

On Fri, 2009-12-11 at 08:58 +0100, Dimitar Dimitrov wrote:
> I am running the following rsync command to synchronize directories
> between two servers:
> rsync -axvz --delete-after -e ssh /SRCDIR/ blabla@DEST:/DESTDIR
>
> The transfer starts and after a short while it appears to hang after
> some files have been transferred. The process establishes connection
> on both sides so I did an strace from the remote machine (using the
> rsync-debug script as described in the troubleshooting procedures).
> The strace output is here
> http://www.adrive.com/public/142cbf351c4b73a47c6e54ec3302b856041957d612ca68f440d16eccf950225c.html
>
> rsync --version on both machines shows 'rsync version 2.6.9 protocol
> version 29' and both machines are running Debian.
>
> I would appreciate some ideas on where to look next.

I don't know about the hang. I see rsync working for a few minutes and
then exiting when the receiver encounters an error:

    write(1, "\34\0\0\10inflate (token) returned -5\n", 32) = 32

This is the "compression error" that has occasionally come up before:

http://lists.samba.org/archive/rsync/2004-December/011119.html
http://lists.samba.org/archive/rsync/2006-May/015621.html
http://lists.samba.org/archive/rsync/2008-September/021668.html
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=528730

And after some study, I think I figured out why it is happening. The
error message comes from "see_deflate_token" in token.c, and the code -5
is Z_BUF_ERROR, meaning that the inflate call didn't make progress. But
a CHUNK_SIZE output buffer is always provided, and input is provided
unless len == 0 at the start of the loop. If len == 0, the loop would
have exited unless the previous "inflate" filled the output buffer, in
which case we want to call it again to obtain any remaining output. But
if the data block was exactly CHUNK_SIZE (32816), it would fill the
output buffer with nothing remaining, and the next call to "inflate"
would return Z_BUF_ERROR.

This case is in fact mentioned in the zlib FAQ
(http://www.zlib.net/zlib_faq.html#faq05):

"A Z_BUF_ERROR may in fact be unavoidable depending on how the functions
are used, since it is not possible to tell whether or not there is more
output pending when strm.avail_out returns with zero."

The block size is indeed 32816, as one can see in the second 32-bit
field of the sum head. The sum head is shown in the following line after
two 6-byte itemizations:

    [pid 5213] write(1, "\252\26\0\0\10\0\262\26\0\0\f\2002\200\0\0000\200\0\0\3"..., 4092) = 4092

Here's a simple script to reproduce the problem:

    #!/bin/bash
    # Create identical 32816-byte source and destination files.
    head -c 32816 /dev/zero >srcfile
    cp srcfile destfile
    # Force a compressed delta transfer with a 32816-byte block size.
    rsync -I -z --no-whole-file --block-size=32816 srcfile destfile

So, we need to do something to "see_deflate_token". It would probably
work to ignore the Z_BUF_ERROR and let the loop exit because the output
buffer wasn't filled. It seems inconsistent that the sender uses
Z_INSERT_ONLY while the receiver uses this hack of synthesizing part of
the compressed stream. (Previously, I had imagined Z_INSERT_ONLY worked
on both sides.)

--
Matt
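
The sum-head reading in the message above can be checked by hand. The following is only a sketch: it assumes, as the message states, that the traced buffer starts with two 6-byte itemizations, and it further assumes the sum-head fields are little-endian 32-bit integers. The names TRACE_PREFIX and sum_head_fields are made up for this illustration, and only the 21 bytes visible in the strace line are used.

```python
import struct

# The bytes visible in the traced write(), copied from the strace line.
# strace and Python use the same octal-escape rules, so the literal is verbatim.
TRACE_PREFIX = b"\252\26\0\0\10\0\262\26\0\0\f\2002\200\0\0000\200\0\0\3"

ITEMIZATION_BYTES = 6                     # per the message: two of these come first
SUM_HEAD_OFFSET = 2 * ITEMIZATION_BYTES   # assumed start of the sum head


def sum_head_fields(buf: bytes, offset: int = SUM_HEAD_OFFSET):
    """Return the first two 32-bit fields at offset (assumed little-endian)."""
    return struct.unpack_from("<ii", buf, offset)


first_field, block_size = sum_head_fields(TRACE_PREFIX)
print(first_field, block_size)  # prints: 32818 32816
```

The second field decodes to 0x8030 = 32816, which agrees with the block size named in the message.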

On Thu, Dec 10, 2009 at 11:58 PM, Dimitar Dimitrov
<dimitar.dimitrov at markit.com> wrote:
> script as described in the troubleshooting procedures). The strace
> output is here

What this reveals is that the receiver is getting an inflate error,
sending the message to the generator, exiting, and then the generator
hangs reading the message pipe. Why it does not get an EOF as it should,
I have no idea, but that seems like an OS issue.

Since the reason for things falling apart is that there was an error in
the compress code, you should be able to avoid the hang (for now) by
avoiding the -z option.

If you discover anything about why that pipe read doesn't return an EOF,
let me know. (You may want to ask some Debian folks about that.)

..wayne..
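
As general background on the EOF expectation mentioned above: on POSIX systems a pipe read reports EOF only once every descriptor for the write end has been closed. The sketch below shows just that rule. It does not establish that this is what happened in the trace, and the duplicated descriptor is a stand-in (an assumption of this example) for some other process that still holds the write end.

```python
import os


def read_nonblocking(fd: int):
    """Return b'' on EOF, the bytes read, or None if the read would block."""
    try:
        return os.read(fd, 1)
    except BlockingIOError:
        return None


r, w = os.pipe()
os.set_blocking(r, False)      # report "would block" instead of hanging the demo
w_copy = os.dup(w)             # hypothetical second holder of the write end

os.close(w)                    # the first writer goes away
before = read_nonblocking(r)   # a write end is still open, so no EOF yet

os.close(w_copy)               # the last write end is closed
after = read_nonblocking(r)    # EOF is delivered now

os.close(r)
print(before, after)           # prints: None b''
```

A blocking reader in the "before" state would wait indefinitely, which looks like the hang described, although other causes are possible.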

On Mon, 2009-12-21 at 10:00 -0800, Wayne Davison wrote:
> Since the reason for things falling apart is that there was an error
> in the compress code, you should be able to avoid the hang (for now)
> by avoiding the -z option.

The compression error has been explained:

http://lists.samba.org/archive/rsync/2009-December/024392.html

--
Matt

On Sun, Dec 13, 2009 at 8:47 PM, Matt McCutchen
<matt at mattmccutchen.net> wrote:
> But if the data block was exactly CHUNK_SIZE (32816), it would fill
> the output buffer with nothing remaining, and the next call to "inflate"
> would return Z_BUF_ERROR.

Thanks for tracking that down! I have fixed this issue in both the
3.1.0dev git and the b3.0.x branch that will be used for the 3.0.7
release.

..wayne..
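
The exact-fill case quoted above can be observed outside rsync. This is a sketch using Python's zlib module, not rsync's code. It feeds one raw-deflate stored block of 32816 bytes to a decompressor whose output is capped at 32816 bytes, then asks for more output with no input. The helper names (stored_block, inflate_once) are made up for this example. As far as I can tell, the Python binding treats zlib's "no progress" return as non-fatal, so the second call simply yields nothing; in the C API that second call is the one that returns Z_BUF_ERROR.

```python
import struct
import zlib

BUF_SIZE = 32816  # output buffer size quoted in the thread


def stored_block(payload: bytes) -> bytes:
    """Wrap payload in one non-final raw-deflate stored block (BTYPE=00)."""
    n = len(payload)
    assert n <= 0xFFFF, "a stored block holds at most 65535 bytes"
    return b"\x00" + struct.pack("<HH", n, n ^ 0xFFFF) + payload


def inflate_once(payload: bytes):
    """Inflate with a BUF_SIZE output cap, then call again with no input.

    Returns (first_output, leftover_input, second_output).
    """
    d = zlib.decompressobj(-15)           # negative wbits selects raw deflate
    first = d.decompress(stored_block(payload), BUF_SIZE)
    leftover = d.unconsumed_tail          # input zlib has not consumed yet
    second = d.decompress(b"", BUF_SIZE)  # the "is anything still pending?" call
    return first, leftover, second


first, leftover, second = inflate_once(bytes(BUF_SIZE))
# The output buffer is exactly full and no input is left, so the caller
# cannot tell "finished" from "more pending" without the second call.
print(len(first) == BUF_SIZE, leftover == b"", second == b"")
```

With a payload one byte shorter, the first call returns fewer than BUF_SIZE bytes, so a caller can stop without the extra call. That is the distinction the loop's exit test relies on, and why tolerating the no-progress result on the extra call is a plausible fix.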