Hey list,

I am having problems as of late with my rsync backup. On the client side I am using the following:

OPTS="-avvvrz --compress-level=9 --itemize-changes --delete --delete-excluded --human-readable --files-from=$FILES --include-from=$INCLUDES --exclude-from=$EXCLUDES --partial --progress --owner --perms --progress --timeout=0 --times --stats"

sudo rsync -e "ssh -i ${IDENTITY_FILE} -v -p ${REMOTE_PORT}" $OPTS / $REMOTE_USER@$REMOTE_HOST:$REMOTE_PATH

Note that I have temporarily disabled timeouts and added extra verbosity.

The transfer to the remote host via SSH works fine up until it gets to a 30+ GB file (a VM image). It gets about 90+ percent of the way through, hangs, and then times out. On the client side I see the following:

...
rsync: connection unexpectedly closed (3542035 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.1]
[sender] _exit_cleanup(code=12, file=io.c, line=226): about to call exit(255)

On the server side, if I attach to the rsync process via strace, I see the following:

$ strace -f -p 3095
...
3095 select(4, [3], [1], [1], {60, 0}) = 1 (out [1], left {59, 999971})
3095 write(1, "\3\0\0\7\1\0\0", 7) = 7
3095 gettimeofday({1476036967, 673095}, NULL) = 0
3095 select(4, [3], [1], [1], {60, 0}) = 1 (out [1], left {59, 999970})
3095 write(1, "H\0\0\trecv_files(home/kip/.Virtual"..., 76) = 76
3095 gettimeofday({1476036967, 680312}, NULL) = 0
3095 select(4, [3], [], NULL, {60, 0}) = 0 (Timeout)
3095 select(4, [3], [], NULL, {60, 0}) = 0 (Timeout)
3095 select(4, [3], [], NULL, {60, 0}) = 0 (Timeout)
3095 select(4, [3], [], NULL, {60, 0}) = 0 (Timeout)
3095 select(4, [3], [], NULL, {60, 0}) = 1 (in [3], left {40, 364402})
3095 read(3, "B\0\0\trecv_files(home/kip/.Virtual"..., 8184) = 160
3095 select(4, [3], [1], [1], {60, 0}) = 1 (out [1], left {59, 999973})
3095 write(1, "B\0\0\trecv_files(home/kip/.Virtual"..., 70) = 70
3095 gettimeofday({1476037227, 506412}, NULL) = 0
3095 select(4, [3], [1], [1], {60, 0}) = 1 (out [1], left {59, 999971})
3095 write(1, "V\0\0\trecv mapped home/kip/.Virtua"..., 90) = 90
3095 gettimeofday({1476037227, 512591}, NULL) = 0
3095 select(4, [3], [], NULL, {60, 0}) = 0 (Timeout)
... this repeats a couple hundred times or so ...
3095 select(4, [3], [], NULL, {60, 0}) = 0 (Timeout)
3095 select(4, [3], [], NULL, {60, 0}

Note that the select() call appears to be timing out while waiting to read from file descriptor 3. The first argument to select(), 4, is the highest-numbered descriptor plus one; stdin, stdout, and stderr are 0-2. This could have nothing to do with rsync at all and could be a file system issue, but I figured I'd ask.

The server the data is being uploaded to, where the strace was run, has rsync version:

$ rsync --version
rsync version 3.0.9 protocol version 30

The client reported:

$ rsync --version
rsync version 3.1.1 protocol version 31

Any help appreciated.

Regards,
--
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 163 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/rsync/attachments/20161010/9dec2e37/signature.sig>
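For readability, the same option string can be kept as a bash array. This also avoids word-splitting surprises if any of the paths ever contain spaces. The sketch below is a hypothetical rearrangement, not a tested fix:

- The placeholder paths and host values are assumptions.
- It prints the command instead of running it.

A few notes on the options themselves:

- -a already implies --owner, --perms and --times.
- --progress appears twice in the original string.
- -r is not redundant here, because when --files-from is used, -a does not imply recursion.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder values; replace them with the real ones.
FILES=${FILES:-/etc/backup/files-from.txt}
INCLUDES=${INCLUDES:-/etc/backup/includes.txt}
EXCLUDES=${EXCLUDES:-/etc/backup/excludes.txt}
IDENTITY_FILE=${IDENTITY_FILE:-/root/.ssh/backup_key}
REMOTE_PORT=${REMOTE_PORT:-22}
REMOTE_USER=${REMOTE_USER:-backup}
REMOTE_HOST=${REMOTE_HOST:-backup.example.org}
REMOTE_PATH=${REMOTE_PATH:-/srv/backup}

# Same options as the original OPTS string, one per element.
# -a already covers --owner, --perms and --times, so they are omitted,
# and --progress is listed only once. -r stays: with --files-from,
# -a does not imply recursion.
OPTS=(
  -avvvrz
  --compress-level=9
  --itemize-changes
  --delete
  --delete-excluded
  --human-readable
  "--files-from=$FILES"
  "--include-from=$INCLUDES"
  "--exclude-from=$EXCLUDES"
  --partial
  --progress
  --timeout=0
  --stats
)

SSH_CMD="ssh -i $IDENTITY_FILE -v -p $REMOTE_PORT"

# Print the fully quoted command for inspection; once it looks right,
# drop the printf/echo and run the command directly.
printf '%q ' sudo rsync -e "$SSH_CMD" "${OPTS[@]}" / "$REMOTE_USER@$REMOTE_HOST:$REMOTE_PATH"
echo
```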
> I am having problems as of late with my rsync backup.

Have you tried performing a copy to a known good local device? If a local copy fails, then I would start checking the file system of the source and also the hardware of that system.

----------------------------
HTRAX : http://www.htrax.xyz
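The local copy test can be scripted so that it reads the whole source file and compares the copy byte for byte. This is a minimal sketch, assuming GNU coreutils and cmp are available. verify_local_copy is a hypothetical helper name, and the source and destination paths are whatever you pass in. Ideally the destination sits on a different, known good disk.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy SRC to DEST on a local disk, then compare the
# two byte for byte. A read error or mismatch here points at the source
# disk or file system rather than at the network or the remote rsync.
verify_local_copy() {
  local src=$1 dest=$2

  if ! cp -- "$src" "$dest"; then
    echo "copy failed: $src" >&2
    return 1
  fi

  # cmp -s prints nothing and exits non-zero if the files differ.
  if ! cmp -s -- "$src" "$dest"; then
    echo "mismatch between $src and $dest" >&2
    return 1
  fi

  echo "OK: $src copied and verified"
}

# Usage: ./verify_local_copy.sh /path/to/vm.img /mnt/known-good/vm.img
if [ "$#" -eq 2 ]; then
  verify_local_copy "$1" "$2"
fi
```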
On Mon 10 Oct 2016, Kip Warner wrote:
>
> The server the data is being uploaded to with the strace running on it
> has rsync version:
>
> $ rsync --version
> rsync version 3.0.9 protocol version 30
>
> The client reported:
>
> $ rsync --version
> rsync version 3.1.1 protocol version 31

As always it's best to first upgrade to the current version (3.1.2) if at all possible, as there's always the chance that the cause of your problems has already been fixed.

Paul
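The version check can be repeated over the same ssh connection the backup uses. The remote result should then reflect the rsync that actually gets launched on the server. This is a sketch, assuming bash and sed. rsync_version_of is a hypothetical helper that reduces `rsync --version` output to the version and protocol numbers.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read `rsync --version` output on stdin and print
# "<version> <protocol>" taken from its first line.
rsync_version_of() {
  sed -n 's/.*version \([0-9.][0-9.]*\) *protocol version \([0-9][0-9]*\).*/\1 \2/p'
}

# Usage, assuming the same variables as the backup script:
#   rsync --version | rsync_version_of
#   ssh -i "$IDENTITY_FILE" -p "$REMOTE_PORT" "$REMOTE_USER@$REMOTE_HOST" rsync --version | rsync_version_of
```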
On Wed, 2016-10-12 at 13:30 +1300, Henri Shustak wrote:
> Have you tried performing a copy to a known good local device? If a
> local copy fails, then I would start checking the file system of the
> source and also the hardware of that system.

That's a good idea. I just tried that and it copied without a problem.

--
Kip Warner
On Wed, 2016-10-12 at 08:36 +0200, Paul Slootman wrote:
> As always it's best to first upgrade to the current version (3.1.2)
> if at all possible, as there's always the chance that the cause of
> your problems has already been fixed.

Good call, but I believe I may have ruled this out. I didn't upgrade to 3.1.2, but both sides are running 3.1.1 protocol version 31 now. Same problem.

I think the key insight was in the strace log, which showed the select() call timing out. If I knew what type of file descriptor it was being fed, I might have a clue. It might have been a socket or something on disk. I don't know.

--
Kip Warner
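One way to answer that on the server is to inspect the process's descriptor table while it is stuck.

- The first argument to select() is the highest descriptor number plus one. In the trace above, the descriptors actually involved are 3 (watched for reading) and 1 (watched for writing).
- POSIX specifies that regular files always select as ready. A descriptor that keeps timing out is therefore a pipe, socket or similar, not a file on disk.

`ls -l /proc/3095/fd` or `lsof -p 3095` (run as root, with 3095 being the PID from the trace) shows this directly. The sketch below is a hypothetical helper, describe_fds, that prints each descriptor's target from /proc, assuming Linux.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list every open descriptor of a process and what
# it refers to, e.g. "3 -> pipe:[123456]" or "1 -> socket:[654321]".
# Linux-only (reads /proc); inspecting another user's process needs root.
describe_fds() {
  local pid=$1 fd target
  if [ ! -d "/proc/$pid/fd" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  for fd in /proc/"$pid"/fd/*; do
    # A descriptor can close between listing and reading; report that
    # instead of aborting.
    target=$(readlink -- "$fd" 2>/dev/null) || target='(closed)'
    printf '%s -> %s\n' "${fd##*/}" "$target"
  done
}

# Usage: sudo ./describe_fds.sh 3095
if [ "$#" -eq 1 ]; then
  describe_fds "$1"
fi
```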
On 2016-10-10 17:24, Kip Warner wrote:
> Note that I have temporarily disabled timeouts and added extra
> verbosity. The transfer to the remote host via SSH works fine, up until
> it gets to a 30+ GB file (a VM image). It gets about 90+ percent of the
> way through, hangs, and then times out.

I have a similar but different problem. I make a regular download from a site that always errors out on a particular large file. However, my rsync error symptoms are different. Unfortunately, the server admins seem to be the strong, silent types who have repeatedly changed their minds about what they think is wrong and who may or may not be attempting to solve the problem in isolation - I've failed to get any meaningful communication going with them. Fortunately, excluding the problem file is a reasonable workaround for me.

Have you tried excluding the problem file from the transfer?

One possibility is that the problem is not caused directly by rsync but because of some underlying filesystem glitch. What OS & filesystems are you using?

Cheers,
Dave
On Thu, 2016-10-13 at 10:09 +0100, Dave Howorth wrote:
> Have you tried excluding the problem file from the transfer?

Hey Dave. All the other files appear to sync, up until it gets to that one large file. Then it stalls, and finally times out. I could tell it to exclude that important file, but that would defeat the purpose of my backup.

> One possibility is that the problem is not caused directly by rsync
> but because of some underlying filesystem glitch. What OS &
> filesystems are you using?

That could well be, but how to know?

Client side:

$ lsb_release -a
LSB Version: security-9.20160110ubuntu5-amd64:security-9.20160110ubuntu5-noarch
Distributor ID: Ubuntu
Description: Ubuntu 16.10
Release: 16.10
Codename: yakkety

Server side:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 7.11 (wheezy)
Release: 7.11
Codename: wheezy

--
Kip Warner
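A few quick checks can be run on each side as a starting point:

- which file system a path lives on;
- how much free space it has, since by default the receiving rsync builds the new file as a temporary copy before renaming it into place;
- whether a file can be read end to end without an I/O error.

This is a minimal sketch, assuming GNU coreutils (stat -f, df). check_path is a hypothetical helper, and the path you pass in is up to you: the VM image on the client, the destination directory on the server.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report the file system type and free space for a
# path and, if it is a regular file, read it end to end so that an I/O
# error shows up as a non-zero exit status.
check_path() {
  local path=$1

  # stat -f reports on the file system; %T is its type in readable form.
  echo "fs type: $(stat -f -c %T -- "$path")"

  # Free space on that file system.
  df -h -- "$path"

  if [ -f "$path" ]; then
    if cat -- "$path" > /dev/null; then
      echo "read OK: $path"
    else
      echo "read FAILED: $path" >&2
      return 1
    fi
  fi
}

# Usage: ./check_path.sh /path/to/vm.img
if [ "$#" -eq 1 ]; then
  check_path "$1"
fi
```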