Mailing list subscriptions
2006-Oct-11 11:35 UTC
"Invalid file index" failures, suspicious index numbers
I've trying to do a backup from a remote host to a local machine via rsync using the following configuration: Remote host (Red Hat Enterprise Linux ES3): $ uname -srmpi Linux 2.4.21-47.EL i686 athlon i386 $ cat /etc/redhat-release Red Hat Enterprise Linux ES release 3 (Taroon Update 8) $ rsync --version rsync version 2.6.8 protocol version 29 Local machine (Mac OS X 10.4.8, Intel): $ uname -v Darwin Kernel Version 8.8.1: Mon Sep 25 19:42:00 PDT 2006; root:xnu-792.13.8.obj~1/RELEASE_I38 $ rsync --version rsync version 2.6.8 protocol version 29 The backup is started from the local machine with root privs (using sudo), encrypted transport provided by SSH with public-key authentication; the invocation looks like this: sudo rsync -e "ssh -i full_path_to_key_file" -avzx \ --numeric-ids --delete --progress \ root@example.com:/ /var/root/backups/example.com/ On the remote host, rsync logs in with root privileges and a forced command appears in the authorized keys file; logging shows that this is the actual command being sent to the server: rsync --server --sender -vlogDtprxz --numeric-ids . / Now, I think the SSH side of things is working fine because dry runs (with the -n switch) work fine but as soon as I try to do a real backup I always bail with an error like this: Invalid file index: -1610612736 (count=9406) [sender] rsync error: protocol incompatibility (code 2) at sender.c(169) [sender=2.6.8] rsync: writefd_unbuffered failed to write 4092 bytes [generator]: Broken pipe (32)rsync: connection unexpectedly closed (195901 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c (463) [receiver=2.6.8] rsync error: error in rsync protocol data stream (code 12) at io.c (1119) [generator=2.6.8] Sometimes the numbers are different (-1610612736, 1610612736 or 524288) but the error is always the same: an "Invalid file index" error and rsync subsequently dies. In a previous post to the list (http://lists.samba.org/archive/rsync/ 2006-June/015828.html) Wayne wrote this:> This number is exactly 0x60000000, so another possibility that > comes to mind is that the byte-order messed up somehow. Look in > byteorder.h and make sure that CAREFUL_ALIGNMENT is being defined > on any system that uses most-significant-byte-first ordering.I'm getting the same number (1610612736), and I also note that 524288 is a suspicious-looking number as well (0x80000). I don't think there is a byte-ordering issue at play here. The configure script correctly identifies the build machine as little-endian (i386) and even if I override things and set CAREFUL_ALIGNMENT rsync still bails with those invalid file index errors. Some warnings were produced during the build but they all look harmless to me. Things like: rsync.c: In function ?sig_int?: rsync.c:231: warning: unused parameter ?val? This is because __attribute__((__unused__)) is omitted on the Mac OS X builds because of a bug in GCC; incidentally, that bug is no longer present in the current version: $ gcc --version i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5363) And some other warnings like this: log.c: In function ?rwrite?: log.c:307: warning: passing argument 2 of ?libiconv? from incompatible pointer type But that's just complaining because it expects a "const char*" and it's being passed a "char *", and it looks harmless enough to me. "make test" succeeds with no failures. I also tried all this out with a copy of rsync built from yesterday's CVS and still the same problems... Any ideas for how I could troubleshoot this further? I've tried running the backup with super verbosity (-vvv) at both ends; not sure if this will be of any use, but here are some relevant bits of output: * It transfers the files list like so; this continues on for many thousands of lines: receiving file list ... server_sender starting pid=8651 [sender] make_file(.,*,2) recv_file_name(.) [sender] make_file(lost+found,*,2) recv_file_name(lost+found) [sender] make_file(boot,*,2) recv_file_name(boot) [sender] make_file(dev,*,2) recv_file_name(dev) [sender] make_file(proc,*,2) recv_file_name(proc) [sender] make_file(tmp,*,2) recv_file_name(tmp) [sender] make_file(var,*,2) recv_file_name(var) [sender] make_file(etc,*,2) recv_file_name(etc) [sender] make_file(root,*,2) recv_file_name(root) [sender] make_file(usr,*,2) recv_file_name(usr) [sender] make_file(lib,*,2) recv_file_name(lib) [sender] make_file(bin,*,2) recv_file_name(bin) [sender] make_file(home,*,2) recv_file_name(home) [sender] make_file(initrd,*,2) recv_file_name(initrd) [sender] make_file(mnt,*,2) recv_file_name(mnt) [sender] make_file(opt,*,2) recv_file_name(opt) [sender] make_file(sbin,*,2) recv_file_name(sbin) [sender] make_file(misc,*,2) recv_file_name(misc) [sender] make_file(nsr,*,2) recv_file_name(nsr) [sender] make_file(.autofsck,*,2) recv_file_name(.autofsck) [sender] make_file(dev/pts,*,2) recv_file_name(dev/pts) * After about 320,000 lines this phase comes to an end and the next one starts: [sender] make_file(nsr/cores/nsrexecd,*,2) recv_file_name(nsr/cores/nsrexecd) [sender] make_file(nsr/cores/nsrexecd/.nsr,*,2) recv_file_name(nsr/cores/nsrexecd/.nsr) received 161416 names 161416 files to consider recv_file_list done get_local_name count=161416 /var/root/backups/example.com/ generator starting pid=7608 count=161416 deleting in . recv_files(161416) starting delete_in_dir(.) send_file_list done send_files starting [generator] make_file(..autofsck.RLTEQ5,*,2) [generator] make_file(bin,*,2) [generator] make_file(boot,*,2) [generator] make_file(dev,*,2) [generator] make_file(etc,*,2) [generator] make_file(home,*,2) [generator] make_file(lib,*,2) deleting ..autofsck.RLTEQ5 delete_in_dir(bin) [generator] make_file(bin/awk,*,2) [generator] make_file(bin/bash2,*,2) [generator] make_file(bin/bsh,*,2) [generator] make_file(bin/csh,*,2) [generator] make_file(bin/dnsdomainname,*,2) [generator] make_file(bin/domainname,*,2) [generator] make_file(bin/egrep,*,2) [generator] make_file(bin/ex,*,2) [generator] make_file(bin/fgrep,*,2) [generator] make_file(bin/gtar,*,2) [generator] make_file(bin/nisdomainname,*,2) [generator] make_file(bin/rbash,*,2) * This continues up to about the 348,000-line mark and we see "delta- transmission enabled": delete_in_dir(lib) delta-transmission enabled recv_generator(.,0) set modtime of . to (1153412202) Thu Jul 20 18:16:42 2006 recv_generator(.autofsck,1) recv_generator(bin,2) recv_generator(bin/arch,3) recv_generator(bin/ash,4) recv_generator(bin/ash.static,5) recv_generator(bin/awk,6) bin/awk is uptodate recv_generator(bin/basename,7) recv_generator(bin/bash,8) recv_generator(bin/bash2,9) bin/bash2 is uptodate recv_generator(bin/bsh,10) bin/bsh is uptodate * After a while there are a whole bunch of "mknod" calls; up to this point we produce about 350,000 lines of log statements: recv_generator(dev/hdo19,974) mknod(dev/hdo19,060660,0x59000013) set modtime of dev/hdo19 to (1088100278) Thu Jun 24 20:04:38 2004 set gid of dev/hdo19 from 0 to 6 recv_generator(dev/hdo2,975) mknod(dev/hdo2,060660,0x59000002) set modtime of dev/hdo2 to (1088100278) Thu Jun 24 20:04:38 2004 set gid of dev/hdo2 from 0 to 6 recv_generator(dev/hdo20,976) * At this point it appears that the directories and device nodes have all been created, so it starts to actually send/receive regular files and very soon runs into the "Invalid file index" failure: send_files(1, /.autofsck) recv_files(.autofsck) mknod(dev/hdo20,060660,0x59000014) .autofsck set modtime of dev/hdo20 to (1088100278) Thu Jun 24 20:04:38 2004 set gid of dev/hdo20 from 0 to 6 recv_generator(dev/hdo21,977) send_files(0, /.) send_files(0, /.) mknod(dev/hdo21,060660,0x59000015) send_files(0, /.) set modtime of dev/hdo21 to (1088100278) Thu Jun 24 20:04:38 2004 set gid of dev/hdo21 from 0 to 6 recv_generator(dev/hdo22,978) 0 100% 0.00kB/s 0:00:00 (xfer#1, to- check=161414/161416) send_files(0, /.) mknod(dev/hdo22,060660,0x59000016) Invalid file index: -1610612736 (count=161416) [sender] set modtime of dev/hdo22 to (1088100278) Thu Jun 24 20:04:38 2004 rsync error: protocol incompatibility (code 2) at sender.c(169) [sender=2.6.8] _exit_cleanup(code=2, file=sender.c, line=169): about to call exit(2) set gid of dev/hdo22 from 0 to 6 recv_generator(dev/hdo23,979) mknod(dev/hdo23,060660,0x59000017) * Finally we see a bunch more mknod calls (about 3,000 lines' worth): set modtime of dev/hdo23 to (1088100278) Thu Jun 24 20:04:38 2004 set gid of dev/hdo23 from 0 to 6 recv_generator(dev/hdo24,980) mknod(dev/hdo24,060660,0x59000018) set modtime of dev/hdo24 to (1088100278) Thu Jun 24 20:04:38 2004 set gid of dev/hdo24 from 0 to 6 recv_generator(dev/hdo25,981) mknod(dev/hdo25,060660,0x59000019) * And we ultimately die like this: set modtime of dev/ptydd to (1088100278) Thu Jun 24 20:04:38 2004 set gid of dev/ptydd from 0 to 5 recv_generator(dev/ptyde,1822) mknod(dev/ptyde,020666,0x20000ee) set modtime of dev/ptyde to (1088100278) Thu Jun 24 20:04:38 2004 set gid of dev/ptyde from 0 to 5 rsync: connection unexpectedly closed (3208995 bytes received so far) [generator] rsync error: error in rsync protocol data stream (code 12) at io.c (463) [generator=2.6.8] _exit_cleanup(code=12, file=io.c, line=463): about to call exit(12) So, not sure if there's anything in there which may explain the failure. Please let me know if there's any more information that would be useful.
Mailing list subscriptions
2006-Oct-14 17:33 UTC
"Invalid file index" failures, suspicious index numbers
El 11/10/2006, a las 13:01, Mailing list subscriptions escribi?:> Now, I think the SSH side of things is working fine because dry > runs (with the -n switch) work fine but as soon as I try to do a > real backup I always bail with an error like this: > > Invalid file index: -1610612736 (count=9406) [sender] > rsync error: protocol incompatibility (code 2) at sender.c(169) > [sender=2.6.8] > rsync: writefd_unbuffered failed to write 4092 bytes [generator]: > Broken pipe (32)rsync: connection unexpectedly closed (195901 bytes > received so far) [receiver] > rsync error: error in rsync protocol data stream (code 12) at io.c > (463) [receiver=2.6.8] > rsync error: error in rsync protocol data stream (code 12) at io.c > (1119) [generator=2.6.8] > > Sometimes the numbers are different (-1610612736, 1610612736 or > 524288) but the error is always the same: an "Invalid file index" > error and rsync subsequently dies.For what it's worth, I just tried the new 2.6.9pre1 release (on both remote server and local machine) and the backup run completed successfully. The was about 3 gig of data (150,000 items). Not sure what the difference is between 2.6.9pre1 and the version I tried from CVS only a couple of days ago, but it's working now. Perhaps a freak coincidence.
Wayne Davison
2006-Oct-15 16:33 UTC
"Invalid file index" failures, suspicious index numbers
On Wed, Oct 11, 2006 at 01:01:33PM +0200, Mailing list subscriptions wrote:> sudo rsync -e "ssh -i full_path_to_key_file" -avzx \ > --numeric-ids --delete --progress \ > root@example.com:/ /var/root/backups/example.com/ > > On the remote host, rsync logs in with root privileges and a forced > command appears in the authorized keys file; logging shows that this > is the actual command being sent to the server: > > rsync --server --sender -vlogDtprxz --numeric-ids . /If the remote rsync doesn't get the same options that the local rsync sent, the protocol can easily get out of whack, since it's fragile in that way: both sides must know what's happening for each byte in the protocol to be interpreted correctly. When using a forced ssh command, you should take a look at the script rrsync in the support directory (which can be used to limit the rsync command that is run without corrupting the options) and also check into using a single-instance daemon setup (because the demon protocol sends the options over the socket, and you use the daemon config file to limit what options are accepted). This was very recently discussed in this bug: https://bugzilla.samba.org/show_bug.cgi?id=4163 ..wayne..
Mailing list subscriptions
2006-Oct-15 16:59 UTC
"Invalid file index" failures, suspicious index numbers
El 15/10/2006, a las 18:33, Wayne Davison escribi?:> When using a forced ssh command, you should take a look at the script > rrsync in the support directory (which can be used to limit the rsync > command that is run without corrupting the options)One possible enhancement for the rrsync script: instead of explicitly disallowing options and implicitly accepting all others, it would probably be more robust/secure/future-proof to instead explicitly allow certain options and deny all others.
Wayne Davison
2006-Oct-15 17:55 UTC
"Invalid file index" failures, suspicious index numbers
On Sun, Oct 15, 2006 at 06:58:40PM +0200, Mailing list subscriptions wrote:> One possible enhancement for the rrsync script: instead of explicitly > disallowing options and implicitly accepting all others, it would > probably be more robust/secure/future-proof to instead explicitly > allow certain options and deny all others.The script already generates an error for any unknown option. ..wayne..