Mailing list subscriptions
2006-Oct-11 11:35 UTC
"Invalid file index" failures, suspicious index numbers
I've trying to do a backup from a remote host to a local machine via
rsync using the following configuration:
Remote host (Red Hat Enterprise Linux ES3):
$ uname -srmpi
Linux 2.4.21-47.EL i686 athlon i386
$ cat /etc/redhat-release
Red Hat Enterprise Linux ES release 3 (Taroon Update 8)
$ rsync --version
rsync version 2.6.8 protocol version 29
Local machine (Mac OS X 10.4.8, Intel):
$ uname -v
Darwin Kernel Version 8.8.1: Mon Sep 25 19:42:00 PDT 2006;
root:xnu-792.13.8.obj~1/RELEASE_I38
$ rsync --version
rsync version 2.6.8 protocol version 29
The backup is started from the local machine with root privs (using
sudo), encrypted transport provided by SSH with public-key
authentication; the invocation looks like this:
sudo rsync -e "ssh -i full_path_to_key_file" -avzx \
--numeric-ids --delete --progress \
root@example.com:/ /var/root/backups/example.com/
On the remote host, rsync logs in with root privileges and a forced
command appears in the authorized keys file; logging shows that this
is the actual command being sent to the server:
rsync --server --sender -vlogDtprxz --numeric-ids . /
Now, I think the SSH side of things is working fine because dry runs
(with the -n switch) work fine but as soon as I try to do a real
backup I always bail with an error like this:
Invalid file index: -1610612736 (count=9406) [sender]
rsync error: protocol incompatibility (code 2) at sender.c(169)
[sender=2.6.8]
rsync: writefd_unbuffered failed to write 4092 bytes [generator]:
Broken pipe (32)rsync: connection unexpectedly closed (195901 bytes
received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c
(463) [receiver=2.6.8]
rsync error: error in rsync protocol data stream (code 12) at io.c
(1119) [generator=2.6.8]
Sometimes the numbers are different (-1610612736, 1610612736 or
524288) but the error is always the same: an "Invalid file index"
error and rsync subsequently dies.
In a previous post to the list (http://lists.samba.org/archive/rsync/
2006-June/015828.html) Wayne wrote this:
> This number is exactly 0x60000000, so another possibility that
> comes to mind is that the byte-order messed up somehow. Look in
> byteorder.h and make sure that CAREFUL_ALIGNMENT is being defined
> on any system that uses most-significant-byte-first ordering.
I'm getting the same number (1610612736), and I also note that 524288
is a suspicious-looking number as well (0x80000). I don't think there
is a byte-ordering issue at play here. The configure script correctly
identifies the build machine as little-endian (i386) and even if I
override things and set CAREFUL_ALIGNMENT rsync still bails with
those invalid file index errors.
Some warnings were produced during the build but they all look
harmless to me. Things like:
rsync.c: In function ?sig_int?:
rsync.c:231: warning: unused parameter ?val?
This is because __attribute__((__unused__)) is omitted on the Mac OS
X builds because of a bug in GCC; incidentally, that bug is no longer
present in the current version:
$ gcc --version
i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build
5363)
And some other warnings like this:
log.c: In function ?rwrite?:
log.c:307: warning: passing argument 2 of ?libiconv? from
incompatible pointer type
But that's just complaining because it expects a "const char*" and
it's being passed a "char *", and it looks harmless enough to me.
"make test" succeeds with no failures. I also tried all this out with
a copy of rsync built from yesterday's CVS and still the same
problems...
Any ideas for how I could troubleshoot this further? I've tried
running the backup with super verbosity (-vvv) at both ends; not sure
if this will be of any use, but here are some relevant bits of output:
* It transfers the files list like so; this continues on for many
thousands of lines:
receiving file list ...
server_sender starting pid=8651
[sender] make_file(.,*,2)
recv_file_name(.)
[sender] make_file(lost+found,*,2)
recv_file_name(lost+found)
[sender] make_file(boot,*,2)
recv_file_name(boot)
[sender] make_file(dev,*,2)
recv_file_name(dev)
[sender] make_file(proc,*,2)
recv_file_name(proc)
[sender] make_file(tmp,*,2)
recv_file_name(tmp)
[sender] make_file(var,*,2)
recv_file_name(var)
[sender] make_file(etc,*,2)
recv_file_name(etc)
[sender] make_file(root,*,2)
recv_file_name(root)
[sender] make_file(usr,*,2)
recv_file_name(usr)
[sender] make_file(lib,*,2)
recv_file_name(lib)
[sender] make_file(bin,*,2)
recv_file_name(bin)
[sender] make_file(home,*,2)
recv_file_name(home)
[sender] make_file(initrd,*,2)
recv_file_name(initrd)
[sender] make_file(mnt,*,2)
recv_file_name(mnt)
[sender] make_file(opt,*,2)
recv_file_name(opt)
[sender] make_file(sbin,*,2)
recv_file_name(sbin)
[sender] make_file(misc,*,2)
recv_file_name(misc)
[sender] make_file(nsr,*,2)
recv_file_name(nsr)
[sender] make_file(.autofsck,*,2)
recv_file_name(.autofsck)
[sender] make_file(dev/pts,*,2)
recv_file_name(dev/pts)
* After about 320,000 lines this phase comes to an end and the next
one starts:
[sender] make_file(nsr/cores/nsrexecd,*,2)
recv_file_name(nsr/cores/nsrexecd)
[sender] make_file(nsr/cores/nsrexecd/.nsr,*,2)
recv_file_name(nsr/cores/nsrexecd/.nsr)
received 161416 names
161416 files to consider
recv_file_list done
get_local_name count=161416 /var/root/backups/example.com/
generator starting pid=7608 count=161416
deleting in .
recv_files(161416) starting
delete_in_dir(.)
send_file_list done
send_files starting
[generator] make_file(..autofsck.RLTEQ5,*,2)
[generator] make_file(bin,*,2)
[generator] make_file(boot,*,2)
[generator] make_file(dev,*,2)
[generator] make_file(etc,*,2)
[generator] make_file(home,*,2)
[generator] make_file(lib,*,2)
deleting ..autofsck.RLTEQ5
delete_in_dir(bin)
[generator] make_file(bin/awk,*,2)
[generator] make_file(bin/bash2,*,2)
[generator] make_file(bin/bsh,*,2)
[generator] make_file(bin/csh,*,2)
[generator] make_file(bin/dnsdomainname,*,2)
[generator] make_file(bin/domainname,*,2)
[generator] make_file(bin/egrep,*,2)
[generator] make_file(bin/ex,*,2)
[generator] make_file(bin/fgrep,*,2)
[generator] make_file(bin/gtar,*,2)
[generator] make_file(bin/nisdomainname,*,2)
[generator] make_file(bin/rbash,*,2)
* This continues up to about the 348,000-line mark and we see "delta-
transmission enabled":
delete_in_dir(lib)
delta-transmission enabled
recv_generator(.,0)
set modtime of . to (1153412202) Thu Jul 20 18:16:42 2006
recv_generator(.autofsck,1)
recv_generator(bin,2)
recv_generator(bin/arch,3)
recv_generator(bin/ash,4)
recv_generator(bin/ash.static,5)
recv_generator(bin/awk,6)
bin/awk is uptodate
recv_generator(bin/basename,7)
recv_generator(bin/bash,8)
recv_generator(bin/bash2,9)
bin/bash2 is uptodate
recv_generator(bin/bsh,10)
bin/bsh is uptodate
* After a while there are a whole bunch of "mknod" calls; up to this
point we produce about 350,000 lines of log statements:
recv_generator(dev/hdo19,974)
mknod(dev/hdo19,060660,0x59000013)
set modtime of dev/hdo19 to (1088100278) Thu Jun 24 20:04:38 2004
set gid of dev/hdo19 from 0 to 6
recv_generator(dev/hdo2,975)
mknod(dev/hdo2,060660,0x59000002)
set modtime of dev/hdo2 to (1088100278) Thu Jun 24 20:04:38 2004
set gid of dev/hdo2 from 0 to 6
recv_generator(dev/hdo20,976)
* At this point it appears that the directories and device nodes have
all been created, so it starts to actually send/receive regular files
and very soon runs into the "Invalid file index" failure:
send_files(1, /.autofsck)
recv_files(.autofsck)
mknod(dev/hdo20,060660,0x59000014)
.autofsck
set modtime of dev/hdo20 to (1088100278) Thu Jun 24 20:04:38 2004
set gid of dev/hdo20 from 0 to 6
recv_generator(dev/hdo21,977)
send_files(0, /.)
send_files(0, /.)
mknod(dev/hdo21,060660,0x59000015)
send_files(0, /.)
set modtime of dev/hdo21 to (1088100278) Thu Jun 24 20:04:38 2004
set gid of dev/hdo21 from 0 to 6
recv_generator(dev/hdo22,978)
0 100% 0.00kB/s 0:00:00 (xfer#1, to-
check=161414/161416)
send_files(0, /.)
mknod(dev/hdo22,060660,0x59000016)
Invalid file index: -1610612736 (count=161416) [sender]
set modtime of dev/hdo22 to (1088100278) Thu Jun 24 20:04:38 2004
rsync error: protocol incompatibility (code 2) at sender.c(169)
[sender=2.6.8]
_exit_cleanup(code=2, file=sender.c, line=169): about to call exit(2)
set gid of dev/hdo22 from 0 to 6
recv_generator(dev/hdo23,979)
mknod(dev/hdo23,060660,0x59000017)
* Finally we see a bunch more mknod calls (about 3,000 lines' worth):
set modtime of dev/hdo23 to (1088100278) Thu Jun 24 20:04:38 2004
set gid of dev/hdo23 from 0 to 6
recv_generator(dev/hdo24,980)
mknod(dev/hdo24,060660,0x59000018)
set modtime of dev/hdo24 to (1088100278) Thu Jun 24 20:04:38 2004
set gid of dev/hdo24 from 0 to 6
recv_generator(dev/hdo25,981)
mknod(dev/hdo25,060660,0x59000019)
* And we ultimately die like this:
set modtime of dev/ptydd to (1088100278) Thu Jun 24 20:04:38 2004
set gid of dev/ptydd from 0 to 5
recv_generator(dev/ptyde,1822)
mknod(dev/ptyde,020666,0x20000ee)
set modtime of dev/ptyde to (1088100278) Thu Jun 24 20:04:38 2004
set gid of dev/ptyde from 0 to 5
rsync: connection unexpectedly closed (3208995 bytes received so far)
[generator]
rsync error: error in rsync protocol data stream (code 12) at io.c
(463) [generator=2.6.8]
_exit_cleanup(code=12, file=io.c, line=463): about to call exit(12)
So, not sure if there's anything in there which may explain the
failure. Please let me know if there's any more information that
would be useful.
Mailing list subscriptions
2006-Oct-14 17:33 UTC
"Invalid file index" failures, suspicious index numbers
El 11/10/2006, a las 13:01, Mailing list subscriptions escribi?:> Now, I think the SSH side of things is working fine because dry > runs (with the -n switch) work fine but as soon as I try to do a > real backup I always bail with an error like this: > > Invalid file index: -1610612736 (count=9406) [sender] > rsync error: protocol incompatibility (code 2) at sender.c(169) > [sender=2.6.8] > rsync: writefd_unbuffered failed to write 4092 bytes [generator]: > Broken pipe (32)rsync: connection unexpectedly closed (195901 bytes > received so far) [receiver] > rsync error: error in rsync protocol data stream (code 12) at io.c > (463) [receiver=2.6.8] > rsync error: error in rsync protocol data stream (code 12) at io.c > (1119) [generator=2.6.8] > > Sometimes the numbers are different (-1610612736, 1610612736 or > 524288) but the error is always the same: an "Invalid file index" > error and rsync subsequently dies.For what it's worth, I just tried the new 2.6.9pre1 release (on both remote server and local machine) and the backup run completed successfully. The was about 3 gig of data (150,000 items). Not sure what the difference is between 2.6.9pre1 and the version I tried from CVS only a couple of days ago, but it's working now. Perhaps a freak coincidence.
Wayne Davison
2006-Oct-15 16:33 UTC
"Invalid file index" failures, suspicious index numbers
On Wed, Oct 11, 2006 at 01:01:33PM +0200, Mailing list subscriptions wrote:> sudo rsync -e "ssh -i full_path_to_key_file" -avzx \ > --numeric-ids --delete --progress \ > root@example.com:/ /var/root/backups/example.com/ > > On the remote host, rsync logs in with root privileges and a forced > command appears in the authorized keys file; logging shows that this > is the actual command being sent to the server: > > rsync --server --sender -vlogDtprxz --numeric-ids . /If the remote rsync doesn't get the same options that the local rsync sent, the protocol can easily get out of whack, since it's fragile in that way: both sides must know what's happening for each byte in the protocol to be interpreted correctly. When using a forced ssh command, you should take a look at the script rrsync in the support directory (which can be used to limit the rsync command that is run without corrupting the options) and also check into using a single-instance daemon setup (because the demon protocol sends the options over the socket, and you use the daemon config file to limit what options are accepted). This was very recently discussed in this bug: https://bugzilla.samba.org/show_bug.cgi?id=4163 ..wayne..
Mailing list subscriptions
2006-Oct-15 16:59 UTC
"Invalid file index" failures, suspicious index numbers
El 15/10/2006, a las 18:33, Wayne Davison escribi?:> When using a forced ssh command, you should take a look at the script > rrsync in the support directory (which can be used to limit the rsync > command that is run without corrupting the options)One possible enhancement for the rrsync script: instead of explicitly disallowing options and implicitly accepting all others, it would probably be more robust/secure/future-proof to instead explicitly allow certain options and deny all others.
Wayne Davison
2006-Oct-15 17:55 UTC
"Invalid file index" failures, suspicious index numbers
On Sun, Oct 15, 2006 at 06:58:40PM +0200, Mailing list subscriptions wrote:> One possible enhancement for the rrsync script: instead of explicitly > disallowing options and implicitly accepting all others, it would > probably be more robust/secure/future-proof to instead explicitly > allow certain options and deny all others.The script already generates an error for any unknown option. ..wayne..