Anthony Sadler
2006-May-03  02:04 UTC
Rsync error on client end: unexpected tag 3 [sender] rsync error: error in rsync protocol data stream (code 12) at io.c(843) [sender]
Hi guys:
First post for me on this list. 
THE PROBLEM
==========I have several linux servers (ranging from Redhat 7.3 to FC3) that use
Rsync (version 2.6.8 and 2.6.7, read on for reasons) to backup their data. A
typical rsync backup script looks like this:
------------------------------------------------------------------------------------------
#!/bin/sh
RSYNC_PASSWORD=<PASSWORD>
rsync -a --delete --password-file=<PASSWORD FILE> --ignore-errors
--exclude '/dev/*' --exclude '/sys/*' --exclude
'/proc/*' --exclude '/var/run/*' --exclude '/tmp/*'
--exclude '/home/downloads/*' /
<USERNAME>@<ADDRESS>::<MODULE> >> /var/log/rsync.log
rsync -a --progress --delete --ignore-errors --password-file=<PASSWORD
FILE> /home/downloads/ <USERNAME>@<ADDRESS>::<MODULE>
>> /var/log/rsync.log
------------------------------------------------------------------------------------------
(Some things have been replaced to prevent people doing mean things to our
servers :P)
For a long time rsync has worked, occasionly barfing due to lack of harddrive
space or for other external reasons. When we upgraded to version 2.6.8 however,
things started to die.
Here is a typical error we get:
------------------------------------------------------------------------------------------
unexpected tag 3 [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(828)
[sender=2.6.8]
------------------------------------------------------------------------------------------
SERVERS
======For the purpose of this email, I will talk about two servers in
particular. One is an apache server, which I will refer to as APACHE, the other
our backup server, which we will call RSERVER.
APACHE is a Fedora Core release 3 (Heidelberg) box, running Linux
2.6.12-1.1381_FC3 with Rsync 2.6.8, installed from an RPM. All software is up to
date to my knowledge.
RSERVER is also a Fedora Core release 3 (Heidelberg) box, also running Linux
2.6.12-1.1381_FC3 with Rsync 2.6.8, again installed from RPM. Again, all
software is up to date.
SETUP
====The script you see above is the script used to copy the data between APACHE
and RSERVER. It runs once a night, starting at 21:00 and goes for some hours.
The amount of data it has to think about amounts to is some 100 gig. What it has
to sync over the course of an evening can be siginificantly less (I see two
reports that say 46 meg and 32 meg).
The two machines talk to each other over a 100 megabit switched LAN. APACHE is a
server that has port forwards to it and has no software firewall running (well
Iptables almost always runs, but it has no rules and all chains are set to
ACCEPT).
The same with RSERVER.
The script is kicked off by a cronjob running as root.
DETAILS
======Around about 10 minutes after the script runs, it will fail with the above
error message. With small amounts of data (eg 10 meg, 20 meg, no known boundary)
the script will work. In fact, I see that some files do work with the cron
script. However when I start to sync more data (1.6 gig, 1.7 gig etc), it fails
part of the way through with the above.
Interestingly, CWRsync seems to work (ie Rsync running under Cygwin on Windows).
However this is not exclusive. Some do, some do not. I cannot work out why (it
may be because they are under the size threshold?)
TROUBLESHOOTING
==============- Small Rsyncs will work. No known boundary between small and
large though (ie I cant say "More than this amount of data it will
fail")
- Tried running Rsync in Xinetd.
- Tried running Rsync as a daemon, as root
- Using blocking-io
- Using no-blocking-io
- Pings and SSH sessions to watch for network problems.
- Packet dumps (very limited, I have trouble with TCPDump... give me ethereal
any day :P)
- Straces
- Replacing 2.6.8 with 2.6.7. 2.6.7 used to work. (that's why I have said
above two seperate version):
	We get a marginally different error message:
	------------------------------------------------------------------------------------------
	unexpected tag 3 [sender]
	rsync error: error in rsync protocol data stream (code 12) at io.c(843)
[sender]
	------------------------------------------------------------------------------------------
	The difference is that the sender on the 2nd line does not have (2.6.8).
That's the only difference I can find.
STRACE #1
========I can provide whole straces if people request, but here are some
interesting things I have found.
This is on server RSERVER, the rsync server side, running 2.6.8 as xinetd:
------------------------------------------------------------------------------------------
21219 open("/etc/localtime", O_RDONLY)  = -1 ENOENT (No such file or
directory)
21219 write(3, "2006/05/02 05:10:36 [21219] recv"..., 143) = 143
21219 select(5, [4], [0], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "s\0\0\n", 4)             = 4
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "recv 51.32.168.192.in-addr.arpa "..., 115) = 115
21219 time(NULL)                        = 1146546636
21219 open("/etc/localtime", O_RDONLY)  = -1 ENOENT (No such file or
directory)
21219 write(3, "2006/05/02 05:10:36 [21219] recv"..., 143) = 143
21219 select(5, [4], [0], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "j\0\0\n", 4)             = 4
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "recv 51.32.168.192.in-addr.arpa "..., 106) = 106
21219 time(NULL)                        = 1146546636
21219 open("/etc/localtime", O_RDONLY)  = -1 ENOENT (No such file or
directory)
21219 write(3, "2006/05/02 05:10:36 [21219] recv"..., 134) = 134
21219 select(5, [4], [0], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "f\0\0\n", 4)             = 4
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "recv 51.32.168.192.in-addr.arpa "..., 102) = 102
21219 time(NULL)                        = 1146546636
21219 open("/etc/localtime", O_RDONLY)  = -1 ENOENT (No such file or
directory)
21219 write(3, "2006/05/02 05:10:36 [21219] recv"..., 130) = 130
21219 select(5, [4], [0], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "2\0\0\r", 4)             = 4
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "rsync: read error: Connection re"..., 50) = 50
21219 time(NULL)                        = 1146546636
21219 open("/etc/localtime", O_RDONLY)  = -1 ENOENT (No such file or
directory)
21219 write(3, "2006/05/02 05:10:36 [21219] rsyn"..., 78) = 78
21219 select(5, [4], [0], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "Y\0\0\10", 4)            = 4
21219 select(5, [4], [], NULL, {60, 0}) = 1 (in [4], left {60, 0})
21219 read(4, "rsync error: error in rsync prot"..., 89) = 89
21219 time(NULL)                        = 1146546636
21219 open("/etc/localtime", O_RDONLY)  = -1 ENOENT (No such file or
directory)
21219 write(3, "2006/05/02 05:10:36 [21219] rsyn"..., 117) = 117
------------------------------------------------------------------------------------------
Notice /etc/localtime not being able to open? 
------------------------------------------------------------------------------------------
[root@RSERVER tmp]# ls -lah /etc/localtime
lrwxrwxrwx  1 root root 36 Mar 27 13:50 /etc/localtime ->
/usr/share/zoneinfo/Australia/Sydney
[root@RSERVER tmp]# ls -lah /usr/share/zoneinfo/Australia/Sydney
-rw-r--r--  1 root root 785 Mar 27 11:05 /usr/share/zoneinfo/Australia/Sydney
[root@RSERVER tmp]#
------------------------------------------------------------------------------------------
Another weird thing is that it does the same for /etc/fstab! (The below is
significantly further back in the STRACE than the above)
------------------------------------------------------------------------------------------
21219 open("/etc/mtab", O_RDONLY)       = -1 ENOENT (No such file or
directory)
21219 open("/etc/fstab", O_RDONLY)      = -1 ENOENT (No such file or
directory)
21219 open("/proc/meminfo", O_RDONLY)   = -1 ENOENT (No such file or
directory)
------------------------------------------------------------------------------------------
Yet at other places in the same STRACE, it does open the file:
------------------------------------------------------------------------------------------
21219 open("/etc/localtime", O_RDONLY)  = 3
21219 fstat64(3, {st_mode=S_IFREG|0644, st_size=785, ...}) = 0
21219 fstat64(3, {st_mode=S_IFREG|0644, st_size=785, ...}) = 0
------------------------------------------------------------------------------------------
I assume that is exit code 0, which means it works. Either way it doesn't
throw an error I can see.
STRACE #2
========This is an strace run on APACHE (the rsync client). I apologise, I
cannot say whether it was 2.6.8 or 2.6.7. I'm pretty sure it's 2.6.8.
Here is the interesting section:
------------------------------------------------------------------------------------------
5503  gettimeofday({1146614119, 369460}, NULL) = 0
5503  ioctl(1, TIOCGPGRP, [48])         = -1 ENOTTY (Inappropriate ioctl for
device)
5503  write(1, "         583 100%    0.94kB/s   "..., 41) = 41
5503  select(4, NULL, [3], NULL, {60, 0}) = 1 (out [3], left {60, 0})
5503  write(3, "\333\323\361\276[\177\376\33\3\34\302\4\235.\237\301",
16) = 16
5503  gettimeofday({1146614119, 369703}, NULL) = 0
5503  write(1, "         583 100%    0.94kB/s   "..., 75) = 75
5503  close(4)                          = 0
5503  select(4, [3], [], NULL, {60, 0}) = 1 (in [3], left {60, 0})
5503  read(3, "Y\0\0\n", 4)             = 4
5503  write(2, "unexpected tag 3 [sender]", 25) = 25
5503  write(2, "\n", 1)                 = 1
5503  rt_sigaction(SIGUSR1, {SIG_IGN}, NULL, 8) = 0
5503  rt_sigaction(SIGUSR2, {SIG_IGN}, NULL, 8) = 0
5503  write(2, "rsync error: error in rsync prot"..., 80) = 80
5503  write(2, "\n", 1)                 = 1
5503  munmap(0xb7d67000, 4096)          = 0
5503  exit_group(12)                    = ?
------------------------------------------------------------------------------------------
Specifically the 2nd line: 
5503  ioctl(1, TIOCGPGRP, [48])         = -1 ENOTTY (Inappropriate ioctl for
device)
TCPDUMPS
=======I played with this for all of about 3 minutes. I didn't get far.
RSYNC LOG FILE
=============2006/05/03 00:28:31 [22725] rsync: writefd_unbuffered failed to
write 131 bytes [generator]: Connection reset by peer (104)
2006/05/03 00:28:31 [22725] rsync error: error in rsync protocol data stream
(code 12) at io.c(1119) [generator=2.6.8]
OTHER INFO
=========------------------------------------------------------------------------------------------
[root@celery tmp]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1             9.7G  3.6G  5.6G  39% /
none                  379M     0  379M   0% /dev/shm
/dev/hda4              17G  675M   15G   5% /home
/dev/hda2             9.7G  360M  8.8G   4% /var
/dev/hdb1             276G  176G   87G  68% /disk2
/dev/hdc1             276G  175G   87G  67% /disk3
/dev/sda1             276G  225G   37G  87% /media/usbdisk
------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------
/etc/rsyncd.conf:
#global paramaters
log file = /var/log/rsyncd.log
read only = no
uid = root
[APACHE]
comment = Backup of APACHE
path = /disk3/backup/0/APACHE
use chroot = yes
auth users = <USERNAME>
secrets file = /etc/rsyncd.secrets
hosts allow = <SUBNET> 127.0.0.1
transfer logging = yes
#log file = /var/log/rsync.APACHE.log
------------------------------------------------------------------------------------------
MY THOUGHTS
==========It seems that rsync is reaching some random file and then going, nope,
I've done enough work. Go away. I've tried looking for bad files, but I
have found nothing.
FROM HERE
========I'm just about out of ideas what to do... Returning to the old
version doesn't seem to fix it. I'm more than happy to provide more
information if requested. I've put all that I can think into the above
email. Let me know if you need anything else and I'll provide it.
Many thanks,
Anthony Sadler
Far Edge Technology
w: (02) 8425 1400
Wayne Davison
2006-May-03  05:31 UTC
Rsync error on client end: unexpected tag 3 [sender] rsync error: error in rsync protocol data stream (code 12) at io.c(843) [sender]
On Wed, May 03, 2006 at 11:58:07AM +1000, Anthony Sadler wrote:> unexpected tag 3 [sender]My analysis shows that some deferred messages (which are those that arrive in the generator when the generator is already trying to flush a partial message down the socket) are not being handled by rwrite() like they should be. Furthermore, the MSG_LOG message doesn't actually need to be deferred, since they cannot result in any data being written to the socket. This bug can only affect an rsync daemon that is receiving files, as all other scenarios behave correctly with the existing code. The attached patch should fix the problem. I've also checked the fix into CVS (and it will appear in the next nightly tar file that gets generated). Thanks for the detailed report, it was very helpful. ..wayne.. -------------- next part -------------- --- io.c 25 Apr 2006 23:51:15 -0000 1.193 +++ io.c 3 May 2006 05:13:31 -0000 @@ -294,6 +294,7 @@ static void read_msg_fd(void) exit_cleanup(RERR_STREAMIO); } close_multiplexing_out(); + defer_forwarding_messages = 0; /* FALL THROUGH */ case MSG_INFO: case MSG_ERROR: @@ -303,7 +304,8 @@ static void read_msg_fd(void) if (n >= sizeof buf) n = sizeof buf - 1; read_loop(fd, buf, n); - if (am_generator && am_server && defer_forwarding_messages) + if (am_generator && am_server + && defer_forwarding_messages && tag != MSG_LOG) msg_list_add(&msg2sndr, tag, buf, n); else rwrite((enum logcode)tag, buf, n); @@ -1134,11 +1136,20 @@ static void msg2sndr_flush(void) while (msg2sndr.head && io_multiplexing_out) { struct msg_list_item *m = msg2sndr.head; + int tag = (IVAL(m->buf, 0) >> 24) - MPLEX_BASE; if (!(msg2sndr.head = m->next)) msg2sndr.tail = NULL; - stats.total_written += m->len; defer_forwarding_messages = 1; - writefd_unbuffered(sock_f_out, m->buf, m->len); + switch (tag) { + case MSG_INFO: + case MSG_ERROR: + rwrite((enum logcode)tag, m->buf + 4, m->len - 4); + break; + default: + stats.total_written += m->len; + writefd_unbuffered(sock_f_out, m->buf, m->len); + break; + } defer_forwarding_messages = 0; free(m); }
Wayne Davison
2006-May-03  06:15 UTC
Rsync error on client end: unexpected tag 3 [sender] rsync error: error in rsync protocol data stream (code 12) at io.c(843) [sender]
On Wed, May 03, 2006 at 11:58:07AM +1000, Anthony Sadler wrote:> Notice /etc/localtime not being able to open?That happens because you're using "use chroot = yes". When rsync is running inside a chroot jail, it cannot access files that exist outside the rsync module's path. Rsync attempts to handle the timezone situation by using localtime() prior to the chroot call, but on some systems this is not enough to keep the library code really happy. I've checked in a change that makes rsync also call strftime() prior to the chroot (for those systems that are configured to use strftime()). ..wayne.. -------------- next part -------------- --- log.c 25 Apr 2006 23:51:15 -0000 1.146 +++ log.c 3 May 2006 06:06:19 -0000 @@ -154,17 +154,14 @@ static void logfile_open(void) void log_init(void) { - time_t t; - if (log_initialised) return; log_initialised = 1; /* this looks pointless, but it is needed in order for the * C library on some systems to fetch the timezone info * before the chroot */ - t = time(NULL); - localtime(&t); + timestring(time(NULL)); /* optionally use a log file instead of syslog */ logfname = lp_log_file();