Timothy J. Massey
2007-Feb-04 15:45 UTC
Reproducable failure with rsync, iptables and RHEL4
Hello! I have a consistent, reproducable failure performing an rsync of an RHEL4 system running rsync in daemon mode with iptables enabled. With iptables disabled, or with a rule that explicitly allows all traffic, the rsync completes. However, with iptalbes enabled, the rsync starts, but will not finish. It fails after copying a seemingly random amount of data. I have been able to reproduce this problem using a large variety of different computers, network switches, network cables, Linux distributions, kernel versions and rsync versions, installed in a variety of different physical locations. In fact, I am confident I have changed out every other component in the process: the only item that is consistently present when it fails is any version of RHEL (or CentOS) 4 is on the daemon (source) end, up to and including a fully-updated version. This problem did not happen with RHEL3 or previous versions. I have not found another Linux distribution that has this problem, either; however, *most* of our Linux systems that we would back up this way are running some version of Red Hat Linux, so this is far from conclusive. NOTE: the rsync daemon system is 172.28.16.36. The rsync client system is 172.28.16.35. Normally, the path would be /, but for testing I have changed it to /test. /test contains eight copies of /usr. This eliminates special files, as well as gives me enough data to copy: 4.1GB worth. The error occurs at some seemingly random point during the copy. I have had it happen anywhere between 200MB and 2GB into the copy. Here is the iptables ruleset for the source system: # cat iptables *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] :RH-Firewall-1-INPUT - [0:0] -A INPUT -j RH-Firewall-1-INPUT -A FORWARD -j RH-Firewall-1-INPUT -A RH-Firewall-1-INPUT -i lo -j ACCEPT -A RH-Firewall-1-INPUT -p icmp --icmp-type any -j ACCEPT -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 873 -j ACCEPT -A RH-Firewall-1-INPUT -s 172.28.16.35 -d 172.28.16.36 -p tcp --dport 873 -j LOG -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited COMMIT # As you can see, I have pared this down to the bare minimum. You can also see that I have added a LOG rule (the second to last one). As far as I can tell, this rule should never be used: either the packet will be a new request and will be handled by the line above the LOG, or it will be a packet that's part of the established connection, and will therefore be handled by the ESTABLISHED rule. However, packets *are* logged by this rule. With the above iptables rules enabled on the rsync daemon side, I get the following error on the destination system: # rsync --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --one-file-system backupuser@172.28.16.36::ROOT/* . Password: rsync: read error: No route to host rsync error: error in rsync protocol data stream (code 12) at io.c(177) rsync: connection unexpectedly closed (5682976 bytes read so far) rsync error: error in rsync protocol data stream (code 12) at io.c(165) # If you watch the network between these two systems, you will see high traffic between them. A top of either system shows that rsync is at the top. However, suddenly the network traffic will stop. A top of the system will no longer show rsync: it will have been moved to the bottom. About 10 minutes later, it will evenually time out with the error messages above. Here is the rsync daemon's log for this time: # tail /var/log/rsyncd.log 2007/02/04 00:27:41 [4386] name lookup failed for 172.28.16.35: Name or service not known 2007/02/04 00:27:43 [4386] rsync on ROOT/* from backupuser@UNKNOWN (172.28.16.35) 2007/02/04 05:45:50 [4386] rsync: writefd_unbuffered failed to write 4096 bytes: phase "unknown" [sender]: Connection timed out (110) 2007/02/04 05:45:50 [4386] rsync error: error in rsync protocol data stream (code 12) at io.c(909) # Here is the iptables status during the rsync *before* the error occurs (edited to avoid the spam filter): # iptables -L -v Chain RH-Firewall-1-INPUT (2 references) pkts bytes target prot opt in out source destination 3477 181K ACCEPT all -- any any anywhere anywhere state RELATED,ESTABLISHED 1 60 ACCEPT tcp -- any any anywhere anywhere state NEW tcp dpt:rsync 0 0 LOG tcp -- any any 172.28.16.35 172.28.16.36 tcp dpt:rsync LOG level warning # As you can see, it looks exactly like what you would expect: one packet starting the rsync request, and a bunch of established packets. However, notice the iptables status *after* the error has occurred: # iptables -L -v Chain RH-Firewall-1-INPUT (2 references) pkts bytes target prot opt in out source destination 256K 14M ACCEPT all -- any any anywhere anywhere state RELATED,ESTABLISHED 1 60 ACCEPT tcp -- any any anywhere anywhere state NEW tcp dpt:rsync 38 2732 LOG tcp -- any any 172.28.16.35 172.28.16.36 tcp dpt:rsync LOG level warning # There are 38 packets that were sent by the rsync client, but that iptables did not consider part of the established connection. Here are the first four logged packets. All 38 entries are at the bottom of this message. (Actually, no: the spam filter won't let me include them.) # dmesg ip_conntrack version 2.1 (4092 buckets, 32736 max) - 356 bytes per conntrack IN=eth0 OUT= MAC=00:02:55:3b:23:51:00:02:55:5b:75:13:08:00 SRC=172.28.16.35 DST=172.28.16.36 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=20551 DF PROTO=TCP SPT=32790 DPT=873 WINDOW=0 RES=0x00 ACK URGP=0 IN=eth0 OUT= MAC=00:02:55:3b:23:51:00:02:55:5b:75:13:08:00 SRC=172.28.16.35 DST=172.28.16.36 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=20552 DF PROTO=TCP SPT=32790 DPT=873 WINDOW=2896 RES=0x00 ACK URGP=0 IN=eth0 OUT= MAC=00:02:55:3b:23:51:00:02:55:5b:75:13:08:00 SRC=172.28.16.35 DST=172.28.16.36 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=20553 DF PROTO=TCP SPT=32790 DPT=873 WINDOW=2896 RES=0x00 ACK URGP=0 IN=eth0 OUT= MAC=00:02:55:3b:23:51:00:02:55:5b:75:13:08:00 SRC=172.28.16.35 DST=172.28.16.36 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=20554 DF PROTO=TCP SPT=32790 DPT=873 WINDOW=5792 RES=0x00 ACK URGP=0 # I do not know why iptables is filtering these packets. However, this is what is killing my rsyncs. To prove this, all you have to do is change the LOG iptables rule to ACCEPT (like this: -A RH-Firewall-1-INPUT -s 172.28.16.35 -d 172.28.16.36 -p tcp --dport 873 -j ACCEPT) and the rsync will complete successfully. Notice the iptables status after the rsync has completed: # iptables -L -v Chain RH-Firewall-1-INPUT (2 references) pkts bytes target prot opt in out source destination 983K 53M ACCEPT all -- any any anywhere anywhere state RELATED,ESTABLISHED 1 60 ACCEPT tcp -- any any anywhere anywhere state NEW tcp dpt:rsync 173K 9375K ACCEPT tcp -- any any 172.28.16.35 172.28.16.36 tcp dpt:rsync # As you can see, the client is sending *lots* of packets that are not being matched by the normal iptables rules, and need that extra second to last rule to prevent it from failing. In fact, once a single packet matches that rule, it does not seem that any more packets are matched by the ESTABLISHED rule: only that second to last rule's counts increase. It seems like iptables is somehow forgetting about the connection. Why is the rsync client's connection all of a sudden not being recognized? Is the client sending the packets differently, or is iptables recognizing them differently? Is there something I can do to get rsync to *not* send these packets, or to get iptables to filter them the way its supposed to? What changed that is causing RHEL4 to handle this differently than RHEL3? I would greatly appreciate any information you might be able to give me. Believe it or not, this is the *abridged* version: I've been struggling with this for several weeks now, trying scores of different permutations to try to resolve this. On top of that, the list's spam filter has forced me to pare down or eliminate a number of items I had originally tried to include. If you have any questions, or would like any additional information, please do not hesitate to ask. Thank you very much for any help or insight you might be able to provide. Timothy J. Massey
On Sun, Feb 04, 2007 at 10:45:28AM -0500, Timothy J. Massey wrote:> Why is the rsync client's connection all of a sudden not being > recognized? Is the client sending the packets differently, or is > iptables recognizing them differently?Rsync just opens a socket and writes data to it (as well as reading from it), so it sounds to me like this is something that is happening outside of rsync's control. Your investigative work seems very thorough -- have you run it by some Linux kernel and/or iptables folks to see what they think? ..wayne..
tmassey@obscorp.com
2007-Feb-05 15:22 UTC
Reproducable failure with rsync, iptables and RHEL4
Wayne Davison <wayned@samba.org> wrote on 02/04/2007 06:58:41 PM:> On Sun, Feb 04, 2007 at 10:45:28AM -0500, Timothy J. Massey wrote: > > Why is the rsync client's connection all of a sudden not being > > recognized? Is the client sending the packets differently, or is > > iptables recognizing them differently? > > Rsync just opens a socket and writes data to it (as well as reading from > it), so it sounds to me like this is something that is happening outside > of rsync's control.Sounds logical.> Your investigative work seems very thorough -- have you run it by some > Linux kernel and/or iptables folks to see what they think?Not yet. I figured I'd do one then the other, and I chose rsync first! I'll follow up with the iptables people. Thank you for the followup. Timothy J. Massey
Timothy J. Massey wrote:> Hello! > > I have a consistent, reproducable failure performing an rsync of an > RHEL4 system running rsync in daemon mode with iptables enabled. With > iptables disabled, or with a rule that explicitly allows all traffic, > the rsync completes. However, with iptalbes enabled, the rsync > starts, but will not finish. It fails after copying a seemingly > random amount of data.Could it be you're hitting a iptables session timeout setting? e.g. if (during a rsync transfer) rsync "hangs" while reading in a large directory listing, iptables may decide that tcp session is dead. Then when tcp packets start flowing again, iptables sees them as part of a new tcp session - and they're not part of an existing session - so they're rejected. ethereal/wireshark should be able to prove that. (however, I think all the "hanging" rsync does is right back in the beginning - which doesn't match your symptoms) -- Cheers Jason Haar Information Security Manager, Trimble Navigation Ltd. Phone: +64 3 9635 377 Fax: +64 3 9635 417 PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1