Travis Millican
2010-Sep-16 17:46 UTC
[Pkg-xen-devel] Bug#571634: xen-utils-common: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING
I recently encountered this warning in the logs of a new Debian Xen Dom0. Having now spent the better part of a day researching and testing, I've come to the conclusion that this is not a bug in xen-utils-common or even in iptables; it's merely the consequence of structural changes to the core netfilter code starting with the 2.6.20 kernel. This is rather long, but the issue is complicated. Please bear with me :)

I can't say with any certainty why some of you are having problems doing packet forwarding on DomUs with iptables in place, but I suspect it is a matter of misunderstanding how the bridging and routing kernel code now interact, and what that implies for the physdev iptables module. Which is entirely understandable. I certainly didn't really "grok" what was going on until I had spent quite a few hours reading up on it. I'm glad I did, though, as I now know more than I ever wanted to about the kernel's netfilter code...

An absolutely invaluable resource on this subject is the "ebtables/iptables interaction on a Linux-based bridge" document published by the ebtables developers:

http://ebtables.sourceforge.net/br_fw_ia/br_fw_ia.html

I don't know who specifically wrote it, but I can't thank them enough. If you're like me, you'll have to read it slowly and several times before it totally sinks in. I now have a copy of their detailed packet flow chart (at the bottom of the article) printed out and hanging next to my workstation :)

The long and short of it is this: there are two different general processes that an IP packet or link-layer frame can follow through the core netfilter code in the kernel, the bridging process and the routing process. Although iptables works at the network layer rather than the link layer, some of the iptables chains are hooked into both the routing process and the bridging process.
The significance of this dual hooking is that there are two entirely different contexts in which such a chain (the filter table's FORWARD chain being a crucial example) may be processed. That is, iptables installs itself into the kernel hooks in both the bridging code and the routing code. More particularly, if the OUTPUT, FORWARD, or POSTROUTING chains are called from the routing context, no bridging decision has (yet) been made, so the bridge output port is not known. Therefore it is not possible for --physdev-out ever to match in this context, even though it might naively seem to be a logical thing to do in certain situations.

And that brings us to the kernel warning emitted with respect to iptables. Admittedly, the text of this warning could stand to be revised a bit, as it does tend to give the wrong impression. Taking a look at xt_physdev.c in the kernel code is useful in figuring out what the warning truly indicates.

In function physdev_mt_check, xt_physdev.c wrote:
> if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
>     info->bitmask & ~XT_PHYSDEV_OP_MASK)
>         return false;
> if (info->bitmask & XT_PHYSDEV_OP_OUT &&
>     (!(info->bitmask & XT_PHYSDEV_OP_BRIDGED) ||
>      info->invert & XT_PHYSDEV_OP_BRIDGED) &&
>     par->hook_mask & ((1 << NF_INET_LOCAL_OUT) |
>     (1 << NF_INET_FORWARD) | (1 << NF_INET_POST_ROUTING))) {
>         printk(KERN_WARNING "physdev match: using --physdev-out in the "
>                "OUTPUT, FORWARD and POSTROUTING chains for non-bridged "
>                "traffic is not supported anymore.\n");
>         if (par->hook_mask & (1 << NF_INET_LOCAL_OUT))
>                 return false;
> }

In a nutshell, this warning is emitted any time you use the --physdev-out option in a rule in the OUTPUT, FORWARD, or POSTROUTING chains and either of the following is true:

1. You haven't included the --physdev-is-bridged option as well.
2. You have explicitly tried to apply the rule to non-bridged traffic by including "! --physdev-is-bridged" in it.

Note also that if the rule can be reached from the OUTPUT chain, the check returns false, so the rule is rejected outright rather than merely warned about.
Here is the description of --physdev-is-bridged in the man page:

IPTABLES(8) wrote:
> Matches if the packet is being bridged and therefore is not
> being routed. This is only useful in the FORWARD and POSTROUTING
> chains.

Note that these are the same chains mentioned by the warning, apart from OUTPUT. This is not merely coincidental. Since we can probably rule out the second condition for emitting this warning, the most likely reason that you are seeing it is the first.

In plain English, what this warning actually indicates is that you have written a rule which *might* be processed in the context of the routing process, but which cannot ever possibly match in that context. This could lead to unexpected behavior, namely that the rule never matches any routed traffic at all. Therefore the netfilter code in the kernel would like to see you explicitly acknowledge this situation by always using --physdev-is-bridged whenever you use --physdev-out. By doing this, you make it clear to anyone who might be looking at your rules that this particular rule can only match packets that have arrived at iptables via the bridging process.

If it is your intention that your rule apply only to bridged traffic in the first place, and if you have verified that the rule is in fact matching all of the traffic that you intend, you *can* safely ignore this warning. However, as a matter of best practice, I would recommend that you go ahead and add --physdev-is-bridged to the rule anyway, for readability and maintainability.

All that said, the nature of this warning might be a little less confusing if you know its history. Prior to kernel version 2.6.20, there were some deferred hooks in the netfilter code that allowed the aforementioned chains to be processed *after* bridging had occurred, even for packets that followed the routing process rather than the bridging process.
This was yanked for the reasons given in the changelog:

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.20
> [NETFILTER]: bridge-netfilter: remove deferred hooks
>
> Remove the deferred hooks and all related code as scheduled in
> feature-removal-schedule.

And per that:

http://www.linuxhq.com/kernel/v2.6/20/Documentation/feature-removal-schedule.txt
> What: Bridge netfilter deferred IPv4/IPv6 output hook calling
> When: January 2007
> Why: The deferred output hooks are a layering violation causing unusual
> and broken behaviour on bridge devices. Examples of things they
> break include QoS classifation using the MARK or CLASSIFY targets,
> the IPsec policy match and connection tracking with VLANs on a
> bridge. Their only use is to enable bridge output port filtering
> within iptables with the physdev match, which can also be done by
> combining iptables and ebtables using netfilter marks. Until it
> will get removed the hook deferral is disabled by default and is
> only enabled when needed.

Now we know why and when the feature was removed, but the warning message itself is still fairly confusing to the uninitiated. To explain that, we'll have to take a look at the patch history for xt_physdev.c:

http://www.linuxhq.com/kernel/v2.6/20-rc4/net/netfilter/xt_physdev.c

We can see that the previous incarnation of this warning was a deprecation warning:

> printk(KERN_WARNING "physdev match: using --physdev-out in the "
>        "OUTPUT, FORWARD and POSTROUTING chains for non-bridged "
>        "traffic is deprecated and breaks other things, it will "
>        "be removed in January 2007. See Documentation/"
>        "feature-removal-schedule.txt for details. This doesn't "
>        "affect you in case you're using it for purely bridged "
>        "traffic.\n");

In other words, the warning was originally added back when using --physdev-out was still possible for non-bridged traffic, but after that feature had already been slated for removal. That is, it was added to warn people that their existing rules using --physdev-out might soon break, once the deferred netfilter hooks were removed from the kernel. When that change was committed, the warning was edited in a less-than-clear fashion.

Unfortunately, there's no feasible way for this warning to be emitted only when the rule is actually processed from a non-bridging (routing) context, because doing so would require placing the check inside the callback functions that are hooked into the netfilter code. I am not a kernel developer, but I suspect that this would have an unacceptably negative performance impact on iptables.

So... there we are. If you're still having issues routing traffic in a Xen DomU, it may or may not be because of the condition flagged by the warning. The changes made to the kernel in 2.6.20 are not a "bug", but they may require that you rethink how you process traffic on a host that functions as both a bridge and a router (or as a combination of the two, a "brouter"). It should still be possible to achieve whatever it is you're trying to do with a post-2.6.20 kernel, but you may need to get a bit more sophisticated. All I can recommend is that you check out the ebtables package. Ebtables is to the bridging/link-layer process what iptables is to the routing/network-layer process. The two are very similar by design, and they can be used in conjunction to handle more complicated bridging + routing setups.

--
Travis Millican