Florian Westphal
2015-Feb-23 16:06 UTC
[Bridge] [RFC PATCH v2] bridge: make it possible for packets to traverse the bridge without hitting netfilter
Imre Palik <imrep.amz at gmail.com> wrote:
> The netfilter code is made with flexibility instead of performance in mind.
> So when all we want is to pass packets between different interfaces, the
> performance penalty of hitting netfilter code can be considerable, even when
> all the firewalling is disabled for the bridge.
>
> This change makes it possible to disable netfilter on a per bridge basis.
> In the case interesting to us, this can lead to more than 15% speedup
> compared to the case when only bridge-iptables is disabled.

I wonder what the speed difference is between no-rules (i.e., we hit jump label in NF_HOOK), one single (ebtables) accept-all rule, and this patch, for the call_nf==false case.

I guess your 15% speedup figure is coming from ebtables' O(n) rule evaluation overhead? If yes, how many rules are we talking about?

Iff that's true, then the 'better' (I know, it won't help you) solution would be to use nftables bridgeport-based verdict maps...

If that's still too much overhead, then we clearly need to do *something*...

Thanks,
Florian
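To make the contrast in the message above concrete, here is a minimal user-space sketch of O(n) rule evaluation next to a lookup keyed by bridge port. It is only an illustration under stated assumptions: `struct rule`, `linear_verdict`, `map_verdict`, and `MAX_PORTS` are hypothetical names, not ebtables or nftables internals, and a plain array stands in for whatever set structure a real verdict map would use.

```c
#include <stddef.h>

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

/* Hypothetical rule: matches one ingress port and yields a verdict. */
struct rule {
    int port;
    enum verdict verdict;
};

/*
 * Linear evaluation: rules are checked one by one, so the cost grows
 * with the number of rules. `visited` (optional) counts rules examined.
 */
enum verdict linear_verdict(const struct rule *rules, size_t n, int port,
                            enum verdict policy, size_t *visited)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (visited)
            (*visited)++;
        if (rules[i].port == port)
            return rules[i].verdict;
    }
    return policy; /* no rule matched: fall back to the chain policy */
}

#define MAX_PORTS 8 /* assumption: small, dense port numbering */

/*
 * Map lookup: a single index operation, independent of how many ports
 * have an entry. Out-of-range ports get the default policy.
 */
enum verdict map_verdict(const enum verdict map[MAX_PORTS], int port,
                         enum verdict policy)
{
    if (port < 0 || port >= MAX_PORTS)
        return policy;
    return map[port];
}
```

With an empty ruleset both functions do almost no work, which is why the question of how many rules are loaded matters for interpreting the 15% figure.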
Imre Palik
2015-Feb-26 10:19 UTC
[Bridge] [RFC PATCH v2] bridge: make it possible for packets to traverse the bridge without hitting netfilter
On 02/23/15 17:06, Florian Westphal wrote:
> Imre Palik <imrep.amz at gmail.com> wrote:
>> The netfilter code is made with flexibility instead of performance in mind.
>> So when all we want is to pass packets between different interfaces, the
>> performance penalty of hitting netfilter code can be considerable, even when
>> all the firewalling is disabled for the bridge.
>>
>> This change makes it possible to disable netfilter on a per bridge basis.
>> In the case interesting to us, this can lead to more than 15% speedup
>> compared to the case when only bridge-iptables is disabled.
>
> I wonder what the speed difference is between no-rules (i.e., we hit jump label
> in NF_HOOK), one single (ebtables) accept-all rule, and this patch, for
> the call_nf==false case.

ebtables is completely empty:

    # ebtables -L
    Bridge table: filter

    Bridge chain: INPUT, entries: 0, policy: ACCEPT

    Bridge chain: FORWARD, entries: 0, policy: ACCEPT

    Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

On some bridges I have iptables rules, but on the critical bridges I am running with iptables disabled.

> I guess your 15% speedup figure is coming from ebtables' O(n) rule
> evaluation overhead? If yes, how many rules are we talking about?

If you are looking for peculiarities in my setup then here they are:
I am on 4k pages, and perf is not working :-(
(I am trying to fix those too, but that is far from being a low hanging fruit.)
So my guess would be that the packet pipeline doesn't fit in the cache/TLB.

> Iff that's true, then the 'better' (I know, it won't help you) solution
> would be to use nftables bridgeport-based verdict maps...
>
> If that's still too much overhead, then we clearly need to do *something*...
>
> Thanks,
> Florian
David Miller
2015-Feb-26 16:34 UTC
[Bridge] [RFC PATCH v2] bridge: make it possible for packets to traverse the bridge without hitting netfilter
From: Imre Palik <imrep at amazon.de>
Date: Thu, 26 Feb 2015 11:19:25 +0100

> If you are looking for peculiarities in my setup then here they are:
> I am on 4k pages, and perf is not working :-(
> (I am trying to fix those too, but that is far from being a low hanging fruit.)
> So my guess would be that the packet pipeline doesn't fit in the cache/tlb

Pure speculation until you can actually use perf to measure these things.

And I don't want to apply patches which were designed based upon pure speculation.
Felix Fietkau
2015-Feb-26 21:17 UTC
[Bridge] [RFC PATCH v2] bridge: make it possible for packets to traverse the bridge without hitting netfilter
On 2015-02-24 05:06, Florian Westphal wrote:
> Imre Palik <imrep.amz at gmail.com> wrote:
>> The netfilter code is made with flexibility instead of performance in mind.
>> So when all we want is to pass packets between different interfaces, the
>> performance penalty of hitting netfilter code can be considerable, even when
>> all the firewalling is disabled for the bridge.
>>
>> This change makes it possible to disable netfilter on a per bridge basis.
>> In the case interesting to us, this can lead to more than 15% speedup
>> compared to the case when only bridge-iptables is disabled.
>
> I wonder what the speed difference is between no-rules (i.e., we hit jump label
> in NF_HOOK), one single (ebtables) accept-all rule, and this patch, for
> the call_nf==false case.
>
> I guess your 15% speedup figure is coming from ebtables' O(n) rule
> evaluation overhead? If yes, how many rules are we talking about?
>
> Iff that's true, then the 'better' (I know, it won't help you) solution
> would be to use nftables bridgeport-based verdict maps...
>
> If that's still too much overhead, then we clearly need to do *something*...

I work with MIPS-based routers that typically only have 32 or 64 KB of Dcache. I've had quite a bit of 'fun' working on optimizing netfilter on these systems. I've done a lot of measurements using oprofile (going to use perf on my next run).

On these devices, even without netfilter compiled in, the data structures and code are already way too big for the hot path to fit in the Dcache (not to mention Icache). This problem has typically gotten a little bit worse with every new kernel release, aside from just a few exceptions.

This means that in the hot path, any unnecessary memory access to packet data (especially IP headers) or, to some degree, to extra data structures for netfilter, ebtables, etc. has a significant and visible performance impact. The impact of the memory accesses is orders of magnitude bigger than the pure cycles used for running the actual code.

In OpenWrt, I made similar hacks a long time ago, and on the system I tested on, the speedup was even bigger than 15%, probably closer to 30%. By the way, this was also with a completely empty ruleset.

Maybe there's a way to get reasonable performance by optimizing NF_HOOK. However, I'd like to remind you guys that if we have to fetch some netfilter/nftables/ebtables data structures and run part of the table processing code on a system where no rules are present (or ebtables functionality is otherwise not needed for a particular bridge), then performance is going to suck - at least on most small-scale embedded devices.

Based on that, I support the general approach taken by this patch, at least until somebody has shown that a better approach is feasible.

- Felix
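As a rough illustration of the approach debated in this thread, the following user-space sketch shows a per-bridge flag that lets the forwarding path return before any hook state is touched. It is not the patch itself: only the flag name `call_nf` comes from the discussion, while `struct bridge`, `struct packet`, `hook_fn`, `accept_all`, and `bridge_forward` are hypothetical stand-ins, and the `hook_calls` counter exists purely to make the difference observable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer. */
struct packet {
    int len;
};

/* Hypothetical hook signature: returns true to accept the packet. */
typedef bool (*hook_fn)(struct packet *pkt);

struct bridge {
    bool call_nf;             /* false: skip hook traversal entirely */
    const hook_fn *hooks;     /* registered hooks, possibly none */
    size_t n_hooks;
    unsigned long hook_calls; /* illustration only: hooks invoked so far */
};

/* Example hook that accepts everything, like an empty ruleset. */
bool accept_all(struct packet *pkt)
{
    (void)pkt;
    return true;
}

/* Slow path: walk every hook; each call may touch extra memory. */
static bool run_hooks(struct bridge *br, struct packet *pkt)
{
    for (size_t i = 0; i < br->n_hooks; i++) {
        br->hook_calls++;
        if (!br->hooks[i](pkt))
            return false;
    }
    return true;
}

/* Forwarding decision: the per-bridge flag is checked first. */
bool bridge_forward(struct bridge *br, struct packet *pkt)
{
    if (!br->call_nf)
        return true; /* fast path: no hook data structures are read */
    return run_hooks(br, pkt);
}
```

Even when every hook accepts, the slow path still reads the hook table and calls through it for each packet; the fast path reads a single flag. Whether that difference accounts for the reported speedups is exactly what the thread asks to be measured with perf.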