Zoltan Kiss
2013-Oct-30 00:50 UTC
[PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
A long-known problem of the upstream netback implementation is that on the TX path (from guest to Dom0) it copies the whole packet from guest memory into Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a huge performance penalty. The classic kernel version of netback used grant mapping, and to get notified when the page can be unmapped, it used page destructors. Unfortunately that destructor is not an upstreamable solution. Ian Campbell's skb fragment destructor patch series (http://lwn.net/Articles/491522/) tried to solve this problem, however it seems to be very invasive on the network stack's code, and therefore hasn't progressed very well.

This patch series uses SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to know when the skb is freed up. That is the way KVM solved the same problem, and based on my initial tests it can do the same for us. Avoiding the extra copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I used a slower Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node, running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb switch).

Based on my investigations the packet only gets copied if it is delivered to the Dom0 stack, which is due to this patch: https://lkml.org/lkml/2012/7/20/363

That's a bit unfortunate, but as far as I know for the huge majority this use case is not too important. There are a couple of things which need more polishing, see the FIXME comments. I will run some more extensive tests, but in the meantime I would like to hear comments about what I've done so far. I've tried to break it down into smaller patches, with mixed results, so I welcome suggestions on that part as well:

1/5: Introduce TX grant map definitions
2/5: Change TX path from grant copy to mapping
3/5: Remove old TX grant copy definitions
4/5: Fix indentations
5/5: Change RX path for mapped SKB fragments

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
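As a rough illustration of the idea in the cover letter (not code from the patch series), the following user-space Python model mimics the lifecycle: guest pages are mapped rather than copied, the packet carries a completion callback, and the pages are unmapped only once the stack frees the packet. All names here (`GrantTable`, `Packet`, `tx_submit`) are hypothetical stand-ins for the kernel mechanisms; in the kernel the notification is requested with the SKBTX_DEV_ZEROCOPY flag on the skb.

```python
class GrantTable:
    """Hypothetical stand-in for Dom0's set of currently mapped guest pages."""

    def __init__(self):
        self.mapped = set()

    def map(self, ref):
        self.mapped.add(ref)

    def unmap(self, ref):
        self.mapped.discard(ref)


class Packet:
    """Hypothetical stand-in for an skb that carries a completion callback."""

    def __init__(self, grant_refs, on_free):
        self.grant_refs = list(grant_refs)
        self.on_free = on_free
        self.freed = False

    def free(self):
        # The stack is done with the packet: notify the owner exactly once.
        if not self.freed:
            self.freed = True
            self.on_free(self)


def tx_submit(table, grant_refs):
    """Map the guest pages and return a packet that unmaps them when freed."""
    for ref in grant_refs:
        table.map(ref)

    def unmap_all(packet):
        for ref in packet.grant_refs:
            table.unmap(ref)

    return Packet(grant_refs, unmap_all)


table = GrantTable()
pkt = tx_submit(table, [11, 12, 13])
pages_while_in_flight = len(table.mapped)  # pages stay mapped while in flight
pkt.free()
pages_after_free = len(table.mapped)       # callback ran, pages are unmapped
print(pages_while_in_flight, pages_after_free)
```

The point of the model is only the ordering: the unmap cannot happen at submit time, so the backend needs to be told when the last user of the pages has gone away.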
Konrad Rzeszutek Wilk
2013-Oct-30 19:16 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On Wed, Oct 30, 2013 at 12:50:15AM +0000, Zoltan Kiss wrote:
> I've tried to break it down into smaller patches, with mixed results, so I
> welcome suggestions on that part as well:
> 1/5: Introduce TX grant map definitions
> 2/5: Change TX path from grant copy to mapping
> 3/5: Remove old TX grant copy definitions
> 4/5: Fix indentations
> 5/5: Change RX path for mapped SKB fragments

Odd. I don't see #5 patch?
Konrad Rzeszutek Wilk
2013-Oct-30 19:17 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On Wed, Oct 30, 2013 at 03:16:17PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 30, 2013 at 12:50:15AM +0000, Zoltan Kiss wrote:
> > 1/5: Introduce TX grant map definitions
> > 2/5: Change TX path from grant copy to mapping
> > 3/5: Remove old TX grant copy definitions
> > 4/5: Fix indentations
> > 5/5: Change RX path for mapped SKB fragments
>
> Odd. I don't see #5 patch?

Ah, you have two #4 patches:

[PATCH net-next RFC 4/5] xen-netback: Change RX path for mapped SKB fragments
[PATCH net-next RFC 4/5] xen-netback: Fix indentations

!
Zoltan Kiss
2013-Oct-30 21:14 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On 30/10/13 19:17, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 30, 2013 at 03:16:17PM -0400, Konrad Rzeszutek Wilk wrote:
>> Odd. I don't see #5 patch?
>
> Ah, you have two #4 patches:
>
> [PATCH net-next RFC 4/5] xen-netback: Change RX path for mapped SKB fragments
> [PATCH net-next RFC 4/5] xen-netback: Fix indentations

Yep, sorry, I will fix it up in the next version!

Zoli
Ian Campbell
2013-Nov-01 10:50 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On Wed, 2013-10-30 at 00:50 +0000, Zoltan Kiss wrote:
> This patch series uses SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
> know when the skb is freed up.

Does this always avoid copying when bridging/openvswitching/forwarding (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?

How does it deal with broadcast traffic?

> That is the way KVM solved the same problem,
> and based on my initial tests it can do the same for us. Avoiding the extra
> copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I used a slower
> Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
> running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
> switch)

Do you have any numbers for the dom0 cpu usage impact?

Aggregate throughput for many guests would be a useful datapoint too.

> Based on my investigations the packet only gets copied if it is delivered to
> the Dom0 stack, which is due to this patch:
> https://lkml.org/lkml/2012/7/20/363
> That's a bit unfortunate, but as far as I know for the huge majority this use
> case is not too important.

Likely to be true, but it would still be interesting to know how badly this use case suffers with this change, and any increase in CPU usage would be interesting to know about as well.
Zoltan Kiss
2013-Nov-01 19:00 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On 01/11/13 10:50, Ian Campbell wrote:
> Does this always avoid copying when bridging/openvswitching/forwarding
> (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?

I've tested the domU->domU and domU->physical use cases with bridge and openvswitch, and I've now created a new stat counter to see how often a copy happens (the callback's second parameter tells you whether the skb was freed or copied). It doesn't copy in any of these scenarios.

What do you mean by forwarding? The scenario where you use a bridge and iptables is mangling the packet, not just filtering?

> How does it deal with broadcast traffic?

Most real broadcast traffic consists of small packets that fit in the PKT_PROT_LEN sized linear space, so it doesn't make any difference, apart from doing a mapping before the copy. But that will be eliminated later on; I plan to add an incremental improvement to grant copy the linear part.

I haven't spent too much time on that, but I couldn't find any broadcast protocol which uses large enough packets and is easy to test, so I'm open to ideas.

What I already know is that skb_clone triggers a copy, and if the caller uses the original skb for every cloning, it will do several copies. I think that could be fixed by using the first clone to do any further clones.

> Do you have any numbers for the dom0 cpu usage impact?

DomU->NIC: the vif took 40% according to top; I guess the bottleneck there is the TLB flushing.
DomU->DomU: the vif on the RX side causes the bottleneck, due to the grant copy to the guest.

> Aggregate throughput for many guests would be a useful datapoint too.

I will do measurements about that.

>> Based on my investigations the packet only gets copied if it is delivered to
>> the Dom0 stack, which is due to this patch:
>> https://lkml.org/lkml/2012/7/20/363
>> That's a bit unfortunate, but as far as I know for the huge majority this use
>> case is not too important.
>
> Likely to be true, but it would still be interesting to know how badly
> this use case suffers with this change, and any increase in CPU usage
> would be interesting to know about as well.

I can't find my numbers, but as far as I remember it wasn't significantly worse than grant copy. I will check that again.

Zoli
Zoltan Kiss
2013-Nov-05 17:01 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On 01/11/13 19:00, Zoltan Kiss wrote:
>>> Based on my investigations the packet only gets copied if it is delivered to
>>> the Dom0 stack, which is due to this patch:
>>> https://lkml.org/lkml/2012/7/20/363
>>> That's a bit unfortunate, but as far as I know for the huge majority this use
>>> case is not too important.
>>
>> Likely to be true, but it would still be interesting to know how badly
>> this use case suffers with this change, and any increase in CPU usage
>> would be interesting to know about as well.
>
> I can't find my numbers, but as far as I remember it wasn't
> significantly worse than grant copy. I will check that again.

I've measured it now: with my patch it was 5.2 Gbps, without it 5.4 Gbps. In both cases iperf in Dom0 maxed out its CPU, mostly in soft interrupt context, based on top.

Zoli
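For scale, the throughput figures quoted in this thread can be put side by side as relative changes. This is only arithmetic on the reported numbers; the helper name `relative_change` is made up for the example, and the measurements themselves are single data points from one test setup.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Figures quoted in this thread, in Gbps.
guest_to_remote = relative_change(6.8, 7.9)  # grant copy -> grant mapping
guest_to_dom0 = relative_change(5.4, 5.2)    # without patch -> with patch

print(f"guest to remote host: {guest_to_remote:+.1%}")
print(f"guest to Dom0 stack:  {guest_to_dom0:+.1%}")
```

So the mapped path gains roughly a sixth in the forwarded case, while the Dom0-delivery case loses a few percent.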
Ian Campbell
2013-Nov-07 10:52 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On Fri, 2013-11-01 at 19:00 +0000, Zoltan Kiss wrote:
> On 01/11/13 10:50, Ian Campbell wrote:
> > Does this always avoid copying when bridging/openvswitching/forwarding
> > (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?
> I've tested the domU->domU and domU->physical use cases with bridge and
> openvswitch, and I've now created a new stat counter to see how often a copy
> happens (the callback's second parameter tells you whether the skb was
> freed or copied). It doesn't copy in any of these scenarios.
> What do you mean by forwarding? The scenario where you use a bridge and
> iptables is mangling the packet, not just filtering?

I mean using L3 routing rather than L2 bridging. Which might involve NAT/MASQUERADE or might just be normal IP routing.

> > How does it deal with broadcast traffic?
> Most real broadcast traffic consists of small packets that fit in the
> PKT_PROT_LEN sized linear space, so it doesn't make any difference,
> apart from doing a mapping before the copy. But that will be eliminated
> later on; I plan to add an incremental improvement to grant copy the
> linear part.

OK. If I were a malicious guest and decided to start sending out loads of huge broadcasts, would that lead to a massive spike of activity in dom0?

> I haven't spent too much time on that, but I couldn't find any broadcast
> protocol which uses large enough packets and is easy to test, so I'm open to
> ideas.

I guess you could hack something up using raw sockets?

> What I already know is that skb_clone triggers a copy, and if the caller
> uses the original skb for every cloning, it will do several copies. I think
> that could be fixed by using the first clone to do any further clones.

Yes. I suppose doing this automatically might be an interesting extension to SKBTX_DEV_ZEROCOPY?

Ian.
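The stat counter mentioned in the quoted text can be pictured with a small user-space Python model. It assumes only what the thread states, namely that the completion callback's second parameter reports whether the skb was freed untouched or had to be copied; the names `TxStats` and `zerocopy_callback` and the sample outcomes are invented for this sketch and are not taken from the patches.

```python
from dataclasses import dataclass


@dataclass
class TxStats:
    """Hypothetical per-vif counters."""
    zerocopy_ok: int = 0  # packet was freed without being copied
    copied: int = 0       # the stack had to copy the frags first


def zerocopy_callback(stats, zerocopy_success):
    """Model of a completion callback: the second argument says whether the
    packet was released untouched (True) or had to be copied (False)."""
    if zerocopy_success:
        stats.zerocopy_ok += 1
    else:
        stats.copied += 1


stats = TxStats()
# Assumed outcomes for five packets; real values would come from the stack.
for outcome in (True, True, False, True, True):
    zerocopy_callback(stats, outcome)

copy_ratio = stats.copied / (stats.zerocopy_ok + stats.copied)
print(stats, f"copy ratio = {copy_ratio:.0%}")
```

A counter like this would also be one way to observe the malicious-broadcast case raised above, since a burst of copies would show up directly in the ratio.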
Zoltan Kiss
2013-Nov-28 17:37 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On 07/11/13 10:52, Ian Campbell wrote:
> On Fri, 2013-11-01 at 19:00 +0000, Zoltan Kiss wrote:
>> On 01/11/13 10:50, Ian Campbell wrote:
>>> Does this always avoid copying when bridging/openvswitching/forwarding
>>> (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?
>> I've tested the domU->domU and domU->physical use cases with bridge and
>> openvswitch, and I've now created a new stat counter to see how often a copy
>> happens (the callback's second parameter tells you whether the skb was
>> freed or copied). It doesn't copy in any of these scenarios.
>> What do you mean by forwarding? The scenario where you use a bridge and
>> iptables is mangling the packet, not just filtering?
>
> I mean using L3 routing rather than L2 bridging. Which might involve
> NAT/MASQUERADE or might just be normal IP routing.

I still couldn't find time to try out this scenario, but I think in this case the packet goes through deliver_skb, which means it will get copied. So performance would be a bit worse due to the extra map/unmap. And I'm afraid we can't help that too much due to this: https://lkml.org/lkml/2012/7/20/363

However, I think using Dom0 as a router/firewall is already a suboptimal solution, so maybe a small performance regression is acceptable? Anyway, I will try this out, see if it really copies everything, and get some numbers as well.

>>> How does it deal with broadcast traffic?

Now I had time to check it: broadcast packets get copied only once, when cloning happens. It will swap out the frags with local ones, so any subsequent cloning will have a local SKB.

Zoli
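The copy-once behaviour described for broadcast packets can be sketched as a toy model. This is plain user-space Python, not kernel code: the `Skb` class, its `foreign` flag and the `copies_done` counter are assumptions made for illustration, standing in for an skb whose frags start out as mapped guest pages.

```python
class Skb:
    """Hypothetical stand-in for an skb whose frags are either mapped
    guest pages (foreign) or Dom0-local copies."""

    copies_done = 0  # how many times frag data had to be copied

    def __init__(self, frags, foreign):
        self.frags = frags
        self.foreign = foreign

    def clone(self):
        if self.foreign:
            # Swap the foreign frags for local copies before sharing them;
            # from now on the original refers to the local pages as well.
            self.frags = [bytearray(f) for f in self.frags]
            self.foreign = False
            Skb.copies_done += 1
        return Skb(self.frags, foreign=False)


original = Skb([b"broadcast payload"], foreign=True)
clones = [original.clone() for _ in range(4)]
print(Skb.copies_done)
```

In this model four clones cost one copy, because the first clone leaves the original holding local frags.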
Ian Campbell
2013-Nov-28 17:43 UTC
Re: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy
On Thu, 2013-11-28 at 17:37 +0000, Zoltan Kiss wrote:
> On 07/11/13 10:52, Ian Campbell wrote:
> > On Fri, 2013-11-01 at 19:00 +0000, Zoltan Kiss wrote:
> >> On 01/11/13 10:50, Ian Campbell wrote:
> >>> Does this always avoid copying when bridging/openvswitching/forwarding
> >>> (e.g. masquerading etc)? For both domU->domU and domU->physical NIC?
> >> I've tested the domU->domU and domU->physical use cases with bridge and
> >> openvswitch, and I've now created a new stat counter to see how often a
> >> copy happens (the callback's second parameter tells you whether the skb
> >> was freed or copied). It doesn't copy in any of these scenarios.
> >> What do you mean by forwarding? The scenario where you use a bridge and
> >> iptables is mangling the packet, not just filtering?
> >
> > I mean using L3 routing rather than L2 bridging. Which might involve
> > NAT/MASQUERADE or might just be normal IP routing.
> I still couldn't find time to try out this scenario, but I think in this
> case the packet goes through deliver_skb, which means it will get copied. So
> performance would be a bit worse due to the extra map/unmap. And I'm
> afraid we can't help that too much due to this:
> https://lkml.org/lkml/2012/7/20/363
> However, I think using Dom0 as a router/firewall is already a suboptimal
> solution, so maybe a small performance regression is acceptable?

Routing/firewalling domUs is as valid as bridging. There is nothing in the slightest bit suboptimal about it.

If this use case regresses with this approach then I'm afraid that either needs to be addressed or a different approach considered.

> Anyway, I will try this out, see if it really copies everything, and
> get some numbers as well.

Thanks.

> >>> How does it deal with broadcast traffic?
> Now I had time to check it: broadcast packets get copied only once, when
> cloning happens. It will swap out the frags with local ones, so any
> subsequent cloning will have a local SKB.

That's good.

Ian.