Ian Campbell
2012-Oct-24 11:42 UTC
[PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
The commit 69b08f62e174 "net: use bigger pages in __netdev_alloc_frag"
leads to 70%+ packet loss under Xen when transmitting from physical (as
opposed to virtual) network devices.

This is because under Xen pages which are contiguous in the physical
address space may not be contiguous in the DMA space; in fact it is
very likely that they are not. I think there are other architectures
where this is true, although perhaps none quite so aggressive as to
have this property at a per-order-0-page granularity.

The real underlying bug here most likely lies in the swiotlb not
correctly handling compound pages, and Konrad is investigating this.
However, even with the swiotlb issue fixed, the current arrangement
seems likely to result in a lot of bounce buffering, which seems likely
to more than offset any benefit from the use of larger pages.

Therefore make NETDEV_FRAG_PAGE_MAX_ORDER configurable at runtime and
use this to request order-0 frags under Xen. Also expose this setting
via sysctl.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: netdev@vger.kernel.org
Cc: xen-devel@lists.xen.org
---
 arch/x86/xen/setup.c       |    7 +++++++
 include/linux/skbuff.h     |    2 ++
 net/core/skbuff.c          |    7 ++++---
 net/core/sysctl_net_core.c |    7 +++++++
 4 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8971a26..ad14d46 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -11,6 +11,7 @@
 #include <linux/memblock.h>
 #include <linux/cpuidle.h>
 #include <linux/cpufreq.h>
+#include <linux/skbuff.h>

 #include <asm/elf.h>
 #include <asm/vdso.h>
@@ -555,6 +556,12 @@ void __init xen_arch_setup(void)
	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);

+	/*
+	 * Xen cannot handle DMA to/from compound pages, so avoid
+	 * bounce buffering by not allocating large network frags.
+	 */
+	netdev_frag_page_max_order = 0;
+
	/* Set up idle, making sure it calls safe_halt() pvop */
 #ifdef CONFIG_X86_32
	boot_cpu_data.hlt_works_ok = 1;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6a2c34e..a3a748f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1719,6 +1719,8 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
		kfree_skb(skb);
 }

+extern int netdev_frag_page_max_order;
+
 extern void *netdev_alloc_frag(unsigned int fragsz);

 extern struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6e04b1f..88cbe5f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -348,8 +348,9 @@ struct netdev_alloc_cache {
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);

-#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
-#define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER)
+int netdev_frag_page_max_order __read_mostly = get_order(32768);
+
+#define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << netdev_frag_page_max_order)
 #define NETDEV_PAGECNT_MAX_BIAS	   NETDEV_FRAG_PAGE_MAX_SIZE

 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
@@ -363,7 +364,7 @@ static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
	nc = &__get_cpu_var(netdev_alloc_cache);
	if (unlikely(!nc->frag.page)) {
 refill:
-		for (order = NETDEV_FRAG_PAGE_MAX_ORDER; ;) {
+		for (order = netdev_frag_page_max_order; ;) {
			gfp_t gfp = gfp_mask;

			if (order)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index a7c3684..e5ab6df 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -129,6 +129,13 @@ static struct ctl_table net_core_table[] = {
		.mode		= 0644,
		.proc_handler	= proc_dointvec
	},
+	{
+		.procname	= "netdev_frag_page_max_order",
+		.data		= &netdev_frag_page_max_order,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_BPF_JIT
	{
		.procname	= "bpf_jit_enable",
--
1.7.2.5
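The arithmetic behind the patch is easy to sanity-check: with 4 KiB pages (the x86 default, assumed here), get_order(32768) is 3, so the default NETDEV_FRAG_PAGE_MAX_SIZE is PAGE_SIZE << 3 = 32 KiB, and setting the order to 0 drops frag allocations back to single 4 KiB pages:

```shell
# Toy check of the frag-size arithmetic in the patch (not kernel code).
PAGE_SIZE=4096    # assumed x86 page size
ORDER=3           # get_order(32768) on a 4 KiB-page system

echo $((PAGE_SIZE << ORDER))   # NETDEV_FRAG_PAGE_MAX_SIZE with the default order
echo $((PAGE_SIZE << 0))       # frag size once netdev_frag_page_max_order = 0
```

On a running kernel with the patch applied, the equivalent runtime switch would be `sysctl -w net.core.netdev_frag_page_max_order=0`, using the sysctl name the patch adds.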
Eric Dumazet
2012-Oct-24 12:28 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 12:42 +0100, Ian Campbell wrote:
> The commit 69b08f62e174 "net: use bigger pages in __netdev_alloc_frag"
> leads to 70%+ packet loss under Xen when transmitting from physical (as
> opposed to virtual) network devices.
[...]
> Therefore make NETDEV_FRAG_PAGE_MAX_ORDER configurable at runtime and
> use this to request order-0 frags under Xen. Also expose this setting
> via sysctl.
[...]

I understand your concern, but this seems a quick/dirty hack at this
moment. After setting the sysctl to 0, some tasks may still have some
order-3 pages in their cache.

Your driver must already cope with skb->head being split on several
pages.

So what fundamental difference exists with frags?
Ian Campbell
2012-Oct-24 13:16 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 13:28 +0100, Eric Dumazet wrote:
> I understand your concern, but this seems a quick/dirty hack at this
> moment. After setting the sysctl to 0, some tasks may still have some
> order-3 pages in their cache.

Right, the sysctl thing might be overkill; I just figured it was useful
for debugging. When booting in a Xen VM the patch sets it to zero very
early on, during setup_arch(), which is before any tasks even exist.

> Your driver must already cope with skb->head being split on several
> pages.
>
> So what fundamental difference exists with frags?

The issue here is with drivers for physical network devices when running
under Xen, not with the Xen paravirtualised network drivers (AKA
netback/netfront).

The problem is that pages which are contiguous in the physical address
space may not be contiguous in the DMA address space. With order>0 pages
this becomes a problem when you poke the DMA address and length of a
compound page into the hardware registers. The DMA address will be right
for the head of the page, but once the hardware steps off the end of
that it'll get the wrong page.

I don't think this non-contiguousness between physical and DMA addresses
is specific to Xen, although it is more frequent under Xen than on any
real hardware platform. (Xen has often been a good canary for these
sorts of issues, which turn out later on to impact other arches too.)

In theory this could be fixed in all the drivers for physical network
devices, but that would be a lot of effort (and probably a fair bit of
ugliness in the drivers) for a gain which was only relevant to Xen.

Ian.
Eric Dumazet
2012-Oct-24 13:30 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 14:16 +0100, Ian Campbell wrote:
[...]
> The problem is that pages which are contiguous in the physical address
> space may not be contiguous in the DMA address space. With order>0 pages
> this becomes a problem when you poke the DMA address and length of a
> compound page into the hardware registers. The DMA address will be right
> for the head of the page, but once the hardware steps off the end of
> that it'll get the wrong page.
[...]
> In theory this could be fixed in all the drivers for physical network
> devices, but that would be a lot of effort (and probably a fair bit of
> ugliness in the drivers) for a gain which was only relevant to Xen.

I still have concerns about skb->head that you didn't really answer.

Why can skb->head be on order-1 or order-2 pages and this still works?

It seems to me it's a driver issue; for example,
drivers/net/xen-netfront.c has assumptions that can be easily fixed.
Ian Campbell
2012-Oct-24 14:02 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 14:30 +0100, Eric Dumazet wrote:
> It seems to me it's a driver issue; for example,
> drivers/net/xen-netfront.c has assumptions that can be easily fixed.

The netfront ->head thing is a separate (although perhaps related)
issue. I intended to fix it along the same lines as netback previously,
except for some unfathomable reason I haven't been able to reproduce the
problem with netfront -- I've no idea why, though, since it seems like
it should be a no-brainer!

> Why can skb->head be on order-1 or order-2 pages and this still works?

skb->head being order 1 or 2 isn't working for me. The driver I'm having
issues with, which caused me to create this particular patch, is the tg3
driver (although I don't think this is by any means specific to tg3).

For the ->head the tg3 driver does:

	mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);

while for the frags it does:

	mapping = skb_frag_dma_map(&tp->pdev->dev, frag, 0, len, DMA_TO_DEVICE);

This ought to do the Right Thing but doesn't seem to be working. Konrad
suspected an issue with the swiotlb's handling of order>0 pages in some
cases. As I said in the commit message, he is looking into this issue.

My concern, however, was that even once the swiotlb is fixed to work
right, the effect of pci_map_single on an order>0 page is going to be
that the data gets bounced into contiguous memory -- that is a memcpy
which would undo the benefit of allocating large pages to start with. So
I figured that in such cases we'd be better off just using order-0
allocations to start with.

Ian.
Eric Dumazet
2012-Oct-24 15:21 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 15:02 +0100, Ian Campbell wrote:
[...]
> My concern, however, was that even once the swiotlb is fixed to work
> right, the effect of pci_map_single on an order>0 page is going to be
> that the data gets bounced into contiguous memory -- that is a memcpy
> which would undo the benefit of allocating large pages to start with. So
> I figured that in such cases we'd be better off just using order-0
> allocations to start with.

I am really confused.

If you really have such problems, why doesn't locally generated TCP
traffic also have them?

Your patch doesn't touch sk_page_frag_refill(), does it?
Ian Campbell
2012-Oct-24 16:22 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 16:21 +0100, Eric Dumazet wrote:
[...]
> I am really confused.
>
> If you really have such problems, why doesn't locally generated TCP
> traffic also have them?

I think it does. The reason I noticed the original problem was that ssh
to the machine was virtually (no pun intended) unusable.

> Your patch doesn't touch sk_page_frag_refill(), does it?

That's right, it doesn't. When is (sk->sk_allocation & __GFP_WAIT) true?
Is it possible I'm just not hitting that case?

Is it possible that this only affects certain traffic patterns (I only
really tried ssh/scp and ping)? Or perhaps it's just that the swiotlb is
only broken in one corner case and not the other.

Ian.
Eric Dumazet
2012-Oct-24 16:43 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 17:22 +0100, Ian Campbell wrote:
> That's right, it doesn't. When is (sk->sk_allocation & __GFP_WAIT) true?
> Is it possible I'm just not hitting that case?

I hope not. GFP_KERNEL has __GFP_WAIT.

> Is it possible that this only affects certain traffic patterns (I only
> really tried ssh/scp and ping)? Or perhaps it's just that the swiotlb is
> only broken in one corner case and not the other.

Could you try a netperf -t TCP_STREAM?

Because ssh uses small packets, and small TCP packets don't use frags
but skb->head.

You mentioned a 70% drop in performance, but what test did you use
exactly?
David Miller
2012-Oct-24 18:19 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
I'm not applying this.

Fix your drivers and the infrastructure they use, don't paper around it.
Konrad Rzeszutek Wilk
2012-Oct-30 16:53 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, Oct 24, 2012 at 06:43:20PM +0200, Eric Dumazet wrote:
[...]
> Could you try a netperf -t TCP_STREAM?

For fun I did a couple of tests - I set up two machines (one r8168, the
other e1000e) and tried netperf/netserver. Both of them are running a
baremetal kernel, and one of them has 'iommu=soft swiotlb=force' to
simulate the worst case. This is using v3.7-rc3.

The r8169 is booted without any arguments; the e1000e is using
'iommu=soft swiotlb=force'.

So r8169 -> e1000e, I get ~940 (this is odd - I expected that the e1000e
on the recv side would be using the bounce buffer, but then I realized
it sets up a 'dma' pool using pci_alloc_coherent). The other way -
e1000e -> r8169 - got me around ~128. So it is the sending side that
ends up using the bounce buffer, and it slows down considerably.

I also swapped the machine that had the e1000e with a tg3 - and got
around the same numbers.

So all of this points to the swiotlb, and just to make sure that nothing
was amiss I wrote a little driver that would allocate a compound page,
set up a DMA mapping, do some writes, sync and unmap the DMA page. And
it works correctly - so the swiotlb (and the Xen variant) work just
right. Attached for your fun.

Then I decided to try v3.6.3, with the same exact parameters... and the
problem went away. The e1000e -> r8169, which got me around ~128, now
gets ~940! Still using the swiotlb bounce buffer.

> Because ssh uses small packets, and small TCP packets don't use frags
> but skb->head.
>
> You mentioned a 70% drop in performance, but what test did you use
> exactly?

Note, I did not provide any arguments to netperf, but it did pick the
test you wanted:

> netperf -H tst019
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
tst019.dumpdata.com (192.168.101.39) port 0 AF_INET

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-Oct-30 17:23 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Tue, Oct 30, 2012 at 12:53:09PM -0400, Konrad Rzeszutek Wilk wrote:
[...]
> For fun I did a couple of tests - I set up two machines (one r8168, the
> other e1000e) and tried netperf/netserver. Both of them are running a
> baremetal kernel, and one of them has 'iommu=soft swiotlb=force' to
> simulate the worst case. This is using v3.7-rc3.

I also did a test with the patch at the top, with the same setup, and...
it does look like it fixes some issues, but not the underlying one.

The same test, with net.core.netdev_frag_page_max_order=0: the
e1000e -> r8169 gets ~124, but then on subsequent runs it picks up to
~933. If I let the machine stay idle a bit and then do this again, it
does around ~124 again.

Thoughts?
Konrad Rzeszutek Wilk
2012-Oct-31 11:01 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Tue, Oct 30, 2012 at 01:23:52PM -0400, Konrad Rzeszutek Wilk wrote:
[...]
> I also did a test with the patch at the top, with the same setup, and...
> it does look like it fixes some issues, but not the underlying one.
>
> The same test, with net.core.netdev_frag_page_max_order=0: the
> e1000e -> r8169 gets ~124, but then on subsequent runs it picks up to
> ~933. If I let the machine stay idle a bit and then do this again, it
> does around ~124 again.
>
> Thoughts?

Argh. Please disregard this test. I added an extra patch to the kernel
tree to track the SWIOTLB bounce usage and print it... and the printk
was going out to the FB, which was not too fast - so the whole test was
being slowed down by FB drivers :-)

Will re-run this test without the offending patch.
Eric Dumazet
2012-Oct-31 11:19 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-31 at 07:01 -0400, Konrad Rzeszutek Wilk wrote:
> Argh. Please disregard this test. I added an extra patch to the kernel
> tree to track the SWIOTLB bounce usage and print it... and the printk
> was going out to the FB, which was not too fast - so the whole test was
> being slowed down by FB drivers :-)
>
> Will re-run this test without the offending patch.

Anyway, I must confess I didn't understand what you did ;)