Ian Campbell
2012-Oct-24 11:42 UTC
[PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
The commit 69b08f62e174 "net: use bigger pages in __netdev_alloc_frag"
leads to 70%+ packet loss under Xen when transmitting from physical (as
opposed to virtual) network devices.

This is because under Xen pages which are contiguous in the physical
address space may not be contiguous in the DMA space; in fact it is
very likely that they are not. I think there are other architectures
where this is true, although perhaps none quite so aggressive as to
have this property at a per-order-0-page granularity.

The real underlying bug here most likely lies in the swiotlb not
correctly handling compound pages, and Konrad is investigating this.
However, even with the swiotlb issue fixed, the current arrangement
seems likely to result in a lot of bounce buffering, which seems likely
to more than offset any benefit from the use of larger pages.

Therefore make NETDEV_FRAG_PAGE_MAX_ORDER configurable at runtime and
use this to request order-0 frags under Xen. Also expose this setting
via sysctl.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: netdev@vger.kernel.org
Cc: xen-devel@lists.xen.org
---
 arch/x86/xen/setup.c       |    7 +++++++
 include/linux/skbuff.h     |    2 ++
 net/core/skbuff.c          |    7 ++++---
 net/core/sysctl_net_core.c |    7 +++++++
 4 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8971a26..ad14d46 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -11,6 +11,7 @@
 #include <linux/memblock.h>
 #include <linux/cpuidle.h>
 #include <linux/cpufreq.h>
+#include <linux/skbuff.h>

 #include <asm/elf.h>
 #include <asm/vdso.h>
@@ -555,6 +556,12 @@ void __init xen_arch_setup(void)
	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);

+	/*
+	 * Xen cannot handle DMA to/from compound pages, so avoid
+	 * bounce buffering by not allocating large network frags.
+	 */
+	netdev_frag_page_max_order = 0;
+
	/* Set up idle, making sure it calls safe_halt() pvop */
 #ifdef CONFIG_X86_32
	boot_cpu_data.hlt_works_ok = 1;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6a2c34e..a3a748f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1719,6 +1719,8 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
		kfree_skb(skb);
 }

+extern int netdev_frag_page_max_order;
+
 extern void *netdev_alloc_frag(unsigned int fragsz);

 extern struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6e04b1f..88cbe5f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -348,8 +348,9 @@ struct netdev_alloc_cache {
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);

-#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
-#define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER)
+int netdev_frag_page_max_order __read_mostly = get_order(32768);
+
+#define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << netdev_frag_page_max_order)
 #define NETDEV_PAGECNT_MAX_BIAS	   NETDEV_FRAG_PAGE_MAX_SIZE

 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
@@ -363,7 +364,7 @@ static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
	nc = &__get_cpu_var(netdev_alloc_cache);
	if (unlikely(!nc->frag.page)) {
 refill:
-		for (order = NETDEV_FRAG_PAGE_MAX_ORDER; ;) {
+		for (order = netdev_frag_page_max_order; ;) {
			gfp_t gfp = gfp_mask;

			if (order)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index a7c3684..e5ab6df 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -129,6 +129,13 @@ static struct ctl_table net_core_table[] = {
		.mode		= 0644,
		.proc_handler	= proc_dointvec
	},
+	{
+		.procname	= "netdev_frag_page_max_order",
+		.data		= &netdev_frag_page_max_order,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_BPF_JIT
	{
		.procname	= "bpf_jit_enable",
--
1.7.2.5
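The arithmetic behind the patch is easy to sanity-check: with 4 KiB pages (the x86 default, assumed here), get_order(32768) is 3, so the default NETDEV_FRAG_PAGE_MAX_SIZE is PAGE_SIZE << 3 = 32 KiB, and setting the order to 0 drops frag allocations back to single 4 KiB pages:

```shell
# Toy check of the frag-size arithmetic in the patch (not kernel code).
PAGE_SIZE=4096    # assumed x86 page size
ORDER=3           # get_order(32768) on a 4 KiB-page system

echo $((PAGE_SIZE << ORDER))   # NETDEV_FRAG_PAGE_MAX_SIZE with the default order
echo $((PAGE_SIZE << 0))       # frag size once netdev_frag_page_max_order = 0
```

On a running kernel with the patch applied, the equivalent runtime switch would be `sysctl -w net.core.netdev_frag_page_max_order=0`, using the sysctl name the patch adds.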
Eric Dumazet
2012-Oct-24 12:28 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 12:42 +0100, Ian Campbell wrote:
> The commit 69b08f62e174 "net: use bigger pages in __netdev_alloc_frag"
> leads to 70%+ packet loss under Xen when transmitting from physical (as
> opposed to virtual) network devices.
[...]
> Therefore make NETDEV_FRAG_PAGE_MAX_ORDER configurable at runtime and
> use this to request order-0 frags under Xen. Also expose this setting
> via sysctl.
[...]

I understand your concern, but this seems a quick/dirty hack at this
moment. After setting the sysctl to 0, some tasks may still have some
order-3 pages in their cache.

Your driver must already cope with skb->head being split on several
pages.

So what fundamental difference exists with frags?
Ian Campbell
2012-Oct-24 13:16 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 13:28 +0100, Eric Dumazet wrote:
> I understand your concern, but this seems a quick/dirty hack at this
> moment. After setting the sysctl to 0, some tasks may still have some
> order-3 pages in their cache.

Right, the sysctl thing might be overkill; I just figured it was useful
for debugging. When booting in a Xen VM the patch sets it to zero very
early on, during setup_arch(), which is before any tasks even exist.

> Your driver must already cope with skb->head being split on several
> pages.
>
> So what fundamental difference exists with frags?

The issue here is with drivers for physical network devices when running
under Xen, not with the Xen paravirtualised network drivers (AKA
netback/netfront).

The problem is that pages which are contiguous in the physical address
space may not be contiguous in the DMA address space. With order>0 pages
this becomes a problem when you poke the DMA address and length of a
compound page into the hardware registers. The DMA address will be right
for the head of the page, but once the hardware steps off the end of
that it'll get the wrong page.

I don't think this non-contiguousness between physical and DMA addresses
is specific to Xen, although it is more frequent under Xen than on any
real hardware platform. (Xen has often been a good canary for these
sorts of issues, which turn out later on to impact other arches too.)

In theory this could be fixed in all the drivers for physical network
devices, but that would be a lot of effort (and probably a fair bit of
ugliness in the drivers) for a gain which was only relevant to Xen.

Ian.
Eric Dumazet
2012-Oct-24 13:30 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 14:16 +0100, Ian Campbell wrote:
[...]
> The problem is that pages which are contiguous in the physical address
> space may not be contiguous in the DMA address space. With order>0 pages
> this becomes a problem when you poke the DMA address and length of a
> compound page into the hardware registers. The DMA address will be right
> for the head of the page, but once the hardware steps off the end of
> that it'll get the wrong page.
[...]
> In theory this could be fixed in all the drivers for physical network
> devices, but that would be a lot of effort (and probably a fair bit of
> ugliness in the drivers) for a gain which was only relevant to Xen.

I still have concerns about skb->head that you didn't really answer.

Why can skb->head be on order-1 or order-2 pages and this still works?

It seems to me it's a driver issue; for example,
drivers/net/xen-netfront.c has assumptions that can be easily fixed.
Ian Campbell
2012-Oct-24 14:02 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 14:30 +0100, Eric Dumazet wrote:
> It seems to me it's a driver issue; for example,
> drivers/net/xen-netfront.c has assumptions that can be easily fixed.

The netfront ->head thing is a separate (although perhaps related)
issue. I intended to fix it along the same lines as netback previously,
except for some unfathomable reason I haven't been able to reproduce the
problem with netfront -- I've no idea why, though, since it seems like
it should be a no-brainer!

> Why can skb->head be on order-1 or order-2 pages and this still works?

skb->head being order 1 or 2 isn't working for me. The driver I'm having
issues with, which caused me to create this particular patch, is the tg3
driver (although I don't think this is by any means specific to tg3).

For the ->head the tg3 driver does:

	mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);

while for the frags it does:

	mapping = skb_frag_dma_map(&tp->pdev->dev, frag, 0, len, DMA_TO_DEVICE);

This ought to do the Right Thing but doesn't seem to be working. Konrad
suspected an issue with the swiotlb's handling of order>0 pages in some
cases. As I said in the commit message, he is looking into this issue.

My concern, however, was that even once the swiotlb is fixed to work
right, the effect of pci_map_single on an order>0 page is going to be
that the data gets bounced into contiguous memory -- that is a memcpy
which would undo the benefit of allocating large pages to start with. So
I figured that in such cases we'd be better off just using order-0
allocations to start with.

Ian.
Eric Dumazet
2012-Oct-24 15:21 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 15:02 +0100, Ian Campbell wrote:
[...]
> My concern, however, was that even once the swiotlb is fixed to work
> right, the effect of pci_map_single on an order>0 page is going to be
> that the data gets bounced into contiguous memory -- that is a memcpy
> which would undo the benefit of allocating large pages to start with. So
> I figured that in such cases we'd be better off just using order-0
> allocations to start with.

I am really confused.

If you really have such problems, why doesn't locally generated TCP
traffic also have them?

Your patch doesn't touch sk_page_frag_refill(), does it?
Ian Campbell
2012-Oct-24 16:22 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 16:21 +0100, Eric Dumazet wrote:
[...]
> I am really confused.
>
> If you really have such problems, why doesn't locally generated TCP
> traffic also have them?

I think it does. The reason I noticed the original problem was that ssh
to the machine was virtually (no pun intended) unusable.

> Your patch doesn't touch sk_page_frag_refill(), does it?

That's right, it doesn't. When is (sk->sk_allocation & __GFP_WAIT) true?
Is it possible I'm just not hitting that case?

Is it possible that this only affects certain traffic patterns (I only
really tried ssh/scp and ping)? Or perhaps it's just that the swiotlb is
only broken in one corner case and not the other.

Ian.
Eric Dumazet
2012-Oct-24 16:43 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-24 at 17:22 +0100, Ian Campbell wrote:
> That's right, it doesn't. When is (sk->sk_allocation & __GFP_WAIT) true?
> Is it possible I'm just not hitting that case?

I hope not. GFP_KERNEL has __GFP_WAIT.

> Is it possible that this only affects certain traffic patterns (I only
> really tried ssh/scp and ping)? Or perhaps it's just that the swiotlb is
> only broken in one corner case and not the other.

Could you try a netperf -t TCP_STREAM?

Because ssh uses small packets, and small TCP packets don't use frags
but skb->head.

You mentioned a 70% drop in performance, but what test did you use
exactly?
David Miller
2012-Oct-24 18:19 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
I'm not applying this.

Fix your drivers and the infrastructure they use, don't paper around it.
Konrad Rzeszutek Wilk
2012-Oct-30 16:53 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, Oct 24, 2012 at 06:43:20PM +0200, Eric Dumazet wrote:
[...]
> Could you try a netperf -t TCP_STREAM?

For fun I did a couple of tests - I set up two machines (one r8168, the
other e1000e) and tried netperf/netserver. Both of them are running a
baremetal kernel, and one of them has 'iommu=soft swiotlb=force' to
simulate the worst case. This is using v3.7-rc3.

The r8169 is booted without any arguments; the e1000e is using
'iommu=soft swiotlb=force'.

So r8169 -> e1000e, I get ~940 (this is odd - I expected that the e1000e
on the recv side would be using the bounce buffer, but then I realized
it sets up a 'dma' pool using pci_alloc_coherent). The other way -
e1000e -> r8169 - got me around ~128. So it is the sending side that
ends up using the bounce buffer, and it slows down considerably.

I also swapped the machine that had the e1000e with a tg3 - and got
around the same numbers.

So all of this points to the swiotlb, and just to make sure that nothing
was amiss I wrote a little driver that would allocate a compound page,
set up a DMA mapping, do some writes, sync and unmap the DMA page. And
it works correctly - so the swiotlb (and the Xen variant) work just
right. Attached for your fun.

Then I decided to try v3.6.3, with the same exact parameters... and the
problem went away. The e1000e -> r8169, which got me around ~128, now
gets ~940! Still using the swiotlb bounce buffer.

> Because ssh uses small packets, and small TCP packets don't use frags
> but skb->head.
>
> You mentioned a 70% drop in performance, but what test did you use
> exactly?

Note, I did not provide any arguments to netperf, but it did pick the
test you wanted:

> netperf -H tst019
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
tst019.dumpdata.com (192.168.101.39) port 0 AF_INET

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2012-Oct-30 17:23 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Tue, Oct 30, 2012 at 12:53:09PM -0400, Konrad Rzeszutek Wilk wrote:
[...]
> For fun I did a couple of tests - I set up two machines (one r8168, the
> other e1000e) and tried netperf/netserver. Both of them are running a
> baremetal kernel, and one of them has 'iommu=soft swiotlb=force' to
> simulate the worst case. This is using v3.7-rc3.

I also did a test with the patch at the top, with the same setup, and...
it does look like it fixes some issues, but not the underlying one.

The same test, with net.core.netdev_frag_page_max_order=0: the
e1000e -> r8169 gets ~124, but then on subsequent runs it picks up to
~933. If I let the machine stay idle a bit and then do this again, it
does around ~124 again.

Thoughts?
Konrad Rzeszutek Wilk
2012-Oct-31 11:01 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Tue, Oct 30, 2012 at 01:23:52PM -0400, Konrad Rzeszutek Wilk wrote:
[...]
> I also did a test with the patch at the top, with the same setup, and...
> it does look like it fixes some issues, but not the underlying one.
>
> The same test, with net.core.netdev_frag_page_max_order=0: the
> e1000e -> r8169 gets ~124, but then on subsequent runs it picks up to
> ~933. If I let the machine stay idle a bit and then do this again, it
> does around ~124 again.
>
> Thoughts?

Argh. Please disregard this test. I added an extra patch to the kernel
tree to track the SWIOTLB bounce usage and print it... and the printk
was going out to the FB, which was not too fast - so the whole test was
being slowed down by FB drivers :-)

Will re-run this test without the offending patch.
Eric Dumazet
2012-Oct-31 11:19 UTC
Re: [PATCH] net: allow configuration of the size of page in __netdev_alloc_frag
On Wed, 2012-10-31 at 07:01 -0400, Konrad Rzeszutek Wilk wrote:
> Argh. Please disregard this test. I added an extra patch to the kernel
> tree to track the SWIOTLB bounce usage and print it... and the printk
> was going out to the FB, which was not too fast - so the whole test was
> being slowed down by FB drivers :-)
>
> Will re-run this test without the offending patch.

Anyway, I must confess I didn't understand what you did ;)