Ian Campbell
2012-Oct-12 10:28 UTC
Dom0 physical networking/swiotlb/something issue in 3.7-rc1
Hi Konrad,

The following patch causes fairly large packet loss when transmitting from dom0 to the physical network, at least with my tg3 hardware, but I assume it can impact anything which uses this interface.

I suspect that the issue is that the compound pages allocated in this way are not backed by contiguous mfns and so things fall apart when the driver tries to do DMA.

However I don't understand why the swiotlb is not fixing this up successfully? The tg3 driver seems to use pci_map_single on this data. Any thoughts? Perhaps the swiotlb (either generically or in the Xen backend) doesn't correctly handle compound pages?

Ideally we would also fix this at the point of allocation to avoid the bouncing -- I suppose that would involve using the DMA API in netdev_alloc_frag?

We have a, sort of, similar situation in the block layer which is solved via BIOVEC_PHYS_MERGEABLE. Sadly I don't think anything similar can easily be retrofitted to the net drivers without changing every single one.

Ian.

commit 69b08f62e17439ee3d436faf0b9a7ca6fffb78db
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Sep 26 06:46:57 2012 +0000

    net: use bigger pages in __netdev_alloc_frag

    We currently use percpu order-0 pages in __netdev_alloc_frag
    to deliver fragments used by __netdev_alloc_skb()

    Depending on NIC driver and arch being 32 or 64 bit, it allows a page to
    be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096

    Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows :

    - Better filling of space (the ending hole overhead is less an issue)

    - Less calls to page allocator or accesses to page->_count

    - Could allow struct skb_shared_info futures changes without major
      performance impact.

    This patch implements a transparent fallback to smaller
    pages in case of memory pressure.

    It also uses a standard "struct page_frag" instead of a custom one.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Alexander Duyck <alexander.h.duyck@intel.com>
    Cc: Benjamin LaHaise <bcrl@kvack.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
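To make the "not backed by contiguous mfns" suspicion concrete: on a Xen PV dom0 the guest-physical frames (PFNs) of a compound page are contiguous, but the machine frames (MFNs) behind them need not be. The helper below is a minimal illustrative sketch of that property check, not part of any patch in this thread; the function name frag_mfns_contiguous is made up, while page_to_pfn() and pfn_to_mfn() are the standard helpers available in a PV kernel of this era.

        #include <linux/mm.h>
        #include <asm/xen/page.h>       /* pfn_to_mfn() on a Xen PV guest */

        /*
         * Illustrative only: walk the frames backing an order-N allocation
         * (such as the 32KiB frags from __netdev_alloc_frag) and report
         * whether the underlying machine frames are contiguous.  Under Xen
         * PV a compound page can fail this check even though its PFNs are
         * contiguous, which is exactly what breaks a driver doing DMA on it.
         */
        static bool frag_mfns_contiguous(struct page *page, unsigned int order)
        {
                unsigned long pfn = page_to_pfn(page);
                unsigned long mfn = pfn_to_mfn(pfn);
                unsigned int i;

                for (i = 1; i < (1U << order); i++)
                        if (pfn_to_mfn(pfn + i) != mfn + i)
                                return false;   /* hole in machine memory */

                return true;
        }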
Konrad Rzeszutek Wilk
2012-Oct-12 11:59 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
On Fri, Oct 12, 2012 at 11:28:08AM +0100, Ian Campbell wrote:
> Hi Konrad,
>
> The following patch causes fairly large packet loss when transmitting
> from dom0 to the physical network, at least with my tg3 hardware, but I
> assume it can impact anything which uses this interface.

Ah, that would explain why one of my machines suddenly started developing checksum errors (and had a tg3 card). I hadn't gotten deep into it.

> I suspect that the issue is that the compound pages allocated in this
> way are not backed by contiguous mfns and so things fall apart when the
> driver tries to do DMA.

So this should also be easily reproduced on bare metal with 'iommu=soft' then.

> However I don't understand why the swiotlb is not fixing this up
> successfully? The tg3 driver seems to use pci_map_single on this data.
> Any thoughts? Perhaps the swiotlb (either generically or in the Xen
> backend) doesn't correctly handle compound pages?

The assumption is that it is just a page. I am surprised that the other IOMMUs aren't hitting this as well - ah, that is because they do handle a virtual address of more than one PAGE_SIZE.

> Ideally we would also fix this at the point of allocation to avoid the
> bouncing -- I suppose that would involve using the DMA API in
> netdev_alloc_frag?

Using pci_alloc_coherent would do it.. but

> We have a, sort of, similar situation in the block layer which is solved
> via BIOVEC_PHYS_MERGEABLE. Sadly I don't think anything similar can
> easily be retrofitted to the net drivers without changing every single
> one.

.. I think the right way would be to fix the SWIOTLB. And since I am now officially the maintainer of said subsystem you have come to the right person!

What is the easiest way of reproducing this? Just doing a large amount of netperf/netserver traffic both ways?

> Ian.
>
> [...]
Ian Campbell
2012-Oct-12 12:09 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
On Fri, 2012-10-12 at 12:59 +0100, Konrad Rzeszutek Wilk wrote:
> > I suspect that the issue is that the compound pages allocated in this
> > way are not backed by contiguous mfns and so things fall apart when the
> > driver tries to do DMA.
>
> So this should also be easily reproduced on bare metal with 'iommu=soft' then.

Does that cause the frames backing the page to become non-contiguous in "machine" memory?

[...]

> .. I think the right way would be to fix the SWIOTLB. And since I am now
> officially the maintainer of said subsystem you have come to the right
> person!
>
> What is the easiest way of reproducing this? Just doing a large amount
> of netperf/netserver traffic both ways?

I'm seeing ~75% loss from ping, and an scp from the dom0 to another host was measuring hundreds of kb/s instead of a few mb/s.

Fixing this only at the swiotlb layer is going to cause lots of bouncing though?

Ian.
Konrad Rzeszutek Wilk
2012-Oct-12 12:10 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
On Fri, Oct 12, 2012 at 07:59:49AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Oct 12, 2012 at 11:28:08AM +0100, Ian Campbell wrote:
> [...]
> > However I don't understand why the swiotlb is not fixing this up
> > successfully? The tg3 driver seems to use pci_map_single on this data.
> > Any thoughts? Perhaps the swiotlb (either generically or in the Xen
> > backend) doesn't correctly handle compound pages?
>
> The assumption is that it is just a page. I am surprised that the other
> IOMMUs aren't hitting this as well - ah, that is because they do handle
> a virtual address of more than one PAGE_SIZE.

So.. the GART one (AMD's poor man's IOTLB - it was used for AGP card translation, but can still be used as an IOMMU and is still present on some AMD machines) looks to suffer the same problem.

But perhaps not - can you explain to me whether a compound page is virtually contiguous? One of the things the GART does for pci_map_single is call page_to_phys(p) and feed the CPU physical address (and size) into the GART engine to set up the mapping.

If compound pages are virtually (and, on bare metal, physically) contiguous, this ought to work. But if they are not, then this should also break on AMD machines with a tg3 and the AMD GART enabled (and I should be able to find such a machine in my lab).
Ian Campbell
2012-Oct-12 12:18 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
On Fri, 2012-10-12 at 13:10 +0100, Konrad Rzeszutek Wilk wrote:
[...]
> So.. the GART one (AMD's poor man's IOTLB - it was used for AGP card
> translation, but can still be used as an IOMMU and is still present on
> some AMD machines) looks to suffer the same problem.
>
> But perhaps not - can you explain to me whether a compound page
> is virtually contiguous? One of the things the GART does for
> pci_map_single is call page_to_phys(p) and feed the CPU physical
> address (and size) into the GART engine to set up the mapping.
>
> If compound pages are virtually (and, on bare metal, physically)
> contiguous, this ought to work. But if they are not, then this should
> also break on AMD machines with a tg3 and the AMD GART enabled.

AFAIK compound pages are always physically contiguous, i.e. given a "struct page *page" which is the head of a compound page you can do "page++" to walk through its constituent frames.

I'm not sure about virtually contiguous. Obviously if they are in lowmem then the 1-1 map, combined with the fact that they are physically contiguous, makes them virtually contiguous too. I'm not sure what happens if they are highmem -- since kmap (or whatever) would need to do some extra work in that case. I've not looked, but I don't recall noticing this in the past...

Ian.
Konrad Rzeszutek Wilk
2012-Oct-12 13:17 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
On Fri, Oct 12, 2012 at 01:18:16PM +0100, Ian Campbell wrote:
[...]
> AFAIK compound pages are always physically contiguous, i.e. given a
> "struct page *page" which is the head of a compound page you can do
> "page++" to walk through its constituent frames.
>
> I'm not sure about virtually contiguous. Obviously if they are in lowmem
> then the 1-1 map, combined with the fact that they are physically
> contiguous, makes them virtually contiguous too. I'm not sure what
> happens if they are highmem -- since kmap (or whatever) would need to do
> some extra work in that case. I've not looked, but I don't recall
> noticing this in the past...

I think it also depends on how they are allocated - if you use GFP_DMA32 they will be in lowmem. And since the networking code is using that by default they would be in the 1-1 map.
Konrad Rzeszutek Wilk
2012-Oct-29 15:55 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
On Fri, Oct 12, 2012 at 01:18:16PM +0100, Ian Campbell wrote:
[...]
> AFAIK compound pages are always physically contiguous, i.e. given a
> "struct page *page" which is the head of a compound page you can do
> "page++" to walk through its constituent frames.
>
> I'm not sure about virtually contiguous. Obviously if they are in lowmem
> then the 1-1 map, combined with the fact that they are physically
> contiguous, makes them virtually contiguous too. I'm not sure what
> happens if they are highmem -- since kmap (or whatever) would need to do
> some extra work in that case. I've not looked, but I don't recall
> noticing this in the past...

So to double-check this, I wrote a nice little module (attached) that allocates these kinds of pages and does 'DMA' on them. From the tests it seems to work OK - in some cases it uses a bounce buffer and in some it does not, and the resulting buffers do contain the data we expected.

# modprobe dma_test
modprobe dma_test
calling  dma_test_init+0x0/0x1000 [dma_test] @ 2875
initcall dma_test_init+0x0/0x1000 [dma_test] returned 0 after 309 usecs
fallback_bus: to_cpu: va: ffff8800642dd000 (pfn:642dd, mfn:53706) w.r.t prev mfn: 53707!
fallback_bus: to_cpu: va: ffff8800642de000 (pfn:642de, mfn:53705) w.r.t prev mfn: 53706!
fallback_bus: to_cpu: va: ffff8800642df000 (pfn:642df, mfn:53704) w.r.t prev mfn: 53705!
fallback_bus: to_cpu: ffff8800642dc000 (pfn:642dc, bus frame: 53707) <= ffff880070046000 (addr: 70046000, frame: 186)
fallback_bus: to_cpu: ffff8800642dd000 (pfn:642dd, bus frame: 53706) <= ffff880070047000 (addr: 70047000, frame: 187)
fallback_bus: to_cpu: ffff8800642de000 (pfn:642de, bus frame: 53705) <= ffff880070048000 (addr: 70048000, frame: 188)
fallback_bus: to_cpu: ffff8800642df000 (pfn:642df, bus frame: 53704) <= ffff880070049000 (addr: 70049000, frame: 189)
fallback_bus: to_dev: va: ffff880059521000 (pfn:59521, mfn:488c2) w.r.t prev mfn: 488c3!
fallback_bus: to_dev: va: ffff880059522000 (pfn:59522, mfn:488c1) w.r.t prev mfn: 488c2!
fallback_bus: to_dev: va: ffff880059523000 (pfn:59523, mfn:488c0) w.r.t prev mfn: 488c1!
fallback_bus: to_dev: va: ffff880059524000 (pfn:59524, mfn:488bf) w.r.t prev mfn: 488c0!
fallback_bus: to_dev: va: ffff880059525000 (pfn:59525, mfn:488be) w.r.t prev mfn: 488bf!
fallback_bus: to_dev: va: ffff880059526000 (pfn:59526, mfn:488bd) w.r.t prev mfn: 488be!
fallback_bus: to_dev: va: ffff880059527000 (pfn:59527, mfn:488bc) w.r.t prev mfn: 488bd!
fallback_bus: to_dev: 0xffff88007004a000(bounce) <= 0xffff880059520000 (sz: 32768)
fallback_bus: to_dev: ffff880059520000 (pfn:59520, bus frame: 488c3) => ffff88007004a000 (addr: 7004a000, frame: 18a)
fallback_bus: to_dev: ffff880059521000 (pfn:59521, bus frame: 488c2) => ffff88007004b000 (addr: 7004b000, frame: 18b)
fallback_bus: to_dev: ffff880059522000 (pfn:59522, bus frame: 488c1) => ffff88007004c000 (addr: 7004c000, frame: 18c)
fallback_bus: to_dev: ffff880059523000 (pfn:59523, bus frame: 488c0) => ffff88007004d000 (addr: 7004d000, frame: 18d)
fallback_bus: to_dev: ffff880059524000 (pfn:59524, bus frame: 488bf) => ffff88007004e000 (addr: 7004e000, frame: 18e)
fallback_bus: to_dev: ffff880059525000 (pfn:59525, bus frame: 488be) => ffff88007004f000 (addr: 7004f000, frame: 18f)
fallback_bus: to_dev: ffff880059526000 (pfn:59526, bus frame: 488bd) => ffff880070050000 (addr: 70050000, frame: 190)
fallback_bus: to_dev: ffff880059527000 (pfn:59527, bus frame: 488bc) => ffff880070051000 (addr: 70051000, frame: 191)
fallback_bus: to_dev: ffff880059520000 with DMA (18a000) has ffffffcc (expected ffffffcc)
fallback_bus: to_dev: ffff880059521000 with DMA (18b000) has ffffffcc (expected ffffffcc)
fallback_bus: to_dev: ffff880059522000 with DMA (18c000) has ffffffcc (expected ffffffcc)
fallback_bus: to_dev: ffff880059523000 with DMA (18d000) has ffffffcc (expected ffffffcc)
fallback_bus: to_dev: ffff880059524000 with DMA (18e000) has ffffffcc (expected ffffffcc)
fallback_bus: to_dev: ffff880059525000 with DMA (18f000) has ffffffcc (expected ffffffcc)
fallback_bus: to_dev: ffff880059526000 with DMA (190000) has ffffffcc (expected ffffffcc)
fallback_bus: to_dev: ffff880059527000 with DMA (191000) has ffffffcc (expected ffffffcc)
fallback_bus: to_cpu: 0xffff880070046000(bounce) => 0xffff8800642dc000 (sz: 16384)
fallback_bus: to_cpu: ffff8800642dc000 with DMA (186000) has ffffffdd (expected ffffffdd)
fallback_bus: to_cpu: ffff8800642dd000 with DMA (187000) has ffffffdd (expected ffffffdd)
fallback_bus: to_cpu: ffff8800642de000 with DMA (188000) has ffffffdd (expected ffffffdd)
fallback_bus: to_cpu: ffff8800642df000 with DMA (189000) has ffffffdd (expected ffffffdd)
fallback_bus: to_cpu: 0xffff880070046000(bounce) => 0xffff8800642dc000 (sz: 16384)
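The dma_test module itself is an attachment that this archive does not reproduce, so the following is only a guess at its shape, reconstructed from the log above: allocate an order-3 (32 KiB) buffer, give a dummy device a 32-bit DMA mask, and push the buffer through the streaming DMA API so the (Xen-)swiotlb gets to decide whether to bounce it. Every name and detail below is an assumption, not Konrad's actual code.

        #include <linux/module.h>
        #include <linux/platform_device.h>
        #include <linux/dma-mapping.h>
        #include <linux/err.h>
        #include <linux/gfp.h>
        #include <linux/string.h>

        static struct platform_device *tdev;

        static int __init dma_test_init(void)
        {
                const int order = 3;                    /* 32 KiB with 4 KiB pages */
                const size_t sz = PAGE_SIZE << order;
                dma_addr_t handle;
                void *buf;

                tdev = platform_device_register_simple("dma_test", -1, NULL, 0);
                if (IS_ERR(tdev))
                        return PTR_ERR(tdev);

                /* Give the dummy device a 32-bit DMA mask so the mapping path
                 * actually has to choose between direct use and bouncing. */
                tdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
                tdev->dev.dma_mask = &tdev->dev.coherent_dma_mask;

                buf = (void *)__get_free_pages(GFP_KERNEL, order);
                if (!buf) {
                        platform_device_unregister(tdev);
                        return -ENOMEM;
                }
                memset(buf, 0xcc, sz);

                handle = dma_map_single(&tdev->dev, buf, sz, DMA_TO_DEVICE);
                if (!dma_mapping_error(&tdev->dev, handle)) {
                        /* A real device would DMA from "handle" here; the
                         * interesting part is whether the swiotlb bounced the
                         * buffer and, if so, whether all eight frames were
                         * copied correctly. */
                        dma_unmap_single(&tdev->dev, handle, sz, DMA_TO_DEVICE);
                }

                free_pages((unsigned long)buf, order);
                return 0;
        }

        static void __exit dma_test_exit(void)
        {
                platform_device_unregister(tdev);
        }

        module_init(dma_test_init);
        module_exit(dma_test_exit);
        MODULE_LICENSE("GPL");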
Jan Beulich
2012-Nov-09 09:03 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
>>> On 12.10.12 at 12:28, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> The following patch causes fairly large packet loss when transmitting
> from dom0 to the physical network, at least with my tg3 hardware, but I
> assume it can impact anything which uses this interface.
>
> I suspect that the issue is that the compound pages allocated in this
> way are not backed by contiguous mfns and so things fall apart when the
> driver tries to do DMA.

Has this seen any sort of resolution yet? Despite having forced NETDEV_FRAG_PAGE_MAX_ORDER to zero for Xen (following your suggested patch, Ian), and with a different NIC (e1000e driven) I'm seeing similar packet loss/corruption on transmits, and only if running a debug hypervisor (in a non-debug one, MFNs are largely contiguous, so this issue should be observable there only very rarely).

Jan
Ian Campbell
2012-Nov-09 09:16 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
On Fri, 2012-11-09 at 09:03 +0000, Jan Beulich wrote:
[...]
> Has this seen any sort of resolution yet? Despite having forced
> NETDEV_FRAG_PAGE_MAX_ORDER to zero for Xen (following your suggested
> patch, Ian), and with a different NIC (e1000e driven) I'm seeing
> similar packet loss/corruption on transmits, and only if running a
> debug hypervisor (in a non-debug one, MFNs are largely contiguous, so
> this issue should be observable there only very rarely).

I think Konrad is still looking into the underlying swiotlb issue.

If you want to go with the workaround then there is another order>0 to frob in net/core/sock.c:

    #define SKB_FRAG_PAGE_ORDER get_order(32768)

which might help.

Dave Miller unequivocally rejected this approach so I haven't been pursuing it any further. Perhaps once the swiotlb fix is made it will be possible to make an argument for it based on actual performance numbers.

Ian.
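For reference, the two allocation-order knobs discussed in this exchange look roughly like this in the 3.7-era tree (both names come straight from the thread); the Xen-specific clamping shown in the comment is only a sketch of the rejected workaround, with xen_pv_domain() from <xen/xen.h> used as an assumed runtime guard, not a patch that was actually applied.

        /* net/core/skbuff.c -- order used by __netdev_alloc_frag() */
        #define NETDEV_FRAG_PAGE_MAX_ORDER      get_order(32768)

        /* net/core/sock.c -- order used for per-socket page frags */
        #define SKB_FRAG_PAGE_ORDER             get_order(32768)

        /*
         * The workaround amounts to forcing both back to order 0 when
         * running as a Xen PV domain, e.g. (sketch only):
         *
         *      #define NETDEV_FRAG_PAGE_MAX_ORDER \
         *              (xen_pv_domain() ? 0 : get_order(32768))
         *
         * so that every fragment is backed by a single machine frame.
         */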
Jan Beulich
2012-Nov-09 09:40 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
>>> On 09.11.12 at 10:16, Ian Campbell <Ian.Campbell@citrix.com> wrote:
[...]
> I think Konrad is still looking into the underlying swiotlb issue.
>
> If you want to go with the workaround then there is another order>0 to
> frob in net/core/sock.c:
>
>     #define SKB_FRAG_PAGE_ORDER get_order(32768)
>
> which might help.

Thanks, that indeed helped. And it gives me food for thought as well, since this should - if anything at all - be a performance improvement, not something affecting correctness.

Jan
Jan Beulich
2012-Nov-09 10:36 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
>>> On 09.11.12 at 10:40, "Jan Beulich" <JBeulich@suse.com> wrote:
[...]
> Thanks, that indeed helped. And it gives me food for thought as well,
> since this should - if anything at all - be a performance improvement,
> not something affecting correctness.

Okay, one problem in the pv-ops case seems pretty obvious: dma_capable() does (potentially cross-page) arithmetic on a dma_addr_t value, ignoring dis-contiguity altogether. Specifically, its first use in swiotlb_map_page() and its only use in swiotlb_map_sg_attrs() are problematic.

In the forward-ported kernels, those two checks are however accompanied by range_needs_mapping() (aka range_straddles_page_boundary()) checks, which ought to take care of this. There is brokenness there with the invocations of gnttab_dma_map_page(), but only if the initial offset is at least PAGE_SIZE - will have to check whether that occurs.

Jan
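To make the observation above easier to follow: dma_capable() only checks that the end of the range fits under the device's DMA mask, silently assuming the bus addresses in between are contiguous, whereas the range_straddles_page_boundary() check in drivers/xen/swiotlb-xen.c additionally walks the backing machine frames (much like the sketch earlier in this thread). The snippet below is a simplified paraphrase of the 3.7-era helper for illustration, not an exact copy of either tree.

        /*
         * Paraphrase of dma_capable(): a pure range check against the DMA
         * mask.  "addr + size - 1" is cross-page arithmetic on a dma_addr_t,
         * so it says nothing about whether the machine frames behind the
         * range are actually contiguous.
         */
        static inline bool dma_capable(struct device *dev, dma_addr_t addr,
                                       size_t size)
        {
                if (!dev->dma_mask)
                        return false;

                return addr + size - 1 <= *dev->dma_mask;
        }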
Jan Beulich
2012-Nov-09 11:43 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
>>> On 09.11.12 at 11:36, "Jan Beulich" <JBeulich@suse.com> wrote:
> In the forward-ported kernels, those two checks are however
> accompanied by range_needs_mapping() (aka
> range_straddles_page_boundary()) checks, which ought to
> take care of this. There is brokenness there with the invocations
> of gnttab_dma_map_page(), but only if the initial offset is at
> least PAGE_SIZE - will have to check whether that occurs.

And indeed, fixing this also makes the problem go away when the allocation order doesn't get forced to zero. So presumably there's also only that one problem I had pointed out in pv-ops.

Jan
Konrad Rzeszutek Wilk
2012-Nov-09 13:48 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
On Fri, Nov 09, 2012 at 11:43:39AM +0000, Jan Beulich wrote:
> >>> On 09.11.12 at 11:36, "Jan Beulich" <JBeulich@suse.com> wrote:
> > In the forward-ported kernels, those two checks are however
> > accompanied by range_needs_mapping() (aka
> > range_straddles_page_boundary()) checks, which ought to
> > take care of this. There is brokenness there with the invocations
> > of gnttab_dma_map_page(), but only if the initial offset is at
> > least PAGE_SIZE - will have to check whether that occurs.
>
> And indeed, fixing this also makes the problem go away when
> the allocation order doesn't get forced to zero. So presumably
> there's also only that one problem I had pointed out in pv-ops.

The pvops one has this in the map-page variant (xen_swiotlb_map_page):

351         if (dma_capable(dev, dev_addr, size) &&
352             !range_straddles_page_boundary(phys, size) && !swiotlb_force)
353                 return dev_addr;

and in the sg variant:

494                 if (swiotlb_force ||
495                     !dma_capable(hwdev, dev_addr, sg->length) ||
496                     range_straddles_page_boundary(paddr, sg->length)) {
497                         void *map = swiotlb_tbl_map_single(hwdev,

So I think that check is OK. There is no gnttab_dma_map_page call - so that can't be the issue.

I did play with this a bit and wrote this little driver (see attached) that forces allocation of large pages, and it worked as expected on Xen-SWIOTLB.

But while doing this I found that the 'skge' driver is busted - it does not even work on bare metal if you do 'iommu=soft swiotlb=force'. Since Xen-SWIOTLB would occasionally use the bounce buffer - and with an order greater than 0 - the bug in skge became more obvious. I haven't narrowed down where the issue is with skge yet.
Jan Beulich
2012-Nov-09 17:34 UTC
Re: Dom0 physical networking/swiotlb/something issue in 3.7-rc1
>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 11/09/12 2:48 PM >>>
[...]
> The pvops one has this in the map-page variant (xen_swiotlb_map_page):
>
> 351         if (dma_capable(dev, dev_addr, size) &&
> 352             !range_straddles_page_boundary(phys, size) && !swiotlb_force)
> 353                 return dev_addr;
>
> and in the sg variant:
>
> 494                 if (swiotlb_force ||
> 495                     !dma_capable(hwdev, dev_addr, sg->length) ||
> 496                     range_straddles_page_boundary(paddr, sg->length)) {
> 497                         void *map = swiotlb_tbl_map_single(hwdev,

Oh, right, I forgot that there's yet another clone of that code under drivers/xen/.

> So I think that check is OK. There is no gnttab_dma_map_page call - so that
> can't be the issue.

Indeed.

Jan