Hi,

I've been trying to get current xen-sparse up and running on a 2-cpu box and have had a number of problems. One has been that networking is completely unstable: I get kernel panics under the slightest network load.

The trouble is that this is a 1G box, so its memory is not large enough to automatically enable the swiotlb. (arch/xen/i386/kernel/swiotlb.c enables swiotlb automatically for dom0 only if there's at least 2G of memory.) And the first time we get a pci_map_single() request for a dom0-contiguous region which crosses a page boundary, we hit the BUG_ON at arch/xen/i386/kernel/pci-dma.c:270 due to dma_map_single() checking:

    IOMMU_BUG_ON(range_straddles_page_boundary(ptr, size));

And this happens *instantly* on any loaded tcp connection on my e1000 NIC. All I need to do to kill the box is to ssh in and type "find\n". Instant dom0 death after the ssh client receives about a dozen lines of output. The stack trace is appended below.

The PCI mapping documentation certainly says that pci_map_single() needs to be able to map a single region, not just a single page. If it can't, then I suspect we really need to enable swiotlb by default, because we'll just be unstable without it.

The kernel panics after this with "Fatal DMA error! Please use 'swiotlb=force'". But of course the default for Xen is to reboot instantly at this point, before the error is visible. And even after catching the message with a serial console, I found that "swiotlb=force" *also* dies on this box, with

    (XEN) (file=memory.c, line=57) Could not allocate order=14 extent: id=0 flags=0 (0 of 1)
    kernel BUG at arch/xen/i386/mm/hypervisor.c:354 (xen_create_contiguous_region)!
     [<c011a77d>] xen_create_contiguous_region+0x26d/0x2b0
     [<c0112596>] swiotlb_init_with_default_size+0x86/0x1c0
     [<c0112735>] swiotlb_init+0x65/0xa0

because we don't have a large enough zone at boot time to create the 64MB swiotlb.
Booting with "swiotlb=force swiotlb=8m" works around both of these bugs and allows me to boot; fortunately things are much more stable after I get this far.

Cheers,
 Stephen

---

kernel BUG at arch/xen/i386/kernel/pci-dma.c:270 (dma_map_single)!
 [<c010ecd6>] dma_map_single+0xf6/0x160
 [<f49cd40b>] e1000_xmit_frame+0x40b/0xd30 [e1000]
 [<c0313510>] qdisc_restart+0x100/0x2f0
 [<c03241d0>] ip_finish_output2+0x0/0x250
 [<c030d594>] nf_hook_slow+0x64/0x110
 [<c03010ff>] dev_queue_xmit+0x9f/0x340
 [<c032404c>] ip_finish_output+0x15c/0x2e0
 [<c03241d0>] ip_finish_output2+0x0/0x250
 [<c0324947>] ip_queue_xmit+0x2b7/0x560
 [<c0323ec0>] dst_output+0x0/0x30
 [<c0155bf2>] poison_obj+0x32/0x60
 [<c0155408>] dbg_redzone1+0x18/0x60
 [<c0155e06>] check_poison_obj+0x26/0x1c0
 [<c0155bf2>] poison_obj+0x32/0x60
 [<c0155408>] dbg_redzone1+0x18/0x60
 [<c0157dbc>] cache_alloc_debugcheck_after+0x4c/0x1b0
 [<c0336e24>] tcp_transmit_skb+0x3d4/0x810
 [<c02fab10>] skb_clone+0x20/0x1d0
 [<c0337efd>] tcp_write_xmit+0x10d/0x330
 [<c0334943>] __tcp_data_snd_check+0xa3/0xe0
 [<c02fa961>] kfree_skbmem+0x21/0x30
 [<c0335069>] tcp_rcv_established+0x2a9/0x910
 [<f4b3f036>] ipt_hook+0x36/0x40 [iptable_filter]
 [<c033ef5a>] tcp_v4_do_rcv+0xfa/0x150
 [<c033f8d5>] tcp_v4_rcv+0x925/0x980
 [<c030d594>] nf_hook_slow+0x64/0x110
 [<c03208d0>] ip_local_deliver_finish+0x0/0x270
 [<c03206bc>] ip_local_deliver+0xdc/0x2f0
 [<c03208d0>] ip_local_deliver_finish+0x0/0x270
 [<c0320f0e>] ip_rcv+0x3ce/0x5b0
 [<c03210f0>] ip_rcv_finish+0x0/0x320
 [<c0301be0>] netif_receive_skb+0x250/0x310
 [<f49cf3ae>] e1000_clean_rx_irq+0x13e/0x5d0 [e1000]
 [<f49ce8a2>] e1000_clean+0x52/0x1c0 [e1000]
 [<c0301f2c>] net_rx_action+0xdc/0x220
 [<c0128f4a>] __do_softirq+0x8a/0x120
 [<c012905d>] do_softirq+0x7d/0x80
 [<c010ee22>] do_IRQ+0x22/0x30
 [<c01049be>] evtchn_do_upcall+0x9e/0xe0
 [<c010a2f0>] hypervisor_callback+0x2c/0x34
 [<c0107b30>] xen_idle+0x40/0x80
 [<c0107bd4>] cpu_idle+0x64/0xb0
 [<c0436a4f>] start_kernel+0x1af/0x210
 [<c0436380>] unknown_bootoption+0x0/0x220
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28 Oct 2005, at 20:21, Stephen C. Tweedie wrote:

> The trouble is that this is a 1G box, so its memory is not large enough to automatically enable the swiotlb. (arch/xen/i386/kernel/swiotlb.c enables swiotlb automatically for dom0 only if there's at least 2G of memory.) And the first time we get a pci_map_single() request for a dom0-contiguous region which crosses a page boundary, we hit the BUG_ON at arch/xen/i386/kernel/pci-dma.c:270 due to dma_map_single() checking:
>
>     IOMMU_BUG_ON(range_straddles_page_boundary(ptr, size));
>
> And this happens *instantly* on any loaded tcp connection on my e1000 NIC. All I need to do to kill the box is to ssh in and type "find\n". Instant dom0 death after the ssh client receives about a dozen lines of output. The stack trace is appended below.

Is the network interface set up to use jumbo frames? Otherwise I wouldn't expect alloc_skb() to allocate a data area that straddles a page boundary, since the allocation will come from one of the sub-page-sized power-of-two kmem caches. If the problem is jumbo frames, we might need to add a hook to alloc_skb(). Using swiotlb will suck hugely.

 -- Keir
On Fri, Oct 28, 2005 at 03:21:20PM -0400, Stephen C. Tweedie wrote:

> Hi,
>
> I've been trying to get current xen-sparse up and running on a 2-cpu box and have had a number of problems. One has been that networking is completely unstable: I get kernel panics under the slightest network load.

FYI, I opened bugzilla #373 to track this issue.

Cheers,
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/
> On 28 Oct 2005, at 20:21, Stephen C. Tweedie wrote:
>
> > The trouble is that this is a 1G box, so its memory is not large enough to automatically enable the swiotlb. (arch/xen/i386/kernel/swiotlb.c enables swiotlb automatically for dom0 only if there's at least 2G of memory.) And the first time we get a pci_map_single() request for a dom0-contiguous region which crosses a page boundary, we hit the BUG_ON at arch/xen/i386/kernel/pci-dma.c:270 due to dma_map_single() checking:

Does your card support TSO? What revision e1000 is it?

Please can you try turning it off with:

    ethtool -K eth0 tso off

If TSO is the problem, we'll come up with a better fix than using swiotlb.

Thanks,
Ian
Hi,

On Wed, Nov 02, 2005 at 03:32:58PM -0000, Ian Pratt wrote:

> Does your card support TSO? What revision e1000 is it?

Yes, and I'll check the revision on Friday once I'm back from travelling (but it is a very recent box).

> Please can you try turning it off with:
>     ethtool -K eth0 tso off

I already tried that and it did not help. I've also tried both gcc32 and gcc4 with no success.

Cheers,
 Stephen
On Wed, Nov 02, 2005 at 10:36:17AM -0500, Stephen Tweedie wrote:

> Hi,
>
> On Wed, Nov 02, 2005 at 03:32:58PM -0000, Ian Pratt wrote:
>
> > Does your card support TSO? What revision e1000 is it?
>
> Yes, and I'll check on Friday once I'm back from travelling (but it is a very recent box.)

I am seeing the exact same problem on my Dell Latitude D800 laptop, which uses this Ethernet controller:

    Broadcom Corporation NetXtreme BCM5705M Gigabit Ethernet (rev 01)

This is a relatively common and not-so-recent configuration.

> > Please can you try turning it off with:
> >     ethtool -K eth0 tso off
>
> I already tried that and it did not help. I've also tried both gcc32 and gcc4 with no success.

    [root@localhost ~]# ethtool -K eth0 tso off
    Cannot set device tcp segmentation offload settings: Operation not supported

Too bad... With 'swiotlb=force swiotlb=8m' kernel parameters the box is stable; without them, very basic network access (say, 'locate lib' over ssh) can crash it, and then the whole system reboots.

100% reproducible for me, and without crazy hardware :-)

Hope this helps,

Daniel
-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
On 2 Nov 2005, at 15:59, Daniel Veillard wrote:

> [root@localhost ~]# ethtool -K eth0 tso off
> Cannot set device tcp segmentation offload settings: Operation not supported
>
> Too bad... With 'swiotlb=force swiotlb=8m' kernel parameters the box is stable; without them, very basic network access (say, 'locate lib' over ssh) can crash it, and then the whole system reboots.
>
> 100% reproducible for me, and without crazy hardware :-)

It'd be interesting to know what form of skbuffs get sent to the driver when this happens, e.g., how big the skbuff data area is, whether the skbuff is fragmented, etc.

 -- Keir
On Wed, Nov 02, 2005 at 05:12:27PM +0000, Keir Fraser wrote:

> On 2 Nov 2005, at 15:59, Daniel Veillard wrote:
>
> > [root@localhost ~]# ethtool -K eth0 tso off
> > Cannot set device tcp segmentation offload settings: Operation not supported
> >
> > Too bad... With 'swiotlb=force swiotlb=8m' kernel parameters the box is stable; without them, very basic network access (say, 'locate lib' over ssh) can crash it, and then the whole system reboots.
> >
> > 100% reproducible for me, and without crazy hardware :-)
>
> It'd be interesting to know what form of skbuffs get sent to the driver when this happens, e.g., how big the skbuff data area is, whether the skbuff is fragmented, etc.

I'm not a kernel hacker, but if you give me a patch that displays that information at the IOMMU_BUG_ON pointed out by Stephen, I will gladly rebuild and reboot with it to get you the information (I have no serial console, so a hint on avoiding the instant reboot of dom0 would help). Oh yeah, it's just dom0 on top of the hypervisor; no domU has even been started.

Daniel
On Wed, Nov 02, 2005 at 06:04:25PM -0500, Daniel Veillard wrote:

> I'm not a kernel hacker, but if you give me a patch that displays that information at the IOMMU_BUG_ON pointed out by Stephen, I will gladly rebuild and reboot with it to get you the information (I have no serial console, so a hint on avoiding the instant reboot of dom0 would help). Oh yeah, it's just dom0 on top of the hypervisor; no domU has even been started.

Hi Daniel,

could you try the following patch, just to get a bit more information about the pointer and the size?

diff -r ca2e91ab4311 linux-2.6-xen-sparse/arch/xen/i386/kernel/pci-dma.c
--- a/linux-2.6-xen-sparse/arch/xen/i386/kernel/pci-dma.c	Thu Nov 3 01:45:07 2005
+++ b/linux-2.6-xen-sparse/arch/xen/i386/kernel/pci-dma.c	Wed Nov 2 21:32:34 2005
@@ -267,6 +267,8 @@
 		dma = swiotlb_map_single(dev, ptr, size, direction);
 	} else {
 		dma = virt_to_bus(ptr);
+		if (range_straddles_page_boundary(ptr, size))
+			printk("ptr: %p %zd\n", ptr, size);
 		IOMMU_BUG_ON(range_straddles_page_boundary(ptr, size));
 		IOMMU_BUG_ON(address_needs_mapping(dev, dma));
 	}

Sticking a while (1); after the printk would help you avoid the reboot. Something like:

    if (range_straddles_page_boundary(ptr, size)) {
        printk("ptr: %p %zd\n", ptr, size);
        while (1);
    }

Cheers,
-- 
Vincent Hanquez
On Thu, Nov 03, 2005 at 03:45:27AM +0100, Vincent Hanquez wrote:

> On Wed, Nov 02, 2005 at 06:04:25PM -0500, Daniel Veillard wrote:
> > I'm not a kernel hacker, but if you give me a patch that displays that information at the IOMMU_BUG_ON pointed out by Stephen, I will gladly rebuild and reboot with it to get you the information (I have no serial console, so a hint on avoiding the instant reboot of dom0 would help). Oh yeah, it's just dom0 on top of the hypervisor; no domU has even been started.
>
> Hi Daniel,

Hi, Salut :-)

> could you try the following patch, just to get a bit more information about the pointer and the size?
[...]
> Sticking a while (1); after the printk would help you avoid the reboot. Something like:

Sure. It took a bit of time to recompile the kernel (I hadn't done this for years), and it crashed as expected. Here is the info:

ptr: f160ed8e 1514

The size looks like a full Ethernet frame, i.e. 1500 bytes of payload, two 6-byte Ethernet addresses and 2 bytes for the Ethernet type. That looks kosher to me, but clearly it is not aligned.

Daniel
> Sure. It took a bit of time to recompile the kernel (I hadn't done this for years), and it crashed as expected. Here is the info:
>
> ptr: f160ed8e 1514
>
> The size looks like a full Ethernet frame, i.e. 1500 bytes of payload, two 6-byte Ethernet addresses and 2 bytes for the Ethernet type. That looks kosher to me, but clearly it is not aligned.

Please can you try using either our -xen or -xen0 kernel config? I strongly suspect there's something in your config that is breaking this for you; I'm just not sure what. (NB: make sure you run 'rm dist/install/boot/config*' to stop 'make world' from picking up your old config.)

Best,
Ian
> Sure. It took a bit of time to recompile the kernel (I hadn't done this for years), and it crashed as expected. Here is the info:
>
> ptr: f160ed8e 1514
>
> The size looks like a full Ethernet frame, i.e. 1500 bytes of payload, two 6-byte Ethernet addresses and 2 bytes for the Ethernet type. That looks kosher to me, but clearly it is not aligned.

This allocation isn't aligned to the next power-of-2 boundary: usually 1514-byte allocations are 2KB aligned.

You're not enabling some experimental option in your config that changes the alignment of slab allocations, are you?

Ian
On Mon, Nov 07, 2005 at 01:51:35PM -0000, Ian Pratt wrote:

> > Sure. It took a bit of time to recompile the kernel (I hadn't done this for years), and it crashed as expected. Here is the info:
> >
> > ptr: f160ed8e 1514
> >
> > The size looks like a full Ethernet frame, i.e. 1500 bytes of payload, two 6-byte Ethernet addresses and 2 bytes for the Ethernet type. That looks kosher to me, but clearly it is not aligned.
>
> This allocation isn't aligned to the next power-of-2 boundary: usually 1514-byte allocations are 2KB aligned.
>
> You're not enabling some experimental option in your config that changes the alignment of slab allocations, are you?

Hi Ian,

sorry for not responding to your previous message. The point is that I don't really know those kernel internals myself; Stephen can certainly provide a more informed answer. I checked our kernel config, and I see CONFIG_DEBUG_SLAB=y set in our kernel-2.6.12-i686-hypervisor.config. Browsing all the other DEBUG options that might be relevant, I only found CONFIG_DEBUG_KERNEL, CONFIG_DEBUG_HIGHMEM and CONFIG_DEBUG_INFO enabled. CONFIG_DEBUG_DRIVER is not set.
The Xen options are:

CONFIG_XEN=y
CONFIG_ARCH_XEN=y
CONFIG_NO_IDLE_HZ=y
CONFIG_XEN_WRITABLE_PAGETABLES=y
# CONFIG_XEN_SHADOW_MODE is not set
CONFIG_XEN_SCRUB_PAGES=y
CONFIG_FOREIGN_PAGES=y
CONFIG_HAVE_ARCH_DEV_ALLOC_SKB=y
CONFIG_XEN_BLKDEV_GRANT=y
# CONFIG_XEN_BLKDEV_TAP_BE is not set
# CONFIG_XEN_BLKDEV_TAP is not set
# CONFIG_XEN_NETDEV_GRANT_TX is not set
# CONFIG_XEN_NETDEV_GRANT_RX is not set
# CONFIG_SMP_ALTERNATIVES is not set
CONFIG_X86=y
# CONFIG_X86_64 is not set
CONFIG_XENARCH="i386"
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
# CONFIG_M686 is not set

Daniel
Stephen C. Tweedie
2005-Nov-07 21:28 UTC
RE: [Xen-devel] DMA trouble with current xen-sparse
Hi,

On Fri, 2005-11-04 at 14:50 +0000, Ian Pratt wrote:

> > Sure. It took a bit of time to recompile the kernel (I hadn't done this for years), and it crashed as expected. Here is the info:
> >
> > ptr: f160ed8e 1514
> >
> > The size looks like a full Ethernet frame, i.e. 1500 bytes of payload, two 6-byte Ethernet addresses and 2 bytes for the Ethernet type. That looks kosher to me, but clearly it is not aligned.
>
> Please can you try using either our -xen or -xen0 kernel config? I strongly suspect there's something in your config that is breaking this for you; I'm just not sure what.

I just tried to build it; it would not boot. That was building the 2.6.12 xen-sparse w/ gcc4; retrying with gcc32 now.

But I suspect that the problem is CONFIG_DEBUG_SLAB. That sets up slab redzoning, which checks for buffer overruns. One consequence is that cached objects grow very slightly, enough that the 2k kmalloc cache gets created with 3 objects per two-page (order-1) slab; i.e. all MTU-sized frames are going to be allocated from an 8k slab, and one in three will straddle the page boundary.

I may not have time to verify that today, but it sounds like a likely explanation for what we're seeing.

NB. Even without redzoning, the slab allocator will try both one-page and two-page slab sizes to see which minimises the wasted space in a slab, so any subsystem that's doing its own allocation of objects from a pool outside kmalloc may hit a size that creates these page-straddling caches.

There's a hacky quick fix, which is to change

    #define BREAK_GFP_ORDER_HI 1

from 1 to 0 in mm/slab.c. But that's just going to waste more slab cache space for many caches. Without that change, the fact is that an important debugging option is routinely creating cross-page objects, and that the slab allocator can create such objects quite normally even without that option; so it may end up being something that Xen just has to deal with.
--Stephen
> from 1 to 0 in mm/slab.c. But that's just going to waste more slab cache space for many caches. Without that change, the fact is that an important debugging option is routinely creating cross-page objects, and that the slab allocator can create such objects quite normally even without that option; so it may end up being something that Xen just has to deal with.

The best Xen fix for this is for us to hook alloc_skb() (rather than just dev_alloc_skb()). This will enable us to solve the jumbo frames issue too.

Ian
On Mon, Nov 07, 2005 at 04:28:59PM -0500, Stephen C. Tweedie wrote:

> Hi,
>
> On Fri, 2005-11-04 at 14:50 +0000, Ian Pratt wrote:
>
> > > Sure. It took a bit of time to recompile the kernel (I hadn't done this for years), and it crashed as expected. Here is the info:
> > >
> > > ptr: f160ed8e 1514
> > >
> > > The size looks like a full Ethernet frame, i.e. 1500 bytes of payload, two 6-byte Ethernet addresses and 2 bytes for the Ethernet type. That looks kosher to me, but clearly it is not aligned.
> >
> > Please can you try using either our -xen or -xen0 kernel config? I strongly suspect there's something in your config that is breaking this for you; I'm just not sure what.
>
> I just tried to build it; it would not boot. That was building the 2.6.12 xen-sparse w/ gcc4; retrying with gcc32 now.
>
> But I suspect that the problem is CONFIG_DEBUG_SLAB. That sets up slab redzoning, which checks for buffer overruns. One consequence is that

Just to confirm: CONFIG_DEBUG_SLAB is the option exposing the issue. I recompiled the exact same kernel with just that option turned off, and the tg3 driver does not seem to hang anymore.

Daniel
Stephen C. Tweedie
2005-Nov-08 15:25 UTC
Re: [Xen-devel] DMA trouble with current xen-sparse
Hi,

On Tue, 2005-11-08 at 15:55 +0000, Keir Fraser wrote:

> > Just to confirm: CONFIG_DEBUG_SLAB is the option exposing the issue. I recompiled the exact same kernel with just that option turned off, and the tg3 driver does not seem to hang anymore.
>
> This is now fixed in our tree (changeset 7700:98bcd8fbd5e3). It should get pushed to the public repository in an hour or two...

Thanks; I'll have a look at that when it shows up. My main test box at work just died, though, so it might be a while before I can test it out properly.

Cheers,
 Stephen
On 8 Nov 2005, at 06:41, Daniel Veillard wrote:

>> I just tried to build it; it would not boot. That was building the 2.6.12 xen-sparse w/ gcc4; retrying with gcc32 now.
>>
>> But I suspect that the problem is CONFIG_DEBUG_SLAB. That sets up slab redzoning, which checks for buffer overruns. One consequence is that
>
> Just to confirm: CONFIG_DEBUG_SLAB is the option exposing the issue. I recompiled the exact same kernel with just that option turned off, and the tg3 driver does not seem to hang anymore.

This is now fixed in our tree (changeset 7700:98bcd8fbd5e3). It should get pushed to the public repository in an hour or two...

 -- Keir