[This email is either empty or too large to be displayed at this time]
On Sun, Aug 15, 2010 at 11:28:39PM +0200, Sander Eikelenboom wrote:> Hi Konrad, > > After i have exhausted all kernel debug options without getting a pointer to my freezes, i have added some printk''s to all functions in swiotlb-xen.c > I see a lot of calls to xen_swiotlb_dma_mapping_error, which doesn''t seem to be good ?The driver looks to be actually testing the value, which is great. Some of the older drivers (r8169) don''t even do that.> > Although all the errors the device works fine (grabs video), but eventually the machine freezes, probably due to overwriting some mem it shouldn''t.Looking at the output, the physical addresses that DMA-ed are: 0x1f2962dc0 0x1f24f2e68 and they look to be called quite often. In fact, there looks to be a loop that does something like this: again: p = kmalloc(..) dma = pci_map_single(p) pci_dma_mapping_error(dma); /* get some data.. */ /* parse the: (pipe 0x80000280): IN: c0 00 00 00 0c 00 01 00 */ pci_unmap_sg(dma); goto again; As the virtual address sent to pci_map_single looks to be sequentially increasing. It might be: a). the pci_dma_mapping_error is used incorrectly, ie, it is used as !pci_dma_mapping_error, but I doubt that - the Linux kernel has soo many exampples of how to proper use that. b). The pci_dma_mapping_error implementation in Xen-SWIOTLB is busted, but I can''t see how. The logic is basically ''return !addr'' so, if you have addr = 0xf200000'', you will get 0, which is the proper return value. c). the xhci driver does something similar to the pseudo-code I''ve pointed out. It is missing a kfree somewhere. Can you point me to the git tree for the xhci and I can take a look there? Also could you send me yor debug patch - that will help in finding the culprit. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Monday, August 16, 2010, 4:43:53 PM, you wrote:> On Sun, Aug 15, 2010 at 11:28:39PM +0200, Sander Eikelenboom wrote: >> Hi Konrad, >> >> After i have exhausted all kernel debug options without getting a pointer to my freezes, i have added some printk''s to all functions in swiotlb-xen.c >> I see a lot of calls to xen_swiotlb_dma_mapping_error, which doesn''t seem to be good ?> The driver looks to be actually testing the value, which is great. Some > of the older drivers (r8169) don''t even do that. >> >> Although all the errors the device works fine (grabs video), but eventually the machine freezes, probably due to overwriting some mem it shouldn''t.> Looking at the output, the physical addresses that DMA-ed are:> 0x1f2962dc0 > 0x1f24f2e68> and they look to be called quite often. In fact, there looks to be a > loop that does something like this:> again: > p = kmalloc(..)> dma = pci_map_single(p) > pci_dma_mapping_error(dma); > /* get some data.. */ > /* parse the: (pipe 0x80000280): IN: c0 00 00 00 0c 00 01 00 */ > pci_unmap_sg(dma); > goto again;> As the virtual address sent to pci_map_single looks to be sequentially > increasing.> It might be: > a). the pci_dma_mapping_error is used incorrectly, ie, it is used > as !pci_dma_mapping_error, but I doubt that - the Linux kernel > has soo many exampples of how to proper use that. > b). The pci_dma_mapping_error implementation in Xen-SWIOTLB is busted, > but I can''t see how. The logic is basically ''return !addr'' so, if > you have addr = 0xf200000'', you will get 0, which is the proper > return value. > c). the xhci driver does something similar to the pseudo-code I''ve > pointed out. It is missing a kfree somewhere.> Can you point me to the git tree for the xhci and I can take a look > there? Also could you send me yor debug patch - that will help in > finding the culprit.I suspect the xhci drivers, because usb2 (ohci/ehci) works fine, but it''s all hard to find due to the freeze without debug output. And for linux, xen and pci-passthrough is still somewhat a corner case :( xhci-isoc patches required for webcams/videograbbers to work got merged in 2.6.36 merge window, together with xen-swiotlb. (isoc patch series by "Xu, Andiry" <Andiry.Xu@amd.com> , maintainer of the complete xhci tree is Sarah Sharp <sarah.a.sharp@linux.intel.com>) So what i have as domU kernel is: - from Linus tree 2.6.36-pre with latest commit 8357422d4bf33bc2c35884d4016c3fc9efbbc1d2 - pulled your devel/xen-pcifront-0.5 tree, and fixed the merge conflicts due to the pv on hvm patches, in the same way Jeremy did for his 2.6.36 branch. Conflicts: drivers/xen/events.c include/xen/events.h - Added a patch for xhci isoc that fixes another bug. (attached 0001-xHCI-update-ring-dequeue-pointer-when-process-missed.patch) - Added a patch that shows extra debug info for xhci from the author of the xhci-isoc pathes (attached isoc_length5.patch) - Changed some debug lines from dbg to warn level, (just enabling xhci-debug in kernel config floods the logs too fast, so i just enabled them in xhci-mem.c) (Attached a patch with all my changes to xhci*, including the 2 patches above applied.) - Added some printk''s to swiotlb-xen.c to see which functions were used, and let some of them print the address as well, in the hope i could find some debug info there.(attached) Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ? Even an Oops is much easier to debug than a freeze. Due to the nature of DMA that could perhaps be difficult, although there is an DMA API ... Had a fruitful LinuxCon ? -- Sander _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> > Looking at the output, the physical addresses that DMA-ed are: > > > 0x1f2962dc0 > > 0x1f24f2e68 > > > and they look to be called quite often. In fact, there looks to be a > > loop that does something like this: > > > again: > > p = kmalloc(..) > > > dma = pci_map_single(p) > > pci_dma_mapping_error(dma); > > /* get some data.. */ > > /* parse the: (pipe 0x80000280): IN: c0 00 00 00 0c 00 01 00 */ > > pci_unmap_sg(dma); > > goto again; > > > As the virtual address sent to pci_map_single looks to be sequentially > > increasing.I am not sure what virtual address I was thinking off. Looking again at the dmesg output it looks actually correct. SWIOTLB is giving out chunks, increasing its own index, and then unmaping those pages right away. This stack-trace in the middle of the log: Aug 15 23:14:02 security kernel: [ 103.646497] em28xx #0 em28xx_isoc_copy :Video frame 1, length=960, odd Aug 15 23:14:02 security kernel: [ 103.650177] xhci_hcd 0000:08:00.0: Miss service interval error, set skip flag Aug 15 23:14:02 security kernel: [ 103.651371] em28xx #0 em28xx_irq_callback :urb completition error -18. Aug 15 23:14:02 security kernel: [ 103.651406] em28xx #0 print_err_status :URB status -18 [Unknown]. Aug 15 23:14:02 security kernel: [ 103.651432] em28xx #0 em28xx_isoc_copy :Video frame 2, length=2888, even Aug 15 23:14:02 security kernel: [ 103.651454] em28xx #0 get_next_buf :No active queue to serve Aug 15 23:14:02 security kernel: [ 103.651477] em28xx #0 print_err_status :URB packet 63, status -18 [Unknown]. Aug 15 23:14:02 security kernel: [ 103.653143] motion: page allocation failure. order:1, mode:0x20 Aug 15 23:14:02 security kernel: [ 103.653177] Pid: 1418, comm: motion Not tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14 Aug 15 23:14:02 security kernel: [ 103.653205] Call Trace: Aug 15 23:14:02 security kernel: [ 103.653219] <IRQ> [<ffffffff810a7ebf>] __alloc_pages_nodemask+0x667/0x6c7 Aug 15 23:14:02 security kernel: [ 103.653272] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 15 23:14:02 security kernel: [ 103.653300] [<ffffffff810ca32f>] cache_alloc_refill+0x3b3/0x851 Aug 15 23:14:02 security kernel: [ 103.653326] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf Aug 15 23:14:02 security kernel: [ 103.653352] [<ffffffff81007202>] ? check_events+0x12/0x20 Aug 15 23:14:02 security kernel: [ 103.653375] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 15 23:14:02 security kernel: [ 103.653401] [<ffffffff810ce9bd>] ? create_object+0x2e/0x2a1 Aug 15 23:14:02 security kernel: [ 103.653425] [<ffffffff810c9e3f>] kmem_cache_alloc+0xf7/0x1c4 Aug 15 23:14:02 security kernel: [ 103.653451] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:02 security kernel: [ 103.653475] [<ffffffff810ce9bd>] create_object+0x2e/0x2a1 Aug 15 23:14:02 security kernel: [ 103.653499] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:02 security kernel: [ 103.653523] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:02 security kernel: [ 103.653548] [<ffffffff810c8f68>] ? cache_alloc_debugcheck_after+0x1bf/0x239 Aug 15 23:14:02 security kernel: [ 103.653575] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:02 security kernel: [ 103.653602] [<ffffffff8143e7d7>] kmemleak_alloc+0x21/0x3e Aug 15 23:14:02 security kernel: [ 103.653625] [<ffffffff810c9edf>] kmem_cache_alloc+0x197/0x1c4 Aug 15 23:14:02 security kernel: [ 103.653649] [<ffffffff8135e085>] xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:02 security kernel: [ 103.653675] [<ffffffff81340450>] usb_hcd_submit_urb+0xb1f/0xbf5 Aug 15 23:14:02 security kernel: [ 103.653701] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 15 23:14:02 security kernel: [ 103.653726] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf Aug 15 23:14:02 security kernel: [ 103.653751] [<ffffffff81007202>] ? check_events+0x12/0x20 Aug 15 23:14:02 security kernel: [ 103.653777] [<ffffffff812b6267>] ? vt_console_print+0x43/0x346 Aug 15 23:14:02 security kernel: [ 103.653802] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 15 23:14:02 security kernel: [ 103.653830] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 15 23:14:02 security kernel: [ 103.653857] [<ffffffff8106d5b0>] ? lock_release+0x1e0/0x1ed Aug 15 23:14:03 security kernel: [ 103.653882] [<ffffffff81340d45>] usb_submit_urb+0x249/0x260 Aug 15 23:14:03 security kernel: [ 103.653919] [<ffffffffa0060615>] em28xx_irq_callback+0xc9/0xfd [em28xx] Aug 15 23:14:03 security kernel: [ 103.653946] [<ffffffff8133f16a>] usb_hcd_giveback_urb+0x84/0xb8 Aug 15 23:14:03 security kernel: [ 103.653972] [<ffffffff8136480a>] xhci_handle_event+0xced/0xdf7 Aug 15 23:14:03 security kernel: [ 103.653998] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 Aug 15 23:14:03 security kernel: [ 103.654024] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 Aug 15 23:14:03 security kernel: [ 103.654050] [<ffffffff81364a31>] xhci_irq+0x11d/0x1af Aug 15 23:14:03 security kernel: [ 103.654074] [<ffffffff81364ade>] xhci_msi_irq+0x1b/0x1d Aug 15 23:14:03 security kernel: [ 103.654103] [<ffffffff8108e4aa>] handle_IRQ_event+0x20/0xc7 Aug 15 23:14:03 security kernel: [ 103.654128] [<ffffffff810902a4>] handle_level_irq+0x98/0xf3 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812997a8>] __xen_evtchn_do_upcall+0xe1/0x183 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8129a423>] xen_evtchn_do_upcall+0x2a/0x3c Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8100a9ee>] xen_do_hypervisor_callback+0x1e/0x30 Aug 15 23:14:03 security kernel: [ 103.654151] <EOI> [<ffffffff810012eb>] ? hypercall_page+0x2eb/0x1006 Aug 15 23:14:03 security kernel: [ 103.654151] Mem-Info: Aug 15 23:14:03 security kernel: [ 103.654151] DMA per-cpu: Aug 15 23:14:03 security kernel: [ 103.654151] CPU 0: hi: 0, btch: 1 usd: 0 Aug 15 23:14:03 security kernel: [ 103.654151] DMA32 per-cpu: Aug 15 23:14:03 security kernel: [ 103.654151] CPU 0: hi: 186, btch: 31 usd: 168 Aug 15 23:14:03 security kernel: [ 103.654151] active_anon:4503 inactive_anon:4767 isolated_anon:0 Aug 15 23:14:03 security kernel: [ 103.654151] active_file:4071 inactive_file:9087 isolated_file:0 Aug 15 23:14:03 security kernel: [ 103.654151] unevictable:0 dirty:1081 writeback:0 unstable:0 Aug 15 23:14:03 security kernel: [ 103.654151] free:1584 slab_reclaimable:5991 slab_unreclaimable:153956 Aug 15 23:14:03 security kernel: [ 103.654151] mapped:3401 shmem:48 pagetables:1452 bounce:0 Aug 15 23:14:03 security kernel: [ 103.654151] DMA free:3992kB min:60kB low:72kB high:88kB active_anon:84kB inactive_anon:436kB active_file:24kB inactive_file:148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15712kB mlocked:0kB dirty:20kB writeback:0kB mapped:16kB shmem:0kB slab_reclaimable:264kB slab_unreclaimable:8932kB kernel_stack:0kB pagetables:384kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Aug 15 23:14:03 security kernel: [ 103.654151] lowmem_reserve[]: 0 994 994 994 Aug 15 23:14:03 security kernel: [ 103.654151] DMA32 free:2344kB min:4000kB low:5000kB high:6000kB active_anon:17928kB inactive_anon:18632kB active_file:16260kB inactive_file:36200kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018080kB mlocked:0kB dirty:4304kB writeback:0kB mapped:13588kB shmem:192kB slab_reclaimable:23700kB slab_unreclaimable:606892kB kernel_stack:720kB pagetables:5424kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Aug 15 23:14:03 security kernel: [ 103.654151] lowmem_reserve[]: 0 0 0 0 Aug 15 23:14:03 security kernel: [ 103.654151] DMA: 88*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3992kB Aug 15 23:14:03 security kernel: [ 103.654151] DMA32: 406*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2344kB Aug 15 23:14:03 security kernel: [ 103.654151] 13236 total pagecache pages Aug 15 23:14:03 security kernel: [ 103.654151] 21 pages in swap cache Aug 15 23:14:03 security kernel: [ 103.654151] Swap cache stats: add 32, delete 11, find 27/27 Aug 15 23:14:03 security kernel: [ 103.654151] Free swap = 524200kB Aug 15 23:14:03 security kernel: [ 103.654151] Total swap = 524284kB Aug 15 23:14:03 security kernel: [ 103.654151] 262128 pages RAM Aug 15 23:14:03 security kernel: [ 103.654151] 73203 pages reserved Aug 15 23:14:03 security kernel: [ 103.654151] 21103 pages shared Aug 15 23:14:03 security kernel: [ 103.654151] 105944 pages non-shared Aug 15 23:14:03 security kernel: [ 103.654151] kmemleak: Cannot allocate a kmemleak_object structure Aug 15 23:14:03 security kernel: [ 103.654151] Pid: 1418, comm: motion Not tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14 Aug 15 23:14:03 security kernel: [ 103.654151] Call Trace: Aug 15 23:14:03 security kernel: [ 103.654151] <IRQ> [<ffffffff810ce9d8>] create_object+0x49/0x2a1 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810c8f68>] ? cache_alloc_debugcheck_after+0x1bf/0x239 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8143e7d7>] kmemleak_alloc+0x21/0x3e Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810c9edf>] kmem_cache_alloc+0x197/0x1c4 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] xhci_urb_enqueue+0x14c/0x3f5 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81340450>] usb_hcd_submit_urb+0xb1f/0xbf5 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81007202>] ? check_events+0x12/0x20 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812b6267>] ? vt_console_print+0x43/0x346 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8106d5b0>] ? lock_release+0x1e0/0x1ed Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81340d45>] usb_submit_urb+0x249/0x260 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffffa0060615>] em28xx_irq_callback+0xc9/0xfd [em28xx] Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8133f16a>] usb_hcd_giveback_urb+0x84/0xb8 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8136480a>] xhci_handle_event+0xced/0xdf7 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81364a31>] xhci_irq+0x11d/0x1af Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81364ade>] xhci_msi_irq+0x1b/0x1d Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8108e4aa>] handle_IRQ_event+0x20/0xc7 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810902a4>] handle_level_irq+0x98/0xf3 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812997a8>] __xen_evtchn_do_upcall+0xe1/0x183 Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8129a423>] xen_evtchn_do_upcall+0x2a/0x3c Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8100a9ee>] xen_do_hypervisor_callback+0x1e/0x30 Aug 15 23:14:03 security kernel: [ 103.654151] <EOI> [<ffffffff810012eb>] ? hypercall_page+0x2eb/0x1006 Aug 15 23:14:03 security kernel: [ 103.654151] kmemleak: Kernel memory leak detector disabled Aug 15 23:14:03 security kernel: [ 103.654151] xhci_hcd 0000:08:00.0: Found td. Clear skip flag. Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 0, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 1, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 2, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 3, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 4, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 5, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 6, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 7, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.659299] xhci_hcd 0000:08:00.0: Miss service interval error, set skip flag Aug 15 23:14:03 security kernel: [ 103.659307] xhci_hcd 0000:08:00.0: Found td. Clear skip flag. Aug 15 23:14:03 security kernel: [ 103.661748] kmemleak: Automatic memory scanning thread ended Aug 15 23:14:03 security kernel: [ 103.755620] em28xx #0 print_err_status :URB packet 6, status -18 [Unknown]. Aug 15 23:14:03 security kernel: [ 103.755630] em28xx #0 print_err_status :URB packet 7, status -18 [Unknown]. The first fault is kmemleaker not being able to allocate its structure, and then it throws an error too. Then the em28xx gets horribly confused that it could not send an URB anymore and from there one things get worst. There is something causing this memory problem,and I think we can narrow it down if you output the kmemleaker output. You might have to unload/reload the em28xx driver to get an idea of where the leak is.> So what i have as domU kernel is: > - from Linus tree 2.6.36-pre with latest commit 8357422d4bf33bc2c35884d4016c3fc9efbbc1d2 > - pulled your devel/xen-pcifront-0.5 tree, and fixed the merge conflicts due to the pv on hvm patches, in the same way Jeremy did for his 2.6.36 branch. > Conflicts: > drivers/xen/events.c > include/xen/events.h > > - Added a patch for xhci isoc that fixes another bug. (attached 0001-xHCI-update-ring-dequeue-pointer-when-process-missed.patch) > - Added a patch that shows extra debug info for xhci from the author of the xhci-isoc pathes (attached isoc_length5.patch) > > - Changed some debug lines from dbg to warn level, (just enabling xhci-debug in kernel config floods the logs too fast, so i just enabled them in xhci-mem.c) > (Attached a patch with all my changes to xhci*, including the 2 patches above applied.) > > - Added some printk''s to swiotlb-xen.c to see which functions were used, and let some of them print the address as well, in the hope i could find some debug info there.(attached)So the other thing I realized is that xhci can actually do 64bit transfers. Try booting without the ''swiotlb=force'' (but still have iommu=soft).> > Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ?I am still puzzled as why Xen hypervisor would freeze. It _should_ work just fine.> Even an Oops is much easier to debug than a freeze. Due to the nature of DMA that could perhaps be difficult, although there is an DMA API ... > > Had a fruitful LinuxCon ?Quite so!> > -- > Sander_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Thu, Aug 26, 2010 at 05:43:21PM -0400, Konrad Rzeszutek Wilk wrote:> > > > > Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ? > > I am still puzzled as why Xen hypervisor would freeze. It _should_ work > just fine. >There has been multiple reports about these freezes.. for one user ''acpi=ht'' fixes the freezes. Or another way: libata.noacpi=1 fixes it aswell for that user. -- Pasi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Fri, Aug 27, 2010 at 09:46:37AM +0300, Pasi Kärkkäinen wrote:> On Thu, Aug 26, 2010 at 05:43:21PM -0400, Konrad Rzeszutek Wilk wrote: > > > > > > > > Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ? > > > > I am still puzzled as why Xen hypervisor would freeze. It _should_ work > > just fine. > > > > There has been multiple reports about these freezes.. > for one user ''acpi=ht'' fixes the freezes. > > Or another way: libata.noacpi=1 fixes it aswell for that user.Weren''t those bootup options thought? As in, without those the user was not able to boot? Not as something where two days in the machine suddenly stops working? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Fri, Aug 27, 2010 at 10:19:55AM -0400, Konrad Rzeszutek Wilk wrote:> On Fri, Aug 27, 2010 at 09:46:37AM +0300, Pasi Kärkkäinen wrote: > > On Thu, Aug 26, 2010 at 05:43:21PM -0400, Konrad Rzeszutek Wilk wrote: > > > > > > > > > > > Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ? > > > > > > I am still puzzled as why Xen hypervisor would freeze. It _should_ work > > > just fine. > > > > > > > There has been multiple reports about these freezes.. > > for one user ''acpi=ht'' fixes the freezes. > > > > Or another way: libata.noacpi=1 fixes it aswell for that user. > > Weren''t those bootup options thought? As in, without those the user was > not able to boot? Not as something where two days in the machine > suddenly stops working? >They were causing freezes at random during operation, without any log/serialconsole debug messages.. -- Pasi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi Konrad, Thx for your reply, I have been hiking in the Swiss mountains for about 2 weeks, so i had more Zen than Xen, which is nice from time to time, but delayed my reply :-) I will try everything again in the next week some time, i also saw from the corner of my eye someone else on the usb-devel mailing list with some problems with XHCI-ISOC which could be related so i will also search for patches there. I will keep you posted with my new experiences. Thursday, August 26, 2010, 11:43:21 PM, you wrote:>> > Looking at the output, the physical addresses that DMA-ed are: >> >> > 0x1f2962dc0 >> > 0x1f24f2e68 >> >> > and they look to be called quite often. In fact, there looks to be a >> > loop that does something like this: >> >> > again: >> > p = kmalloc(..) >> >> > dma = pci_map_single(p) >> > pci_dma_mapping_error(dma); >> > /* get some data.. */ >> > /* parse the: (pipe 0x80000280): IN: c0 00 00 00 0c 00 01 00 */ >> > pci_unmap_sg(dma); >> > goto again; >> >> > As the virtual address sent to pci_map_single looks to be sequentially >> > increasing.> I am not sure what virtual address I was thinking off. Looking again > at the dmesg output it looks actually correct. SWIOTLB is giving out > chunks, increasing its own index, and then unmaping those pages right > away.> This stack-trace in the middle of the log: > > Aug 15 23:14:02 security kernel: [ 103.646497] em28xx #0 em28xx_isoc_copy :Video frame 1, length=960, odd > Aug 15 23:14:02 security kernel: [ 103.650177] xhci_hcd 0000:08:00.0: Miss service interval error, set skip flag > Aug 15 23:14:02 security kernel: [ 103.651371] em28xx #0 em28xx_irq_callback :urb completition error -18. > Aug 15 23:14:02 security kernel: [ 103.651406] em28xx #0 print_err_status :URB status -18 [Unknown]. > Aug 15 23:14:02 security kernel: [ 103.651432] em28xx #0 em28xx_isoc_copy :Video frame 2, length=2888, even > Aug 15 23:14:02 security kernel: [ 103.651454] em28xx #0 get_next_buf :No active queue to serve > Aug 15 23:14:02 security kernel: [ 103.651477] em28xx #0 print_err_status :URB packet 63, status -18 [Unknown]. > Aug 15 23:14:02 security kernel: [ 103.653143] motion: page allocation failure. order:1, mode:0x20 > Aug 15 23:14:02 security kernel: [ 103.653177] Pid: 1418, comm: motion Not tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14 > Aug 15 23:14:02 security kernel: [ 103.653205] Call Trace: > Aug 15 23:14:02 security kernel: [ 103.653219] <IRQ> [<ffffffff810a7ebf>] __alloc_pages_nodemask+0x667/0x6c7 > Aug 15 23:14:02 security kernel: [ 103.653272] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653300] [<ffffffff810ca32f>] cache_alloc_refill+0x3b3/0x851 > Aug 15 23:14:02 security kernel: [ 103.653326] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf > Aug 15 23:14:02 security kernel: [ 103.653352] [<ffffffff81007202>] ? check_events+0x12/0x20 > Aug 15 23:14:02 security kernel: [ 103.653375] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653401] [<ffffffff810ce9bd>] ? create_object+0x2e/0x2a1 > Aug 15 23:14:02 security kernel: [ 103.653425] [<ffffffff810c9e3f>] kmem_cache_alloc+0xf7/0x1c4 > Aug 15 23:14:02 security kernel: [ 103.653451] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653475] [<ffffffff810ce9bd>] create_object+0x2e/0x2a1 > Aug 15 23:14:02 security kernel: [ 103.653499] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653523] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653548] [<ffffffff810c8f68>] ? cache_alloc_debugcheck_after+0x1bf/0x239 > Aug 15 23:14:02 security kernel: [ 103.653575] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653602] [<ffffffff8143e7d7>] kmemleak_alloc+0x21/0x3e > Aug 15 23:14:02 security kernel: [ 103.653625] [<ffffffff810c9edf>] kmem_cache_alloc+0x197/0x1c4 > Aug 15 23:14:02 security kernel: [ 103.653649] [<ffffffff8135e085>] xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653675] [<ffffffff81340450>] usb_hcd_submit_urb+0xb1f/0xbf5 > Aug 15 23:14:02 security kernel: [ 103.653701] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653726] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf > Aug 15 23:14:02 security kernel: [ 103.653751] [<ffffffff81007202>] ? check_events+0x12/0x20 > Aug 15 23:14:02 security kernel: [ 103.653777] [<ffffffff812b6267>] ? vt_console_print+0x43/0x346 > Aug 15 23:14:02 security kernel: [ 103.653802] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653830] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653857] [<ffffffff8106d5b0>] ? lock_release+0x1e0/0x1ed > Aug 15 23:14:03 security kernel: [ 103.653882] [<ffffffff81340d45>] usb_submit_urb+0x249/0x260 > Aug 15 23:14:03 security kernel: [ 103.653919] [<ffffffffa0060615>] em28xx_irq_callback+0xc9/0xfd [em28xx] > Aug 15 23:14:03 security kernel: [ 103.653946] [<ffffffff8133f16a>] usb_hcd_giveback_urb+0x84/0xb8 > Aug 15 23:14:03 security kernel: [ 103.653972] [<ffffffff8136480a>] xhci_handle_event+0xced/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.653998] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654024] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654050] [<ffffffff81364a31>] xhci_irq+0x11d/0x1af > Aug 15 23:14:03 security kernel: [ 103.654074] [<ffffffff81364ade>] xhci_msi_irq+0x1b/0x1d > Aug 15 23:14:03 security kernel: [ 103.654103] [<ffffffff8108e4aa>] handle_IRQ_event+0x20/0xc7 > Aug 15 23:14:03 security kernel: [ 103.654128] [<ffffffff810902a4>] handle_level_irq+0x98/0xf3 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812997a8>] __xen_evtchn_do_upcall+0xe1/0x183 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8129a423>] xen_evtchn_do_upcall+0x2a/0x3c > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8100a9ee>] xen_do_hypervisor_callback+0x1e/0x30 > Aug 15 23:14:03 security kernel: [ 103.654151] <EOI> [<ffffffff810012eb>] ? hypercall_page+0x2eb/0x1006 > Aug 15 23:14:03 security kernel: [ 103.654151] Mem-Info: > Aug 15 23:14:03 security kernel: [ 103.654151] DMA per-cpu: > Aug 15 23:14:03 security kernel: [ 103.654151] CPU 0: hi: 0, btch: 1 usd: 0 > Aug 15 23:14:03 security kernel: [ 103.654151] DMA32 per-cpu: > Aug 15 23:14:03 security kernel: [ 103.654151] CPU 0: hi: 186, btch: 31 usd: 168 > Aug 15 23:14:03 security kernel: [ 103.654151] active_anon:4503 inactive_anon:4767 isolated_anon:0 > Aug 15 23:14:03 security kernel: [ 103.654151] active_file:4071 inactive_file:9087 isolated_file:0 > Aug 15 23:14:03 security kernel: [ 103.654151] unevictable:0 dirty:1081 writeback:0 unstable:0 > Aug 15 23:14:03 security kernel: [ 103.654151] free:1584 slab_reclaimable:5991 slab_unreclaimable:153956 > Aug 15 23:14:03 security kernel: [ 103.654151] mapped:3401 shmem:48 pagetables:1452 bounce:0 > Aug 15 23:14:03 security kernel: [ 103.654151] DMA free:3992kB min:60kB low:72kB high:88kB active_anon:84kB inactive_anon:436kB active_file:24kB inactive_file:148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15712kB mlocked:0kB dirty:20kB writeback:0kB mapped:16kB shmem:0kB slab_reclaimable:264kB slab_unreclaimable:8932kB kernel_stack:0kB pagetables:384kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > Aug 15 23:14:03 security kernel: [ 103.654151] lowmem_reserve[]: 0 994 994 994 > Aug 15 23:14:03 security kernel: [ 103.654151] DMA32 free:2344kB min:4000kB low:5000kB high:6000kB active_anon:17928kB inactive_anon:18632kB active_file:16260kB inactive_file:36200kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018080kB mlocked:0kB dirty:4304kB writeback:0kB mapped:13588kB shmem:192kB slab_reclaimable:23700kB slab_unreclaimable:606892kB kernel_stack:720kB pagetables:5424kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > Aug 15 23:14:03 security kernel: [ 103.654151] lowmem_reserve[]: 0 0 0 0 > Aug 15 23:14:03 security kernel: [ 103.654151] DMA: 88*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3992kB > Aug 15 23:14:03 security kernel: [ 103.654151] DMA32: 406*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2344kB > Aug 15 23:14:03 security kernel: [ 103.654151] 13236 total pagecache pages > Aug 15 23:14:03 security kernel: [ 103.654151] 21 pages in swap cache > Aug 15 23:14:03 security kernel: [ 103.654151] Swap cache stats: add 32, delete 11, find 27/27 > Aug 15 23:14:03 security kernel: [ 103.654151] Free swap = 524200kB > Aug 15 23:14:03 security kernel: [ 103.654151] Total swap = 524284kB > Aug 15 23:14:03 security kernel: [ 103.654151] 262128 pages RAM > Aug 15 23:14:03 security kernel: [ 103.654151] 73203 pages reserved > Aug 15 23:14:03 security kernel: [ 103.654151] 21103 pages shared > Aug 15 23:14:03 security kernel: [ 103.654151] 105944 pages non-shared > Aug 15 23:14:03 security kernel: [ 103.654151] kmemleak: Cannot allocate a kmemleak_object structure > Aug 15 23:14:03 security kernel: [ 103.654151] Pid: 1418, comm: motion Not tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14 > Aug 15 23:14:03 security kernel: [ 103.654151] Call Trace: > Aug 15 23:14:03 security kernel: [ 103.654151] <IRQ> [<ffffffff810ce9d8>] create_object+0x49/0x2a1 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810c8f68>] ? cache_alloc_debugcheck_after+0x1bf/0x239 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8143e7d7>] kmemleak_alloc+0x21/0x3e > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810c9edf>] kmem_cache_alloc+0x197/0x1c4 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81340450>] usb_hcd_submit_urb+0xb1f/0xbf5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81007202>] ? check_events+0x12/0x20 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812b6267>] ? vt_console_print+0x43/0x346 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8106d5b0>] ? lock_release+0x1e0/0x1ed > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81340d45>] usb_submit_urb+0x249/0x260 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffffa0060615>] em28xx_irq_callback+0xc9/0xfd [em28xx] > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8133f16a>] usb_hcd_giveback_urb+0x84/0xb8 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8136480a>] xhci_handle_event+0xced/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81364a31>] xhci_irq+0x11d/0x1af > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81364ade>] xhci_msi_irq+0x1b/0x1d > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8108e4aa>] handle_IRQ_event+0x20/0xc7 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810902a4>] handle_level_irq+0x98/0xf3 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812997a8>] __xen_evtchn_do_upcall+0xe1/0x183 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8129a423>] xen_evtchn_do_upcall+0x2a/0x3c > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8100a9ee>] xen_do_hypervisor_callback+0x1e/0x30 > Aug 15 23:14:03 security kernel: [ 103.654151] <EOI> [<ffffffff810012eb>] ? hypercall_page+0x2eb/0x1006 > Aug 15 23:14:03 security kernel: [ 103.654151] kmemleak: Kernel memory leak detector disabled > Aug 15 23:14:03 security kernel: [ 103.654151] xhci_hcd 0000:08:00.0: Found td. Clear skip flag. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 0, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 1, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 2, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 3, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 4, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 5, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 6, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 7, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.659299] xhci_hcd 0000:08:00.0: Miss service interval error, set skip flag > Aug 15 23:14:03 security kernel: [ 103.659307] xhci_hcd 0000:08:00.0: Found td. Clear skip flag. > Aug 15 23:14:03 security kernel: [ 103.661748] kmemleak: Automatic memory scanning thread ended > Aug 15 23:14:03 security kernel: [ 103.755620] em28xx #0 print_err_status :URB packet 6, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.755630] em28xx #0 print_err_status :URB packet 7, status -18 [Unknown].> The first fault is kmemleaker not being able to allocate its structure, > and then it throws an error too.> Then the em28xx gets horribly confused that it could not send an URB > anymore and from there one things get worst.> There is something causing this memory problem,and I think we can narrow > it down if you output the kmemleaker output. You might have to > unload/reload the em28xx driver to get an idea of where the leak is.>> So what i have as domU kernel is: >> - from Linus tree 2.6.36-pre with latest commit 8357422d4bf33bc2c35884d4016c3fc9efbbc1d2 >> - pulled your devel/xen-pcifront-0.5 tree, and fixed the merge conflicts due to the pv on hvm patches, in the same way Jeremy did for his 2.6.36 branch. >> Conflicts: >> drivers/xen/events.c >> include/xen/events.h >> >> - Added a patch for xhci isoc that fixes another bug. (attached 0001-xHCI-update-ring-dequeue-pointer-when-process-missed.patch) >> - Added a patch that shows extra debug info for xhci from the author of the xhci-isoc pathes (attached isoc_length5.patch) >> >> - Changed some debug lines from dbg to warn level, (just enabling xhci-debug in kernel config floods the logs too fast, so i just enabled them in xhci-mem.c) >> (Attached a patch with all my changes to xhci*, including the 2 patches above applied.) >> >> - Added some printk''s to swiotlb-xen.c to see which functions were used, and let some of them print the address as well, in the hope i could find some debug info there.(attached)> So the other thing I realized is that xhci can actually do 64bit > transfers. Try booting without the ''swiotlb=force'' (but still have > iommu=soft).I will give this a try !>> >> Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ?> I am still puzzled as why Xen hypervisor would freeze. It _should_ work > just fine.Yes that''s some concern, and it does with USB 2, just before my holiday I switched back to USB 2 with the same grabber, and it has run for 20 days now with continuous grabbing and a lot of disk and network activity. So it definitely is about XHCI. And freezing the hypervisor without a trace is definitely the last and worst option on my list :-)>> Even an Oops is much easier to debug than a freeze. Due to the nature of DMA that could perhaps be difficult, although there is an DMA API ... >> >> Had a fruitful LinuxCon ?> Quite so! >> >> -- >> Sander-- Best regards, Sander mailto:linux@eikelenboom.it _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi Pasi, Although I don''t think it is related to ATA (the machine has run fine with a lot of disk and network access, while grabbing via a USB2 instead of USB3 controller for 20 days now). Nevertheless it''s always worth a try. -- Sander Friday, August 27, 2010, 4:32:42 PM, you wrote:> On Fri, Aug 27, 2010 at 10:19:55AM -0400, Konrad Rzeszutek Wilk wrote: >> On Fri, Aug 27, 2010 at 09:46:37AM +0300, Pasi Kärkkäinen wrote: >> > On Thu, Aug 26, 2010 at 05:43:21PM -0400, Konrad Rzeszutek Wilk wrote: >> > > >> > > > >> > > > Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ? >> > > >> > > I am still puzzled as why Xen hypervisor would freeze. It _should_ work >> > > just fine. >> > > >> > >> > There has been multiple reports about these freezes.. >> > for one user ''acpi=ht'' fixes the freezes. >> > >> > Or another way: libata.noacpi=1 fixes it aswell for that user. >> >> Weren''t those bootup options thought? As in, without those the user was >> not able to boot? Not as something where two days in the machine >> suddenly stops working? >>> They were causing freezes at random during operation, > without any log/serialconsole debug messages..> -- Pasi-- Best regards, Sander mailto:linux@eikelenboom.it _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi Konrad, It seems i have a working setup now, passing through a xhci controller to a guest, and keep grabbing video, it has ran for 2 days now. But i have changed a lot of parameters, so I will try to find out which one seems to be the problem, or that everything is miraculously fixed :-) (it''s on other hardware (intel platform) now, as the only VM, newer dom0 kernel, newer domU kernel, no swiotlb=force anymore, the boot option pasi suggested, no irq balance etc.) Will post again when i know something more ! -- Sander Thursday, August 26, 2010, 11:43:21 PM, you wrote:>> > Looking at the output, the physical addresses that DMA-ed are: >> >> > 0x1f2962dc0 >> > 0x1f24f2e68 >> >> > and they look to be called quite often. In fact, there looks to be a >> > loop that does something like this: >> >> > again: >> > p = kmalloc(..) >> >> > dma = pci_map_single(p) >> > pci_dma_mapping_error(dma); >> > /* get some data.. */ >> > /* parse the: (pipe 0x80000280): IN: c0 00 00 00 0c 00 01 00 */ >> > pci_unmap_sg(dma); >> > goto again; >> >> > As the virtual address sent to pci_map_single looks to be sequentially >> > increasing.> I am not sure what virtual address I was thinking off. Looking again > at the dmesg output it looks actually correct. SWIOTLB is giving out > chunks, increasing its own index, and then unmaping those pages right > away.> This stack-trace in the middle of the log: > > Aug 15 23:14:02 security kernel: [ 103.646497] em28xx #0 em28xx_isoc_copy :Video frame 1, length=960, odd > Aug 15 23:14:02 security kernel: [ 103.650177] xhci_hcd 0000:08:00.0: Miss service interval error, set skip flag > Aug 15 23:14:02 security kernel: [ 103.651371] em28xx #0 em28xx_irq_callback :urb completition error -18. > Aug 15 23:14:02 security kernel: [ 103.651406] em28xx #0 print_err_status :URB status -18 [Unknown]. > Aug 15 23:14:02 security kernel: [ 103.651432] em28xx #0 em28xx_isoc_copy :Video frame 2, length=2888, even > Aug 15 23:14:02 security kernel: [ 103.651454] em28xx #0 get_next_buf :No active queue to serve > Aug 15 23:14:02 security kernel: [ 103.651477] em28xx #0 print_err_status :URB packet 63, status -18 [Unknown]. > Aug 15 23:14:02 security kernel: [ 103.653143] motion: page allocation failure. order:1, mode:0x20 > Aug 15 23:14:02 security kernel: [ 103.653177] Pid: 1418, comm: motion Not tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14 > Aug 15 23:14:02 security kernel: [ 103.653205] Call Trace: > Aug 15 23:14:02 security kernel: [ 103.653219] <IRQ> [<ffffffff810a7ebf>] __alloc_pages_nodemask+0x667/0x6c7 > Aug 15 23:14:02 security kernel: [ 103.653272] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653300] [<ffffffff810ca32f>] cache_alloc_refill+0x3b3/0x851 > Aug 15 23:14:02 security kernel: [ 103.653326] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf > Aug 15 23:14:02 security kernel: [ 103.653352] [<ffffffff81007202>] ? check_events+0x12/0x20 > Aug 15 23:14:02 security kernel: [ 103.653375] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653401] [<ffffffff810ce9bd>] ? create_object+0x2e/0x2a1 > Aug 15 23:14:02 security kernel: [ 103.653425] [<ffffffff810c9e3f>] kmem_cache_alloc+0xf7/0x1c4 > Aug 15 23:14:02 security kernel: [ 103.653451] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653475] [<ffffffff810ce9bd>] create_object+0x2e/0x2a1 > Aug 15 23:14:02 security kernel: [ 103.653499] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653523] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653548] [<ffffffff810c8f68>] ? cache_alloc_debugcheck_after+0x1bf/0x239 > Aug 15 23:14:02 security kernel: [ 103.653575] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653602] [<ffffffff8143e7d7>] kmemleak_alloc+0x21/0x3e > Aug 15 23:14:02 security kernel: [ 103.653625] [<ffffffff810c9edf>] kmem_cache_alloc+0x197/0x1c4 > Aug 15 23:14:02 security kernel: [ 103.653649] [<ffffffff8135e085>] xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:02 security kernel: [ 103.653675] [<ffffffff81340450>] usb_hcd_submit_urb+0xb1f/0xbf5 > Aug 15 23:14:02 security kernel: [ 103.653701] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653726] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf > Aug 15 23:14:02 security kernel: [ 103.653751] [<ffffffff81007202>] ? check_events+0x12/0x20 > Aug 15 23:14:02 security kernel: [ 103.653777] [<ffffffff812b6267>] ? vt_console_print+0x43/0x346 > Aug 15 23:14:02 security kernel: [ 103.653802] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653830] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:02 security kernel: [ 103.653857] [<ffffffff8106d5b0>] ? lock_release+0x1e0/0x1ed > Aug 15 23:14:03 security kernel: [ 103.653882] [<ffffffff81340d45>] usb_submit_urb+0x249/0x260 > Aug 15 23:14:03 security kernel: [ 103.653919] [<ffffffffa0060615>] em28xx_irq_callback+0xc9/0xfd [em28xx] > Aug 15 23:14:03 security kernel: [ 103.653946] [<ffffffff8133f16a>] usb_hcd_giveback_urb+0x84/0xb8 > Aug 15 23:14:03 security kernel: [ 103.653972] [<ffffffff8136480a>] xhci_handle_event+0xced/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.653998] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654024] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654050] [<ffffffff81364a31>] xhci_irq+0x11d/0x1af > Aug 15 23:14:03 security kernel: [ 103.654074] [<ffffffff81364ade>] xhci_msi_irq+0x1b/0x1d > Aug 15 23:14:03 security kernel: [ 103.654103] [<ffffffff8108e4aa>] handle_IRQ_event+0x20/0xc7 > Aug 15 23:14:03 security kernel: [ 103.654128] [<ffffffff810902a4>] handle_level_irq+0x98/0xf3 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812997a8>] __xen_evtchn_do_upcall+0xe1/0x183 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8129a423>] xen_evtchn_do_upcall+0x2a/0x3c > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8100a9ee>] xen_do_hypervisor_callback+0x1e/0x30 > Aug 15 23:14:03 security kernel: [ 103.654151] <EOI> [<ffffffff810012eb>] ? hypercall_page+0x2eb/0x1006 > Aug 15 23:14:03 security kernel: [ 103.654151] Mem-Info: > Aug 15 23:14:03 security kernel: [ 103.654151] DMA per-cpu: > Aug 15 23:14:03 security kernel: [ 103.654151] CPU 0: hi: 0, btch: 1 usd: 0 > Aug 15 23:14:03 security kernel: [ 103.654151] DMA32 per-cpu: > Aug 15 23:14:03 security kernel: [ 103.654151] CPU 0: hi: 186, btch: 31 usd: 168 > Aug 15 23:14:03 security kernel: [ 103.654151] active_anon:4503 inactive_anon:4767 isolated_anon:0 > Aug 15 23:14:03 security kernel: [ 103.654151] active_file:4071 inactive_file:9087 isolated_file:0 > Aug 15 23:14:03 security kernel: [ 103.654151] unevictable:0 dirty:1081 writeback:0 unstable:0 > Aug 15 23:14:03 security kernel: [ 103.654151] free:1584 slab_reclaimable:5991 slab_unreclaimable:153956 > Aug 15 23:14:03 security kernel: [ 103.654151] mapped:3401 shmem:48 pagetables:1452 bounce:0 > Aug 15 23:14:03 security kernel: [ 103.654151] DMA free:3992kB min:60kB low:72kB high:88kB active_anon:84kB inactive_anon:436kB active_file:24kB inactive_file:148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15712kB mlocked:0kB dirty:20kB writeback:0kB mapped:16kB shmem:0kB slab_reclaimable:264kB slab_unreclaimable:8932kB kernel_stack:0kB pagetables:384kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > Aug 15 23:14:03 security kernel: [ 103.654151] lowmem_reserve[]: 0 994 994 994 > Aug 15 23:14:03 security kernel: [ 103.654151] DMA32 free:2344kB min:4000kB low:5000kB high:6000kB active_anon:17928kB inactive_anon:18632kB active_file:16260kB inactive_file:36200kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018080kB mlocked:0kB dirty:4304kB writeback:0kB mapped:13588kB shmem:192kB slab_reclaimable:23700kB slab_unreclaimable:606892kB kernel_stack:720kB pagetables:5424kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > Aug 15 23:14:03 security kernel: [ 103.654151] lowmem_reserve[]: 0 0 0 0 > Aug 15 23:14:03 security kernel: [ 103.654151] DMA: 88*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3992kB > Aug 15 23:14:03 security kernel: [ 103.654151] DMA32: 406*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2344kB > Aug 15 23:14:03 security kernel: [ 103.654151] 13236 total pagecache pages > Aug 15 23:14:03 security kernel: [ 103.654151] 21 pages in swap cache > Aug 15 23:14:03 security kernel: [ 103.654151] Swap cache stats: add 32, delete 11, find 27/27 > Aug 15 23:14:03 security kernel: [ 103.654151] Free swap = 524200kB > Aug 15 23:14:03 security kernel: [ 103.654151] Total swap = 524284kB > Aug 15 23:14:03 security kernel: [ 103.654151] 262128 pages RAM > Aug 15 23:14:03 security kernel: [ 103.654151] 73203 pages reserved > Aug 15 23:14:03 security kernel: [ 103.654151] 21103 pages shared > Aug 15 23:14:03 security kernel: [ 103.654151] 105944 pages non-shared > Aug 15 23:14:03 security kernel: [ 103.654151] kmemleak: Cannot allocate a kmemleak_object structure > Aug 15 23:14:03 security kernel: [ 103.654151] Pid: 1418, comm: motion Not tainted 2.6.352.6.36pre+pci-front-xhci-20100813-xhcidebug+ #14 > Aug 15 23:14:03 security kernel: [ 103.654151] Call Trace: > Aug 15 23:14:03 security kernel: [ 103.654151] <IRQ> [<ffffffff810ce9d8>] create_object+0x49/0x2a1 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810c8f68>] ? cache_alloc_debugcheck_after+0x1bf/0x239 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] ? xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8143e7d7>] kmemleak_alloc+0x21/0x3e > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810c9edf>] kmem_cache_alloc+0x197/0x1c4 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8135e085>] xhci_urb_enqueue+0x14c/0x3f5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81340450>] usb_hcd_submit_urb+0xb1f/0xbf5 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81006b4d>] ? xen_force_evtchn_callback+0xd/0xf > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81007202>] ? check_events+0x12/0x20 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812b6267>] ? vt_console_print+0x43/0x346 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810071ef>] ? xen_restore_fl_direct_end+0x0/0x1 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8106d5b0>] ? lock_release+0x1e0/0x1ed > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81340d45>] usb_submit_urb+0x249/0x260 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffffa0060615>] em28xx_irq_callback+0xc9/0xfd [em28xx] > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8133f16a>] usb_hcd_giveback_urb+0x84/0xb8 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8136480a>] xhci_handle_event+0xced/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff813648a8>] xhci_handle_event+0xd8b/0xdf7 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81364a31>] xhci_irq+0x11d/0x1af > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff81364ade>] xhci_msi_irq+0x1b/0x1d > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8108e4aa>] handle_IRQ_event+0x20/0xc7 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff810902a4>] handle_level_irq+0x98/0xf3 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff812997a8>] __xen_evtchn_do_upcall+0xe1/0x183 > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8129a423>] xen_evtchn_do_upcall+0x2a/0x3c > Aug 15 23:14:03 security kernel: [ 103.654151] [<ffffffff8100a9ee>] xen_do_hypervisor_callback+0x1e/0x30 > Aug 15 23:14:03 security kernel: [ 103.654151] <EOI> [<ffffffff810012eb>] ? hypercall_page+0x2eb/0x1006 > Aug 15 23:14:03 security kernel: [ 103.654151] kmemleak: Kernel memory leak detector disabled > Aug 15 23:14:03 security kernel: [ 103.654151] xhci_hcd 0000:08:00.0: Found td. Clear skip flag. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 0, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 1, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 2, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 3, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 4, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 5, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 6, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.654151] em28xx #0 print_err_status :URB packet 7, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.659299] xhci_hcd 0000:08:00.0: Miss service interval error, set skip flag > Aug 15 23:14:03 security kernel: [ 103.659307] xhci_hcd 0000:08:00.0: Found td. Clear skip flag. > Aug 15 23:14:03 security kernel: [ 103.661748] kmemleak: Automatic memory scanning thread ended > Aug 15 23:14:03 security kernel: [ 103.755620] em28xx #0 print_err_status :URB packet 6, status -18 [Unknown]. > Aug 15 23:14:03 security kernel: [ 103.755630] em28xx #0 print_err_status :URB packet 7, status -18 [Unknown].> The first fault is kmemleaker not being able to allocate its structure, > and then it throws an error too.> Then the em28xx gets horribly confused that it could not send an URB > anymore and from there one things get worst.> There is something causing this memory problem,and I think we can narrow > it down if you output the kmemleaker output. You might have to > unload/reload the em28xx driver to get an idea of where the leak is.>> So what i have as domU kernel is: >> - from Linus tree 2.6.36-pre with latest commit 8357422d4bf33bc2c35884d4016c3fc9efbbc1d2 >> - pulled your devel/xen-pcifront-0.5 tree, and fixed the merge conflicts due to the pv on hvm patches, in the same way Jeremy did for his 2.6.36 branch. >> Conflicts: >> drivers/xen/events.c >> include/xen/events.h >> >> - Added a patch for xhci isoc that fixes another bug. (attached 0001-xHCI-update-ring-dequeue-pointer-when-process-missed.patch) >> - Added a patch that shows extra debug info for xhci from the author of the xhci-isoc pathes (attached isoc_length5.patch) >> >> - Changed some debug lines from dbg to warn level, (just enabling xhci-debug in kernel config floods the logs too fast, so i just enabled them in xhci-mem.c) >> (Attached a patch with all my changes to xhci*, including the 2 patches above applied.) >> >> - Added some printk''s to swiotlb-xen.c to see which functions were used, and let some of them print the address as well, in the hope i could find some debug info there.(attached)> So the other thing I realized is that xhci can actually do 64bit > transfers. Try booting without the ''swiotlb=force'' (but still have > iommu=soft).>> >> Apart from fixing the xhci in the end, is there a way to prevent xen from freezing altogether without leaving a trace ?> I am still puzzled as why Xen hypervisor would freeze. It _should_ work > just fine.>> Even an Oops is much easier to debug than a freeze. Due to the nature of DMA that could perhaps be difficult, although there is an DMA API ... >> >> Had a fruitful LinuxCon ?> Quite so! >> >> -- >> Sander-- Best regards, Sander mailto:linux@eikelenboom.it _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel