Sander Eikelenboom
2010-Aug-03 15:30 UTC
[Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
Hi All,

I'm experiencing what seems to be a random freeze with current xen-4.0-testing and a pvops dom0 2.6.32.16 kernel, most of the time within 2 days after rebooting.

Symptoms:
- Complete freeze; only a power cycle works.
- No bug output/stacktrace in the serial log or on screen.
- Not able to get into the hypervisor with ctrl-a (it doesn't react to the keyboard).
- No info in syslog.

Are there any more boot options I could try in the hope of getting some debug output?

title xen-4.0.1-rc5-pre.gz / Debian GNU/Linux, 2.6.32.16+xen-2.6.32.x-20100731
root (hd0,0)
kernel /xen-4.0.1-rc5-pre.gz dom0_mem=768M loglvl=all loglvl_guest=all com1=115200,8n1 sync_console console_to_ring console=com1,vga iommu=0,verbose,amd_iommu_debug lapic=debug apic_verbosity=debug apic=debug
module /vmlinuz-2.6.32.16+xen-2.6.32.x-20100731 root=/dev/mapper/serveerstertje-root ro earlyprintk=xen max_loop=255 loop_max_part=63 console=hvc0 xen-pciback.hide=(03:06.0)(04:00.0)(08:00.0)(0a:01.1)(0a:01.2)(0f:00.0) pci=resource_alignment=03:06.0;04:00.0;08:00.0;0a:01.0;0a:01.1;0a:01.2;0f:00.0
module /initrd.img-2.6.32.16+xen-2.6.32.x-20100731

serveerstertje:~# xm info
host                   : serveerstertje
release                : 2.6.32.16+xen-2.6.32.x-20100731
version                : #1 SMP Sat Jul 31 14:27:35 CEST 2010
machine                : x86_64
nr_cpus                : 6
nr_nodes               : 1
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 3200
hw_caps                : 178bf3ff:efd3fbff:00000000:00001310:00802001:00000000:000037ff:00000000
virt_caps              : hvm
total_memory           : 8191
free_memory            : 6312
node_to_cpu            : node0:0-5
node_to_memory         : node0:6312
node_to_dma32_mem      : node0:2876
max_node_id            : 0
xen_major              : 4
xen_minor              : 0
xen_extra              : .1-rc5-pre
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : Sun Jul 25 22:22:43 2010 +0100 21287:e6b5b2cb8146
xen_commandline        : dom0_mem=768M loglvl=all loglvl_guest=all com1=115200,8n1 sync_console console_to_ring console=com1,vga iommu=0,verbose,amd_iommu_debug lapic=debug apic_verbosity=debug apic=debug
cc_compiler            : gcc version 4.3.2 (Debian 4.3.2-1.1)
cc_compile_by          : root
cc_compile_domain      :
cc_compile_date        : Sat Jul 31 12:06:59 CEST 2010
xend_config_format     : 4

commit 78b55f90e72348e231092dbe3e50ac7414b9e1af
Merge: c0a00fb... dee9469...
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Wed Jul 28 00:19:39 2010 -0700

    Merge branch 'xen/next-2.6.32' into xen/stable-2.6.32.x

    * xen/next-2.6.32: (24 commits)
      cmpxchg: fix some 32-bit typos
      x86/cmpxchg: fix asm constraints to mention memory modification
      apply_to_page_range: fix compilation warning
      xen/blktap: #if CONFIG_XEN -> #ifdef CONFIG_XEN
      x86/hugepte: use set_pgd for 2-level pagetables
      implement O_NONBLOCK for /proc/xen/xenbus
      xen-pcifront: Remove usage of spin-locks.
      xen-pciback: Redo spinlock usage.
      xen-pcifront: Fix spinlock usage.
      xen-pcifront: Don't race with udev when discovering new devices.
      xen/blktap: make protocol specific usage of shared sring explicit
      xen/netback: make protocol specific usage of shared sring explicit
      xen/rings: make protocol specific usage of shared sring explicit
      xen/rings: make protocol specific usage of shared sring explicit
      xen/netfront: make protocol specific usage of shared sring explicit
      xen/rings: make protocol specific usage of shared sring explicit
      xen/rings: make protocol specific usage of shared sring explicit
      xen/blkback: Flush blkback data when connecting.
      xen: support large numbers of CPUs with vcpu info placement
      rtnetlink: make SR-IOV VF interface symmetric
      ...

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2010-Aug-03 15:45 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
On Tue, Aug 03, 2010 at 05:30:57PM +0200, Sander Eikelenboom wrote:
> Hi All,
>
> I'm experiencing what seems to be a random freeze with current xen-4.0-testing and a pvops dom0 2.6.32.16 kernel, most of the time within 2 days after rebooting.

You did not experience the freeze with 2.6.32.15?

> Symptoms:
> - Complete freeze; only a power cycle works.
> - No bug output/stacktrace in the serial log or on screen.
> - Not able to get into the hypervisor with ctrl-a (it doesn't react to the keyboard).
> - No info in syslog.
>
> Are there any more boot options I could try in the hope of getting some debug output?

The Linux kernel has some of those 'DETECT_SPINLOCK_HANG' or 'DETECT_WORK..something' flags. It might be a good idea to compile those in and see whether, about 2 minutes after your machine freezes, the kernel starts spitting out what is hung. That could give some idea.
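The flag names in the reply above are given from memory. As a rough guide only, and assuming a 2.6.32-era tree, the hang-detection options in lib/Kconfig.debug that seem closest to what is meant would look like this in the dom0 kernel .config (which options were actually intended is an assumption):

```
# Assumed mapping of the flags mentioned above; not confirmed in the thread.
CONFIG_DETECT_SOFTLOCKUP=y   # warn when a CPU is stuck in kernel code
CONFIG_DETECT_HUNG_TASK=y    # warn about tasks blocked for a long time (120 s by default)
CONFIG_DEBUG_SPINLOCK=y      # extra spinlock sanity checks
```

These only help if the dom0 kernel itself is still running well enough to print to the console.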
Jeremy Fitzhardinge
2010-Aug-03 15:51 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
On 08/03/2010 08:45 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Aug 03, 2010 at 05:30:57PM +0200, Sander Eikelenboom wrote:
>> Hi All,
>>
>> I'm experiencing what seems to be a random freeze with current xen-4.0-testing and a pvops dom0 2.6.32.16 kernel, most of the time within 2 days after rebooting.
>
> You did not experience the freeze with 2.6.32.15?

There have been a few updates to the .32.16 kernel too (and now it's .17...). But it would be very useful to identify which kernel was the last working one.

>> Symptoms:
>> - Complete freeze; only a power cycle works.
>> - No bug output/stacktrace in the serial log or on screen.
>> - Not able to get into the hypervisor with ctrl-a (it doesn't react to the keyboard).
>> - No info in syslog.
>>
>> Are there any more boot options I could try in the hope of getting some debug output?
>
> The Linux kernel has some of those 'DETECT_SPINLOCK_HANG' or
> 'DETECT_WORK..something' flags. It might be a good idea to compile those
> in and see whether, about 2 minutes after your machine freezes, the kernel
> starts spitting out what is hung. That could give some idea.

If Xen doesn't respond then it isn't a kernel spinlock problem; it looks more system-wide than that.

I notice the kernel command line has lots of hidden PCI devices. Sander, is there any particular activity (esp. passthrough device activity) which might correspond to the hang?

    J
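To see which devices are involved, the two PCI lists on the dom0 kernel line can be compared mechanically. A minimal Python sketch: the helper name `parse_bdfs` is made up for illustration, and the two strings are copied from the boot entry in the first message.

```python
import re

# Strings copied from the dom0 kernel line in the original post.
HIDE = "(03:06.0)(04:00.0)(08:00.0)(0a:01.1)(0a:01.2)(0f:00.0)"
ALIGN = "03:06.0;04:00.0;08:00.0;0a:01.0;0a:01.1;0a:01.2;0f:00.0"

# bus:device.function, two hex digits each for bus and device, one digit for function
BDF = re.compile(r"[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def parse_bdfs(text):
    """Return the bus:device.function addresses found in text, in order."""
    return BDF.findall(text)


hidden = parse_bdfs(HIDE)
aligned = parse_bdfs(ALIGN)
# Devices listed for resource alignment that are not hidden from dom0.
only_aligned = [bdf for bdf in aligned if bdf not in hidden]

print(len(hidden), "hidden:", " ".join(hidden))
print("aligned but not hidden:", only_aligned)
```

With the values from this thread it reports six hidden devices and shows that 0a:01.0 is listed for resource alignment but is not in the xen-pciback.hide list.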
Sander Eikelenboom
2010-Aug-03 16:18 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
Hi Jeremy,

Yes, I have a domU with 2 USB cards passed through, with 2 USB videograbbers attached. This domain is running Konrad's devel/merge.2.6.35-rc6.t2 tree and some additional patches for the usb3/xhci card, which still give some trouble.

But I didn't expect both dom0 and the hypervisor to freeze as well, leaving no clues :(

--
Sander

Tuesday, August 3, 2010, 5:51:26 PM, you wrote:

> I notice the kernel command line has lots of hidden PCI devices. Sander,
> is there any particular activity (esp. passthrough device activity) which
> might correspond to the hang?

--
Best regards,
Sander                            mailto:linux@eikelenboom.it
Sander Eikelenboom
2010-Aug-03 17:18 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
Hi Jeremy,

Which kernel worked last is quite hard to say, since the xhci patches depend on a 2.6.35-rc6 kernel, so I took that one from Konrad's tree. But those pciback changes mean I have to use a recent dom0 kernel as well (if I recall correctly, for some spinlock cleanups to work).

--
Sander

Tuesday, August 3, 2010, 5:51:26 PM, you wrote:

> There have been a few updates to the .32.16 kernel too (and now it's
> .17...). But it would be very useful to identify which kernel was the
> last working one.
Sander Eikelenboom
2010-Aug-05 09:48 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
Hi Konrad/Jeremy,

I have tested for the last 2 days with the VMs that have passed-through devices shut down, and no freeze so far.
I'm now running one of those VMs, which uses an old 2.6.33 kernel from an old tree of Konrad's together with some hacked-up patches for xhci/usb3 support.
That seems to have been running fine for some time now (although not a full 2 days yet).

So my other VM seems to cause the freeze.

- This one uses devel/merge.2.6.35-rc6.t2 as its domU kernel; I think I should perhaps try an older version of pci-front/xen-swiotlb.
- It has both a usb2 and a usb3 controller passed through, but the xhci module has changed a lot since the hacked-up patches in the kernel of the working domU VM.
- Most probably the drivers for the videograbbers have changed as well.

So I suspect:
- newer pci-front / xen-swiotlb
- xhci/usb3 driver
- videograbber drivers

Most probable would be a rogue DMA transfer that can't be caught by Xen / pciback, I guess, and that would therefore be hard to debug?

--
Sander
Konrad Rzeszutek Wilk
2010-Aug-05 14:52 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
On Thu, Aug 05, 2010 at 11:48:44AM +0200, Sander Eikelenboom wrote:
> [...]
> So my other VM seems to cause the freeze.
>
> - This one uses devel/merge.2.6.35-rc6.t2 as its domU kernel; I think I should perhaps try an older version of pci-front/xen-swiotlb.
> - It has both a usb2 and a usb3 controller passed through, but the xhci module has changed a lot since the hacked-up patches in the kernel of the working domU VM.
> - Most probably the drivers for the videograbbers have changed as well.
>
> So I suspect:
> - newer pci-front / xen-swiotlb
> - xhci/usb3 driver
> - videograbber drivers
>
> Most probable would be a rogue DMA transfer that can't be caught by Xen / pciback, I guess, and that would therefore be hard to debug?

The SWIOTLB "brains" by themselves haven't changed since, uhh... 2.6.33. The code internals that just got Acked upstream look quite similar to the ones that Jeremy carries in xen/stable-2.6.32.x. The outside plumbing parts are the ones that changed.

The fixes in pci-front, well, most of those are "bureaucratic" in nature: set the ownership to this, make hotplug work, etc. The big fixes were the MSI/MSI-X ones, but those were big news a couple of months ago (and I think that was when 2.6.34 came out).

The videograbber (v4l) stack trace you sent me some time ago looked like a mutex was held for a very, very long time... which makes me wonder if that is the cmpxchg compiler bug that has hit some folks. Are you using Debian?

But we can do something easy. I can rebase my 2.6.33 kernel with the latest Xen-SWIOTLB/SWIOTLB engine + Xen PCI front, and we can eliminate the SWIOTLB/PCIfront being at fault here. Let me do that if your 2.6.33 VM guest is running fine for the last two days.
Sander Eikelenboom
2010-Aug-05 15:12 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
Hi Konrad,

Thx for your response, and I saw Linus pulled the swiotlb code .. way to go!

Thursday, August 5, 2010, 4:52:14 PM, you wrote:

> The videograbber (v4l) stack trace you sent me some time ago looked
> like a mutex was held for a very, very long time... which makes me wonder
> if that is the cmpxchg compiler bug that has hit some folks. Are you using
> Debian?

Yes, I'm using Debian. I saw that bug fix too, but since Jeremy hasn't included it in stable yet, I didn't either :-)
Well, you gave me a pointer here: looking again, it seems to hang on the device on the usb2 controller and not the one on the usb3 controller.
So to rule out the usb3 stuff I will drop that usb2 controller and see if that works. If so, it must be a problem in the driver, since that grabber + usb2 controller worked for quite a while, grabbing perfectly.

> But we can do something easy. I can rebase my 2.6.33 kernel with the
> latest Xen-SWIOTLB/SWIOTLB engine + Xen PCI front, and we can eliminate the
> SWIOTLB/PCIfront being at fault here. Let me do that if your 2.6.33
> VM guest is running fine for the last two days.

I will first try the above; if that doesn't work out, I will try the 2.6.33 kernel again for longer and report back!

Thx again!
Jeremy Fitzhardinge
2010-Aug-05 16:21 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
On 08/05/2010 07:52 AM, Konrad Rzeszutek Wilk wrote:
> The videograbber (v4l) stack trace you sent me some time ago looked
> like a mutex was held for a very, very long time... which makes me wonder
> if that is the cmpxchg compiler bug that has hit some folks. Are you using
> Debian?

The symptom of that bug is that gcc doesn't see any writes to a static variable, so it puts it in a RO section, causing faults. So while I guess it's within the realm of possibility that this is a different manifestation of the same bug, it doesn't seem likely.

    J
Sander Eikelenboom
2010-Aug-06 09:21 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
Hi Konrad,

Hmm, it seems that the 2.6.33 tree does work for 1 VM with a videograbber, but doesn't for the VM which seems to cause the freeze.
After a while of not functioning it does spit out some stack traces, but since this is an OOM I think it is fallout from something else and not anywhere near the root cause.
Although this at least didn't freeze the complete system :-)
I will try some more configurations to see if I can find a pattern somehow ...

--
Sander

[ 1269.032133] submit of urb 0 failed (error=-90)
[ 1274.153341] motion: page allocation failure. order:6, mode:0xd4
[ 1274.153375] Pid: 1884, comm: motion Not tainted 2.6.33 #5
[ 1274.153391] Call Trace:
[ 1274.153416] [<ffffffff810e4665>] __alloc_pages_nodemask+0x5b2/0x62b
[ 1274.153440] [<ffffffff810338b9>] ? xen_force_evtchn_callback+0xd/0xf
[ 1274.153461] [<ffffffff810e46f5>] __get_free_pages+0x17/0x5f
[ 1274.153483] [<ffffffff8128042e>] xen_swiotlb_alloc_coherent+0x3c/0xe2
[ 1274.153507] [<ffffffff81410931>] hcd_buffer_alloc+0xfa/0x11f
[ 1274.153527] [<ffffffff81403e0c>] usb_buffer_alloc+0x17/0x1d
[ 1274.153562] [<ffffffffa003f39e>] em28xx_init_isoc+0x16a/0x32b [em28xx]
[ 1274.153585] [<ffffffff815ec0b9>] ? __down_read+0x47/0xed
[ 1274.153613] [<ffffffffa003a4ac>] buffer_prepare+0xd7/0x10d [em28xx]
[ 1274.153639] [<ffffffffa0016dac>] videobuf_qbuf+0x308/0x3f4 [videobuf_core]
[ 1274.153667] [<ffffffffa0039cb3>] vidioc_qbuf+0x35/0x3a [em28xx]
[ 1274.153697] [<ffffffffa0028229>] __video_do_ioctl+0x11ab/0x373b [videodev]
[ 1274.153720] [<ffffffff814b51cd>] ? sock_def_readable+0x54/0x5f
[ 1274.153743] [<ffffffff81541f65>] ? unix_dgram_sendmsg+0x3f1/0x43e
[ 1274.153764] [<ffffffff810313b5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
[ 1274.153793] [<ffffffffa0039c7e>] ? vidioc_qbuf+0x0/0x3a [em28xx]
[ 1274.153814] [<ffffffff814b208b>] ? sock_sendmsg+0xa3/0xbc
[ 1274.153837] [<ffffffff8123349b>] ? avc_has_perm+0x4e/0x60
[ 1274.153855] [<ffffffff810338b9>] ? xen_force_evtchn_callback+0xd/0xf
[ 1274.153880] [<ffffffffa002aab1>] video_ioctl2+0x2f8/0x3af [videodev]
[ 1274.153901] [<ffffffff810357df>] ? __switch_to+0x265/0x277
[ 1274.153924] [<ffffffffa0026122>] v4l2_ioctl+0x38/0x3a [videodev]
[ 1274.153944] [<ffffffff8111ec90>] vfs_ioctl+0x72/0x9e
[ 1274.153961] [<ffffffff8111f1d7>] do_vfs_ioctl+0x4a0/0x4e1
[ 1274.153980] [<ffffffff8111f26d>] sys_ioctl+0x55/0x77
[ 1274.154000] [<ffffffff81112e6a>] ? sys_write+0x60/0x70
[ 1274.154009] [<ffffffff81036cc2>] system_call_fastpath+0x16/0x1b
[ 1274.154126] Mem-Info:
[ 1274.154138] DMA per-cpu:
[ 1274.154151] CPU 0: hi: 0, btch: 1 usd: 0
[ 1274.154165] CPU 1: hi: 0, btch: 1 usd: 0
[ 1274.154180] DMA32 per-cpu:
[ 1274.154202] CPU 0: hi: 186, btch: 31 usd: 0
[ 1274.154220] CPU 1: hi: 186, btch: 31 usd: 78
[ 1274.154241] active_anon:248 inactive_anon:326 isolated_anon:0
[ 1274.154244] active_file:132 inactive_file:105 isolated_file:41
[ 1274.154247] unevictable:0 dirty:0 writeback:19 unstable:0
[ 1274.154250] free:1309 slab_reclaimable:642 slab_unreclaimable:3111
[ 1274.154254] mapped:100846 shmem:4 pagetables:1187 bounce:0
[ 1274.154313] DMA free:2036kB min:80kB low:100kB high:120kB active_anon:0kB inactive_anon:24kB active_file:20kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14752kB mlocked:0kB dirty:0kB writeback:0kB mapped:12804kB shmem:0kB slab_reclaimable:16kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1274.154375] lowmem_reserve[]: 0 489 489 489
[ 1274.154415] DMA32 free:3200kB min:2788kB low:3484kB high:4180kB active_anon:992kB inactive_anon:1280kB active_file:508kB inactive_file:420kB unevictable:0kB isolated(anon):0kB isolated(file):164kB present:500960kB mlocked:0kB dirty:0kB writeback:76kB mapped:390580kB shmem:16kB slab_reclaimable:2552kB slab_unreclaimable:12404kB kernel_stack:592kB pagetables:4724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 all_unreclaimable? no
[ 1274.154481] lowmem_reserve[]: 0 0 0 0
[ 1274.154508] DMA: 7*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2036kB
[ 1274.154571] DMA32: 409*4kB 33*8kB 2*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3212kB
[ 1274.154634] 429 total pagecache pages
[ 1274.154646] 161 pages in swap cache
[ 1274.154658] Swap cache stats: add 344422, delete 344260, find 99167/143153
[ 1274.154673] Free swap = 476756kB
[ 1274.154684] Total swap = 524280kB
[ 1274.160880] 131072 pages RAM
[ 1274.160902] 21934 pages reserved
[ 1274.160914] 101195 pages shared
[ 1274.160925] 6309 pages non-shared
[ 1274.160963] unable to allocate 185088 bytes for transfer buffer 4
[ 1287.634682] motion invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
[ 1287.634719] motion cpuset=/ mems_allowed=0
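The per-zone free lists in the dump above can be checked with a few lines of Python. This is only a sketch for reading the log: the helper names are invented for illustration, and the two strings are copied from the "DMA:" and "DMA32:" lines of the Mem-Info output.

```python
import re

# Free-list lines copied from the Mem-Info dump in this message.
FREE_LISTS = {
    "DMA": "7*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB "
           "1*512kB 1*1024kB 0*2048kB 0*4096kB",
    "DMA32": "409*4kB 33*8kB 2*16kB 0*32kB 0*64kB 0*128kB 1*256kB "
             "0*512kB 1*1024kB 0*2048kB 0*4096kB",
}


def parse_free_list(line):
    """Return (block_count, block_size_kb) pairs from a free-list line."""
    return [(int(n), int(kb)) for n, kb in re.findall(r"(\d+)\*(\d+)kB", line)]


def total_kb(blocks):
    """Total free memory in kB described by the free list."""
    return sum(n * kb for n, kb in blocks)


def blocks_at_least(blocks, min_kb):
    """Number of free blocks whose size is at least min_kb."""
    return sum(n for n, kb in blocks if kb >= min_kb)


for zone, line in FREE_LISTS.items():
    blocks = parse_free_list(line)
    print(zone, total_kb(blocks), "kB free,",
          blocks_at_least(blocks, 256), "blocks of 256kB or more")
```

The totals come out at 2036 kB for DMA and 3212 kB for DMA32, matching the figures printed in the log, and both zones still list a few free blocks of 256 kB or larger at the time of the dump.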
Konrad Rzeszutek Wilk
2010-Aug-06 15:17 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
On Fri, Aug 06, 2010 at 11:21:11AM +0200, Sander Eikelenboom wrote:> Hi Konrad, > > Hmm it seems that 2.6.33 tree does seem to work for 1 VM with a videograbber, but doesn''t for the VM which seem to cause the freeze. > It does spit out some stacktraces after a while of not functioning, with since is OOM i will be something else caused by the fall out and not anywhere near the root cause. > Although this at least didn''t freeze the complete system :-) > I will try some more configurations to see if i can find a pattern somehow ... > > -- > Sander > > [ 1269.032133] submit of urb 0 failed (error=-90) > [ 1274.153341] motion: page allocation failure. order:6, mode:0xd4That is a 256kB request for memery.> [ 1274.153375] Pid: 1884, comm: motion Not tainted 2.6.33 #5 > [ 1274.153391] Call Trace: > [ 1274.153416] [<ffffffff810e4665>] __alloc_pages_nodemask+0x5b2/0x62b > [ 1274.153440] [<ffffffff810338b9>] ? xen_force_evtchn_callback+0xd/0xf > [ 1274.153461] [<ffffffff810e46f5>] __get_free_pages+0x17/0x5f > [ 1274.153483] [<ffffffff8128042e>] xen_swiotlb_alloc_coherent+0x3c/0xe2 > [ 1274.153507] [<ffffffff81410931>] hcd_buffer_alloc+0xfa/0x11f > [ 1274.153527] [<ffffffff81403e0c>] usb_buffer_alloc+0x17/0x1d > [ 1274.153562] [<ffffffffa003f39e>] em28xx_init_isoc+0x16a/0x32b [em28xx] > [ 1274.153585] [<ffffffff815ec0b9>] ? __down_read+0x47/0xed > [ 1274.153613] [<ffffffffa003a4ac>] buffer_prepare+0xd7/0x10d [em28xx] > [ 1274.153639] [<ffffffffa0016dac>] videobuf_qbuf+0x308/0x3f4 [videobuf_core] > [ 1274.153667] [<ffffffffa0039cb3>] vidioc_qbuf+0x35/0x3a [em28xx] > [ 1274.153697] [<ffffffffa0028229>] __video_do_ioctl+0x11ab/0x373b [videodev] > [ 1274.153720] [<ffffffff814b51cd>] ? sock_def_readable+0x54/0x5f > [ 1274.153743] [<ffffffff81541f65>] ? unix_dgram_sendmsg+0x3f1/0x43e > [ 1274.153764] [<ffffffff810313b5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e > [ 1274.153793] [<ffffffffa0039c7e>] ? vidioc_qbuf+0x0/0x3a [em28xx] > [ 1274.153814] [<ffffffff814b208b>] ? 
sock_sendmsg+0xa3/0xbc > [ 1274.153837] [<ffffffff8123349b>] ? avc_has_perm+0x4e/0x60 > [ 1274.153855] [<ffffffff810338b9>] ? xen_force_evtchn_callback+0xd/0xf > [ 1274.153880] [<ffffffffa002aab1>] video_ioctl2+0x2f8/0x3af [videodev] > [ 1274.153901] [<ffffffff810357df>] ? __switch_to+0x265/0x277 > [ 1274.153924] [<ffffffffa0026122>] v4l2_ioctl+0x38/0x3a [videodev] > [ 1274.153944] [<ffffffff8111ec90>] vfs_ioctl+0x72/0x9e > [ 1274.153961] [<ffffffff8111f1d7>] do_vfs_ioctl+0x4a0/0x4e1 > [ 1274.153980] [<ffffffff8111f26d>] sys_ioctl+0x55/0x77 > [ 1274.154000] [<ffffffff81112e6a>] ? sys_write+0x60/0x70 > [ 1274.154009] [<ffffffff81036cc2>] system_call_fastpath+0x16/0x1b > [ 1274.154126] Mem-Info: > [ 1274.154138] DMA per-cpu: > [ 1274.154151] CPU 0: hi: 0, btch: 1 usd: 0 > [ 1274.154165] CPU 1: hi: 0, btch: 1 usd: 0 > [ 1274.154180] DMA32 per-cpu: > [ 1274.154202] CPU 0: hi: 186, btch: 31 usd: 0 > [ 1274.154220] CPU 1: hi: 186, btch: 31 usd: 78 > [ 1274.154241] active_anon:248 inactive_anon:326 isolated_anon:0 > [ 1274.154244] active_file:132 inactive_file:105 isolated_file:41 > [ 1274.154247] unevictable:0 dirty:0 writeback:19 unstable:0 > [ 1274.154250] free:1309 slab_reclaimable:642 slab_unreclaimable:3111 > [ 1274.154254] mapped:100846 shmem:4 pagetables:1187 bounce:0 > [ 1274.154313] DMA free:2036kB min:80kB low:100kB high:120kB active_anon:0kB inactive_anon:24kB active_file:20kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14752kB mlocked:0kB dirty:0kB writeback:0kB mapped:12804kB shmem:0kB slab_reclaimable:16kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no > [ 1274.154375] lowmem_reserve[]: 0 489 489 489 > [ 1274.154415] DMA32 free:3200kB min:2788kB low:3484kB high:4180kB active_anon:992kB inactive_anon:1280kB active_file:508kB inactive_file:420kB unevictable:0kB isolated(anon):0kB isolated(file):164kB present:500960kB mlocked:0kB dirty:0kB writeback:76kB mapped:390580kB shmem:16kB slab_reclaimable:2552kB slab_unreclaimable:12404kB kernel_stack:592kB pagetables:4724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 all_unreclaimable? no > [ 1274.154481] lowmem_reserve[]: 0 0 0 0 > [ 1274.154508] DMA: 7*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2036kB > [ 1274.154571] DMA32: 409*4kB 33*8kB 2*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3212kB > [ 1274.154634] 429 total pagecache pages > [ 1274.154646] 161 pages in swap cache > [ 1274.154658] Swap cache stats: add 344422, delete 344260, find 99167/143153 > [ 1274.154673] Free swap = 476756kB > [ 1274.154684] Total swap = 524280kB > [ 1274.160880] 131072 pages RAM > [ 1274.160902] 21934 pages reserved > [ 1274.160914] 101195 pages shared > [ 1274.160925] 6309 pages non-shared > [ 1274.160963] unable to allocate 185088 bytes for transfer buffer 4Though here it says it is 185 kbytes. Hmm.. You got 3MB in DMA32 and 2MB in DMA so that should be enough. I am not that familiar with the VM, so the instinctive thing I can think of is to raise the amount of memory your guest has from the 512MB to 768MB. Does ''proc/meminfo'' when this happens show you an excedingly small amount of MemFree?> [ 1287.634682] motion invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0 > [ 1287.634719] motion cpuset=/ mems_allowed=0 > > > > > Thursday, August 5, 2010, 4:52:14 PM, you wrote: > > > On Thu, Aug 05, 2010 at 11:48:44AM +0200, Sander Eikelenboom wrote: > >> Hi Konrad/Jeremy, > >> > >> I have tested the last 2 days with the vm''s with passthroughed devices shutdown, and no freeze so far. 
> >> I'm now running one of the VMs, which runs an old 2.6.33 kernel from an old tree from Konrad together with some hacked-up patches for xhci/usb3 support.
> >> That seems to have been running fine for some time now (although not a full 2 days yet).
> >>
> >> So my other VM seems to cause the freeze.
> >>
> >> - This one uses devel/merge.2.6.35-rc6.t2 as domU kernel; I think I should perhaps try an older version of pci-front/xen-swiotlb.
> >> - It has both a usb2 and a usb3 controller passed through, but the xhci module has changed a lot since the hacked-up patches in the kernel of the working domU VM.
> >> - Most probably the drivers for the videograbbers will have changed.
> >>
> >> So I suspect:
> >> - newer pci-front / xen-swiotlb
> >> - xhci/usb3 driver
> >> - videograbber drivers
> >>
> >> Most probable would be a rogue DMA transfer that can't be caught by xen / pciback I guess, and therefore would be hard to debug?
> >
> > The SWIOTLB "brains" by themselves haven't changed since the
> > uhh...2.6.33. The code internals that just got Ack-ed upstream look quite
> > similar to the ones that Jeremy carries in xen/stable-2.6.32.x. The
> > outside plumbing parts are the ones that changed.
> >
> > The fixes in the pci-front, well, most of those are "bureaucratic" in
> > nature - set the ownership to this, make hotplug work, etc. The big
> > fixes were the MSI/MSI-X ones but those were big news a couple of months
> > ago (and I think that was when 2.6.34 came out).
> >
> > The videograbber (v4l) stack trace you sent to me some time ago looked
> > like a mutex was held for a very very long time... which makes me wonder if
> > that is the cmpxchg compiler bug that has hit some folks. Are you using
> > Debian?
> >
> > But we can do something easy. I can rebase my 2.6.33 kernel with the
> > latest Xen-SWIOTLB/SWIOTLB engine + Xen PCI front, and we can eliminate the
> > SWIOTLB/PCIfront being at fault here.
> > Let me do that if your 2.6.33 VM guest is running fine for the last two days.
>
> --
> Best regards,
> Sander mailto:linux@eikelenboom.it

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
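The free-memory totals mentioned in the reply above (roughly 2MB in DMA and 3MB in DMA32) can be re-derived from the two buddy free-list lines in the log. The following is a minimal Python sketch added for illustration, not something posted in the thread; the helper names `parse_free_list`, `total_kb` and `blocks_at_least` are hypothetical. It only reproduces the arithmetic of the log lines and does not model how the page allocator decides whether a block may be handed out (for example, watermark checks are ignored).

```python
import re

# The two free-list lines, copied verbatim from the log above.
DMA_LINE = ("DMA: 7*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB "
            "1*512kB 1*1024kB 0*2048kB 0*4096kB = 2036kB")
DMA32_LINE = ("DMA32: 409*4kB 33*8kB 2*16kB 0*32kB 0*64kB 0*128kB 1*256kB "
              "0*512kB 1*1024kB 0*2048kB 0*4096kB = 3212kB")


def parse_free_list(line):
    """Split a free-list line into (zone, {block_size_kb: count}, reported_kb)."""
    zone, rest = line.split(":", 1)
    blocks = {int(size): int(count)
              for count, size in re.findall(r"(\d+)\*(\d+)kB", rest)}
    reported_kb = int(re.search(r"=\s*(\d+)kB", rest).group(1))
    return zone.strip(), blocks, reported_kb


def total_kb(blocks):
    """Sum of count * block size over all block sizes, in kB."""
    return sum(size * count for size, count in blocks.items())


def blocks_at_least(blocks, min_kb):
    """Number of free blocks whose size is at least min_kb."""
    return sum(count for size, count in blocks.items() if size >= min_kb)


for line in (DMA_LINE, DMA32_LINE):
    zone, blocks, reported = parse_free_list(line)
    print(zone, "computed:", total_kb(blocks), "kB",
          "reported:", reported, "kB",
          "free blocks >= 256kB:", blocks_at_least(blocks, 256))
```

For both lines the computed total matches the figure the kernel printed (2036kB and 3212kB). Most of the free DMA32 memory sits in 4kB and 8kB blocks, and only a handful of blocks of 256kB or larger remain in either zone.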
Jeremy Fitzhardinge
2010-Aug-06 20:44 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
On 08/06/2010 08:17 AM, Konrad Rzeszutek Wilk wrote:> On Fri, Aug 06, 2010 at 11:21:11AM +0200, Sander Eikelenboom wrote: >> Hi Konrad, >> >> Hmm it seems that 2.6.33 tree does seem to work for 1 VM with a videograbber, but doesn''t for the VM which seem to cause the freeze. >> It does spit out some stacktraces after a while of not functioning, with since is OOM i will be something else caused by the fall out and not anywhere near the root cause. >> Although this at least didn''t freeze the complete system :-) >> I will try some more configurations to see if i can find a pattern somehow ... >> >> -- >> Sander >> >> [ 1269.032133] submit of urb 0 failed (error=-90) >> [ 1274.153341] motion: page allocation failure. order:6, mode:0xd4 > That is a 256kB request for memery. >> [ 1274.153375] Pid: 1884, comm: motion Not tainted 2.6.33 #5 >> [ 1274.153391] Call Trace: >> [ 1274.153416] [<ffffffff810e4665>] __alloc_pages_nodemask+0x5b2/0x62b >> [ 1274.153440] [<ffffffff810338b9>] ? xen_force_evtchn_callback+0xd/0xf >> [ 1274.153461] [<ffffffff810e46f5>] __get_free_pages+0x17/0x5f >> [ 1274.153483] [<ffffffff8128042e>] xen_swiotlb_alloc_coherent+0x3c/0xe2 >> [ 1274.153507] [<ffffffff81410931>] hcd_buffer_alloc+0xfa/0x11f >> [ 1274.153527] [<ffffffff81403e0c>] usb_buffer_alloc+0x17/0x1d >> [ 1274.153562] [<ffffffffa003f39e>] em28xx_init_isoc+0x16a/0x32b [em28xx] >> [ 1274.153585] [<ffffffff815ec0b9>] ? __down_read+0x47/0xed >> [ 1274.153613] [<ffffffffa003a4ac>] buffer_prepare+0xd7/0x10d [em28xx] >> [ 1274.153639] [<ffffffffa0016dac>] videobuf_qbuf+0x308/0x3f4 [videobuf_core] >> [ 1274.153667] [<ffffffffa0039cb3>] vidioc_qbuf+0x35/0x3a [em28xx] >> [ 1274.153697] [<ffffffffa0028229>] __video_do_ioctl+0x11ab/0x373b [videodev] >> [ 1274.153720] [<ffffffff814b51cd>] ? sock_def_readable+0x54/0x5f >> [ 1274.153743] [<ffffffff81541f65>] ? unix_dgram_sendmsg+0x3f1/0x43e >> [ 1274.153764] [<ffffffff810313b5>] ? 
__raw_callee_save_xen_pud_val+0x11/0x1e >> [ 1274.153793] [<ffffffffa0039c7e>] ? vidioc_qbuf+0x0/0x3a [em28xx] >> [ 1274.153814] [<ffffffff814b208b>] ? sock_sendmsg+0xa3/0xbc >> [ 1274.153837] [<ffffffff8123349b>] ? avc_has_perm+0x4e/0x60 >> [ 1274.153855] [<ffffffff810338b9>] ? xen_force_evtchn_callback+0xd/0xf >> [ 1274.153880] [<ffffffffa002aab1>] video_ioctl2+0x2f8/0x3af [videodev] >> [ 1274.153901] [<ffffffff810357df>] ? __switch_to+0x265/0x277 >> [ 1274.153924] [<ffffffffa0026122>] v4l2_ioctl+0x38/0x3a [videodev] >> [ 1274.153944] [<ffffffff8111ec90>] vfs_ioctl+0x72/0x9e >> [ 1274.153961] [<ffffffff8111f1d7>] do_vfs_ioctl+0x4a0/0x4e1 >> [ 1274.153980] [<ffffffff8111f26d>] sys_ioctl+0x55/0x77 >> [ 1274.154000] [<ffffffff81112e6a>] ? sys_write+0x60/0x70 >> [ 1274.154009] [<ffffffff81036cc2>] system_call_fastpath+0x16/0x1b >> [ 1274.154126] Mem-Info: >> [ 1274.154138] DMA per-cpu: >> [ 1274.154151] CPU 0: hi: 0, btch: 1 usd: 0 >> [ 1274.154165] CPU 1: hi: 0, btch: 1 usd: 0 >> [ 1274.154180] DMA32 per-cpu: >> [ 1274.154202] CPU 0: hi: 186, btch: 31 usd: 0 >> [ 1274.154220] CPU 1: hi: 186, btch: 31 usd: 78 >> [ 1274.154241] active_anon:248 inactive_anon:326 isolated_anon:0 >> [ 1274.154244] active_file:132 inactive_file:105 isolated_file:41 >> [ 1274.154247] unevictable:0 dirty:0 writeback:19 unstable:0 >> [ 1274.154250] free:1309 slab_reclaimable:642 slab_unreclaimable:3111 >> [ 1274.154254] mapped:100846 shmem:4 pagetables:1187 bounce:0 >> [ 1274.154313] DMA free:2036kB min:80kB low:100kB high:120kB active_anon:0kB inactive_anon:24kB active_file:20kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14752kB mlocked:0kB dirty:0kB writeback:0kB mapped:12804kB shmem:0kB slab_reclaimable:16kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no
>> [ 1274.154375] lowmem_reserve[]: 0 489 489 489
>> [ 1274.154415] DMA32 free:3200kB min:2788kB low:3484kB high:4180kB active_anon:992kB inactive_anon:1280kB active_file:508kB inactive_file:420kB unevictable:0kB isolated(anon):0kB isolated(file):164kB present:500960kB mlocked:0kB dirty:0kB writeback:76kB mapped:390580kB shmem:16kB slab_reclaimable:2552kB slab_unreclaimable:12404kB kernel_stack:592kB pagetables:4724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 all_unreclaimable? no
>> [ 1274.154481] lowmem_reserve[]: 0 0 0 0
>> [ 1274.154508] DMA: 7*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2036kB
>> [ 1274.154571] DMA32: 409*4kB 33*8kB 2*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3212kB
>> [ 1274.154634] 429 total pagecache pages
>> [ 1274.154646] 161 pages in swap cache
>> [ 1274.154658] Swap cache stats: add 344422, delete 344260, find 99167/143153
>> [ 1274.154673] Free swap = 476756kB
>> [ 1274.154684] Total swap = 524280kB
>> [ 1274.160880] 131072 pages RAM
>> [ 1274.160902] 21934 pages reserved
>> [ 1274.160914] 101195 pages shared
>> [ 1274.160925] 6309 pages non-shared
>> [ 1274.160963] unable to allocate 185088 bytes for transfer buffer 4
> Though here it says it is 185 kbytes. Hmm.. You got 3MB in DMA32 and 2MB
> in DMA so that should be enough.
>
> I am not that familiar with the VM, so the instinctive thing I can think
> of is to raise the amount of memory your guest has from the 512MB to
> 768MB. Does /proc/meminfo when this happens show you an exceedingly
> small amount of MemFree?

Memory allocations are rounded up to the next order, so 185k -> 256k. It's also a contiguous allocation, so it needs to find 64 contiguous pages, which is pretty much impossible in a system which has been running for a while.

J
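The rounding described in this reply can be checked with a few lines of arithmetic. This is a small Python sketch added for illustration, not code from the thread; `pages_needed` and `allocation_order` are hypothetical helper names, and the 4096-byte page size is taken from the `xen_pagesize` value in the `xm info` output at the top of the thread. It mirrors the idea of rounding a request up to a power-of-two number of pages, not the kernel's actual implementation.

```python
PAGE_SIZE = 4096  # bytes; assumed from "xen_pagesize : 4096" in the xm info output


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


def allocation_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order such that 2**order pages can hold nbytes."""
    return max(pages_needed(nbytes, page_size) - 1, 0).bit_length()


request = 185088  # bytes, from "unable to allocate 185088 bytes for transfer buffer 4"
order = allocation_order(request)
contiguous_pages = 1 << order
print("pages needed:", pages_needed(request))        # 46
print("order:", order)                               # 6
print("contiguous pages:", contiguous_pages)         # 64
print("rounded size (kB):", contiguous_pages * PAGE_SIZE // 1024)  # 256
```

The 185088-byte request needs 46 pages, which rounds up to order 6: 64 physically contiguous pages, or 256kB. That agrees with the `order:6` in the "page allocation failure" line of the log.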
Sander Eikelenboom
2010-Aug-08 13:54 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
Hi Konrad/Jeremy,

Previously I had a working setup with the same VM I'm using now, but quite a few things have changed:
- Another motherboard
- Another processor
- Another USB controller (now USB3 PCI-e instead of USB2 PCI)

The freezes seem to be related to the USB3 controller, but I'm not quite sure, since it can also produce a different workload.

What I have tried:
- latest xen-4.0-testing compiled with debug=y
- for dom0: latest 2.6.32 pvops stable kernel from Jeremy
- for dom0: latest 2.6.32 xen-next kernel from Jeremy
- added some kernel debug options (apart from some noise about hardirqs this doesn't seem to deliver a lot)
- for domU: 2.6.35-rc6 tree (devel/2.6.25-rc6-t2) from Konrad's tree
- running without MSI

It always freezes at some point. The time until the freeze varies, although most of the time it happens within a day.
- Ran another memtest to be sure it isn't faulty memory. Cooling is on max and temperatures are good, so that seems to be ruled out as well.
- I tried using the IOMMU, which one would expect to rule out DMA, but it also froze with the IOMMU enabled and working, and again there was nothing in the serial log :-(

The things I'm about to try after I have finished some backups are:
- xen-unstable
- running the VM as an HVM

What I'm wondering about is:
- Could the 4GB memory barrier still be a problem? The machine has 8GB, and the domain would normally be started as one of the last, by which point the running domains total around 4GB. Last night I let it run with only the troublesome PV domain, and it seems to work so far.
- Is there a way to force a domain to live below 4GB, or anything else I could try (besides ripping the RAM out of the machine)?
- Are there any other things that could prevent a full freeze by making things more strict, or that could provide additional debug info?

--
Sander
Sander Eikelenboom
2010-Aug-08 16:57 UTC
Re: [Xen-devel] [xen-4.0.1-rc5-pre] [pvops 2.6.32.16] Complete freeze within 2 days, no info in serial log
Hi Konrad,

This time the grabbing application hung in the VM again. It seems you are right: available memory is down to 0. It always used to work with 512MB assigned to the domain. Most probably a bug in the xhci code, I assume?

Attached: some hopefully relevant data from /proc.

--
Sander

Aug 8 20:16:17 security kernel: [ 721.555787] BUG: soft lockup - CPU#0 stuck for 82s! [kmemleak:374]
Aug 8 20:16:17 security kernel: [ 721.555790] Modules linked in: fuse saa7115 em28xx v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 videobuf_vmalloc videobuf_core tveeprom evdev i2c_core pcspkr thermal_sys [last unloaded: scsi_wait_scan]
Aug 8 20:16:17 security kernel: [ 721.555814] CPU 0
Aug 8 20:16:17 security kernel: [ 721.555816] Modules linked in: fuse saa7115 em28xx v4l2_common videodev v4l1_compat v4l2_compat_ioctl32 videobuf_vmalloc videobuf_core tveeprom evdev i2c_core pcspkr thermal_sys [last unloaded: scsi_wait_scan]
Aug 8 20:16:17 security kernel: [ 721.555838]
Aug 8 20:16:17 security kernel: [ 721.555841] Pid: 374, comm: kmemleak Not tainted 2.6.35-rc6+xen-2.6.35-rc6-xen-isoc-20100808-l3-mutex-dma-ed+ #7 /
Aug 8 20:16:17 security kernel: [ 721.555847] RIP: e030:[<ffffffff81006318>] [<ffffffff81006318>] xen_restore_fl_direct+0x18/0x1b
Aug 8 20:16:17 security kernel: [ 721.555858] RSP: e02b:ffff88001d8abe40 EFLAGS: 00000246
Aug 8 20:16:17 security kernel: [ 721.555861] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88001f7626d0
Aug 8 20:16:17 security kernel: [ 721.555865] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000200
Aug 8 20:16:17 security kernel: [ 721.555869] RBP: 0000000000000001 R08: fffc000000000000 R09: ffff88001d8abdb0
Aug 8 20:16:17 security kernel: [ 721.555873] R10: 000000000000000c R11: ffffea00002fdef8 R12: 0000000000000200
Aug 8 20:16:17 security kernel: [ 721.555877] R13: 0000000000000000 R14: ffffea00002fdf01 R15: 0000000000000001
Aug 8 20:16:17 security kernel: [ 721.555886] FS: 00007fc794dfc910(0000)
GS:ffff880002ced000(0000) knlGS:0000000000000000 Aug 8 20:16:17 security kernel: [ 721.555891] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b Aug 8 20:16:17 security kernel: [ 721.555894] CR2: 0000000001840078 CR3: 000000001e40b000 CR4: 0000000000000660 Aug 8 20:16:17 security kernel: [ 721.559780] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Aug 8 20:16:17 security kernel: [ 721.559780] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Aug 8 20:16:17 security kernel: [ 721.559780] Process kmemleak (pid: 374, threadinfo ffff88001d8aa000, task ffff88001fd918d0) Aug 8 20:16:17 security kernel: [ 721.559780] Stack: Aug 8 20:16:17 security kernel: [ 721.559780] ffffffff8142d77e ffffffff810c638e 0000000000000000 ffff8800126e0260 Aug 8 20:16:17 security kernel: [ 721.559780] <0> ffffea00002fdf00 ffffffff810c6967 ffff8800149b02b0 ffffea00002fded0 Aug 8 20:16:17 security kernel: [ 721.559780] <0> 000000000000dad6 ffffea00002fdf08 0000000000020000 0000000000000000 Aug 8 20:16:17 security kernel: [ 721.559780] Call Trace: Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff8142d77e>] ? _raw_read_unlock_irqrestore+0xd/0xe Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c638e>] ? find_and_get_object+0x4a/0x75 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c6967>] ? scan_block+0x4a/0xf7 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c6ce9>] ? kmemleak_scan+0x1a2/0x3e9 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c737c>] ? kmemleak_scan_thread+0x0/0x9b Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c737c>] ? kmemleak_scan_thread+0x0/0x9b Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c73d5>] ? kmemleak_scan_thread+0x59/0x9b Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff81054051>] ? kthread+0x79/0x81 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810094e4>] ? 
kernel_thread_helper+0x4/0x10 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810088e3>] ? int_ret_from_sys_call+0x7/0x1b Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff8142dadd>] ? retint_restore_args+0x5/0x6 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810094e0>] ? kernel_thread_helper+0x0/0x10 Aug 8 20:16:17 security kernel: [ 721.559780] Code: 44 00 00 65 f6 04 25 21 b0 00 00 ff 0f 94 c4 00 e4 c3 90 66 f7 c7 00 02 65 0f 94 04 25 21 b0 00 00 65 66 83 3c 25 20 b0 00 00 01 <74> 05 e8 01 00 00 00 c3 50 51 52 56 57 41 50 41 51 41 52 41 53 Aug 8 20:16:17 security kernel: [ 721.559780] Call Trace: Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff8142d77e>] ? _raw_read_unlock_irqrestore+0xd/0xe Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c638e>] ? find_and_get_object+0x4a/0x75 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c6967>] ? scan_block+0x4a/0xf7 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c6ce9>] ? kmemleak_scan+0x1a2/0x3e9 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c737c>] ? kmemleak_scan_thread+0x0/0x9b Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c737c>] ? kmemleak_scan_thread+0x0/0x9b Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810c73d5>] ? kmemleak_scan_thread+0x59/0x9b Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff81054051>] ? kthread+0x79/0x81 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810094e4>] ? kernel_thread_helper+0x4/0x10 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810088e3>] ? int_ret_from_sys_call+0x7/0x1b Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff8142dadd>] ? retint_restore_args+0x5/0x6 Aug 8 20:16:17 security kernel: [ 721.559780] [<ffffffff810094e0>] ? 
kernel_thread_helper+0x0/0x10 Aug 8 20:16:19 security kernel: [ 724.187104] kmemleak: 5 new suspected memory leaks (see /sys/kernel/debug/kmemleak) Aug 8 20:16:46 security motion: [0] Thread 1 - Watchdog timeout, trying to do a graceful restart Aug 8 20:17:01 security /USR/SBIN/CRON[1865]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Aug 8 20:17:46 security motion: [0] Thread 1 - Watchdog timeout, did NOT restart graceful,killing it! Aug 8 20:17:46 security motion: [0] Calling vid_close() from motion_cleanup Aug 8 20:17:46 security motion: [0] Closing video device /dev/kworld Aug 8 20:20:17 security kernel: [ 961.780121] INFO: task motion:1257 blocked for more than 120 seconds. Aug 8 20:20:17 security kernel: [ 961.780155] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 8 20:20:17 security kernel: [ 961.780177] motion D ffff88001ef60bc0 0 1257 1 0x00000000 Aug 8 20:20:17 security kernel: [ 961.780207] ffff88001fd155d0 0000000000000282 ffffffff81005cc5 00000000000145c0 Aug 8 20:20:17 security kernel: [ 961.780243] ffff88001e41dfd8 ffff88001e41dfd8 ffff88001ef60930 00000000000145c0 Aug 8 20:20:17 security kernel: [ 961.780278] 00000000000145c0 00000000000145c0 ffff88001ef60930 0000000000000000 Aug 8 20:20:17 security kernel: [ 961.780313] Call Trace: Aug 8 20:20:17 security kernel: [ 961.780337] [<ffffffff81005cc5>] ? xen_force_evtchn_callback+0x9/0xa Aug 8 20:20:17 security kernel: [ 961.780365] [<ffffffffa002eaa6>] ? video_ioctl2+0x0/0x32e [videodev] Aug 8 20:20:17 security kernel: [ 961.780388] [<ffffffff8142c544>] ? __mutex_lock_slowpath+0x12f/0x22c Aug 8 20:20:17 security kernel: [ 961.780409] [<ffffffff8142c64a>] ? mutex_lock+0x9/0x1e Aug 8 20:20:17 security kernel: [ 961.780430] [<ffffffffa0017e58>] ? videobuf_streamoff+0x13/0x35 [videobuf_core] Aug 8 20:20:17 security kernel: [ 961.780454] [<ffffffff81005cc5>] ? xen_force_evtchn_callback+0x9/0xa Aug 8 20:20:17 security kernel: [ 961.780478] [<ffffffffa003d573>] ? 
vidioc_streamoff+0x7e/0xb5 [em28xx] Aug 8 20:20:17 security kernel: [ 961.780500] [<ffffffffa002c5fe>] ? __video_do_ioctl+0x181f/0x3cc7 [videodev] Aug 8 20:20:17 security kernel: [ 961.780523] [<ffffffff8100631f>] ? xen_restore_fl_direct_end+0x0/0x1 Aug 8 20:20:17 security kernel: [ 961.780544] [<ffffffff8142d714>] ? _raw_spin_unlock_irqrestore+0xc/0xd Aug 8 20:20:17 security kernel: [ 961.780564] [<ffffffff813941dd>] ? sock_def_readable+0x3b/0x5d Aug 8 20:20:17 security kernel: [ 961.780585] [<ffffffff814043a6>] ? unix_dgram_sendmsg+0x428/0x4b2 Aug 8 20:20:17 security kernel: [ 961.780606] [<ffffffff810058fa>] ? xen_set_pte_at+0x196/0x1b6 Aug 8 20:20:17 security kernel: [ 961.780625] [<ffffffff810036bd>] ? __raw_callee_save_xen_make_pte+0x11/0x1e Aug 8 20:20:17 security kernel: [ 961.780648] [<ffffffff8139115e>] ? sock_sendmsg+0xd1/0xec Aug 8 20:20:17 security kernel: [ 961.780669] [<ffffffff810b0b00>] ? __do_fault+0x40f/0x44a Aug 8 20:20:17 security kernel: [ 961.780689] [<ffffffff81005cc5>] ? xen_force_evtchn_callback+0x9/0xa Aug 8 20:20:17 security kernel: [ 961.780709] [<ffffffff81006332>] ? check_events+0x12/0x20 Aug 8 20:20:17 security kernel: [ 961.780730] [<ffffffffa002ed38>] ? video_ioctl2+0x292/0x32e [videodev] Aug 8 20:20:17 security kernel: [ 961.780750] [<ffffffff81002616>] ? xen_write_msr_safe+0x5d/0x79 Aug 8 20:20:17 security kernel: [ 961.780770] [<ffffffff81007337>] ? __switch_to+0x1b3/0x2a4 Aug 8 20:20:17 security kernel: [ 961.780790] [<ffffffff8100622a>] ? xen_sched_clock+0xf/0x8c Aug 8 20:20:17 security kernel: [ 961.780810] [<ffffffff81005cc5>] ? xen_force_evtchn_callback+0x9/0xa Aug 8 20:20:17 security kernel: [ 961.780830] [<ffffffff81006332>] ? check_events+0x12/0x20 Aug 8 20:20:17 security kernel: [ 961.780850] [<ffffffffa002a10b>] ? v4l2_ioctl+0x38/0x3a [videodev] Aug 8 20:20:17 security kernel: [ 961.780870] [<ffffffff810d54be>] ? vfs_ioctl+0x69/0x92 Aug 8 20:20:17 security kernel: [ 961.780889] [<ffffffff810d596e>] ? 
do_vfs_ioctl+0x411/0x43c Aug 8 20:20:17 security kernel: [ 961.780909] [<ffffffff810c96b4>] ? vfs_write+0x134/0x169 Aug 8 20:20:17 security kernel: [ 961.780928] [<ffffffff810d59ea>] ? sys_ioctl+0x51/0x70 Aug 8 20:20:17 security kernel: [ 961.780947] [<ffffffff810086c2>] ? system_call_fastpath+0x16/0x1b Aug 8 20:22:17 security kernel: [ 1081.780140] INFO: task motion:1257 blocked for more than 120 seconds. Aug 8 20:22:17 security kernel: [ 1081.780172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 8 20:22:17 security kernel: [ 1081.780194] motion D ffff88001ef60bc0 0 1257 1 0x00000000 Aug 8 20:22:17 security kernel: [ 1081.780224] ffff88001fd155d0 0000000000000282 ffffffff81005cc5 00000000000145c0 Aug 8 20:22:17 security kernel: [ 1081.780261] ffff88001e41dfd8 ffff88001e41dfd8 ffff88001ef60930 00000000000145c0 Aug 8 20:22:17 security kernel: [ 1081.780295] 00000000000145c0 00000000000145c0 ffff88001ef60930 0000000000000000 Aug 8 20:22:17 security kernel: [ 1081.780330] Call Trace: Aug 8 20:22:17 security kernel: [ 1081.780355] [<ffffffff81005cc5>] ? xen_force_evtchn_callback+0x9/0xa Aug 8 20:22:17 security kernel: [ 1081.780382] [<ffffffffa002eaa6>] ? video_ioctl2+0x0/0x32e [videodev] Aug 8 20:22:17 security kernel: [ 1081.780405] [<ffffffff8142c544>] ? __mutex_lock_slowpath+0x12f/0x22c Aug 8 20:22:17 security kernel: [ 1081.780426] [<ffffffff8142c64a>] ? mutex_lock+0x9/0x1e Aug 8 20:22:17 security kernel: [ 1081.780447] [<ffffffffa0017e58>] ? videobuf_streamoff+0x13/0x35 [videobuf_core] Aug 8 20:22:17 security kernel: [ 1081.780471] [<ffffffff81005cc5>] ? xen_force_evtchn_callback+0x9/0xa Aug 8 20:22:17 security kernel: [ 1081.780495] [<ffffffffa003d573>] ? vidioc_streamoff+0x7e/0xb5 [em28xx] Aug 8 20:22:17 security kernel: [ 1081.780517] [<ffffffffa002c5fe>] ? __video_do_ioctl+0x181f/0x3cc7 [videodev] Aug 8 20:22:17 security kernel: [ 1081.780540] [<ffffffff8100631f>] ? 
xen_restore_fl_direct_end+0x0/0x1 Aug 8 20:22:17 security kernel: [ 1081.780561] [<ffffffff8142d714>] ? _raw_spin_unlock_irqrestore+0xc/0xd Aug 8 20:22:17 security kernel: [ 1081.780581] [<ffffffff813941dd>] ? sock_def_readable+0x3b/0x5d Aug 8 20:22:17 security kernel: [ 1081.780602] [<ffffffff814043a6>] ? unix_dgram_sendmsg+0x428/0x4b2 Aug 8 20:22:17 security kernel: [ 1081.780622] [<ffffffff810058fa>] ? xen_set_pte_at+0x196/0x1b6 Aug 8 20:22:17 security kernel: [ 1081.780642] [<ffffffff810036bd>] ? __raw_callee_save_xen_make_pte+0x11/0x1e Aug 8 20:22:17 security kernel: [ 1081.780666] [<ffffffff8139115e>] ? sock_sendmsg+0xd1/0xec Aug 8 20:22:17 security kernel: [ 1081.780686] [<ffffffff810b0b00>] ? __do_fault+0x40f/0x44a Aug 8 20:22:17 security kernel: [ 1081.780706] [<ffffffff81005cc5>] ? xen_force_evtchn_callback+0x9/0xa Aug 8 20:22:17 security kernel: [ 1081.780726] [<ffffffff81006332>] ? check_events+0x12/0x20 Aug 8 20:22:17 security kernel: [ 1081.780747] [<ffffffffa002ed38>] ? video_ioctl2+0x292/0x32e [videodev] Aug 8 20:22:17 security kernel: [ 1081.780767] [<ffffffff81002616>] ? xen_write_msr_safe+0x5d/0x79 Aug 8 20:22:17 security kernel: [ 1081.780787] [<ffffffff81007337>] ? __switch_to+0x1b3/0x2a4 Aug 8 20:22:17 security kernel: [ 1081.780806] [<ffffffff8100622a>] ? xen_sched_clock+0xf/0x8c Aug 8 20:22:17 security kernel: [ 1081.780826] [<ffffffff81005cc5>] ? xen_force_evtchn_callback+0x9/0xa Aug 8 20:22:17 security kernel: [ 1081.780847] [<ffffffff81006332>] ? check_events+0x12/0x20 Aug 8 20:22:17 security kernel: [ 1081.780866] [<ffffffffa002a10b>] ? v4l2_ioctl+0x38/0x3a [videodev] Aug 8 20:22:17 security kernel: [ 1081.780886] [<ffffffff810d54be>] ? vfs_ioctl+0x69/0x92 Aug 8 20:22:17 security kernel: [ 1081.780905] [<ffffffff810d596e>] ? do_vfs_ioctl+0x411/0x43c Aug 8 20:22:17 security kernel: [ 1081.780925] [<ffffffff810c96b4>] ? vfs_write+0x134/0x169 Aug 8 20:22:17 security kernel: [ 1081.780943] [<ffffffff810d59ea>] ? 
sys_ioctl+0x51/0x70
Aug 8 20:22:17 security kernel: [ 1081.780961] [<ffffffff810086c2>] ? system_call_fastpath+0x16/0x1b

--
Best regards,
Sander mailto:linux@eikelenboom.it
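Konrad's earlier question, whether /proc/meminfo shows a very small MemFree when the grabber hangs, can be answered by sampling that file inside the guest. Below is a minimal Python sketch added for illustration and not taken from the thread. The helper names `parse_meminfo` and `read_meminfo` are hypothetical. The `MemTotal` and `MemFree` values in `SAMPLE` are made up; the swap values are the ones from the log earlier in the thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file on a Linux guest."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


# Illustrative sample only: MemTotal and MemFree are invented numbers.
SAMPLE = """\
MemTotal:         524288 kB
MemFree:            5236 kB
SwapTotal:        524280 kB
SwapFree:         476756 kB
"""

info = parse_meminfo(SAMPLE)
free_percent = 100.0 * info["MemFree"] / info["MemTotal"]
print("MemFree: %d kB (%.1f%% of MemTotal)" % (info["MemFree"], free_percent))
```

On the guest itself, `read_meminfo()` could be called from a loop or a cron job while the grabber runs, so that the trend in MemFree leading up to a hang ends up in a log file.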