Ian Campbell
2010-Nov-23 11:51 UTC
[Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
Thanks for the report Vincent.

I've added xen-devel to the CC as well as Cris Daniluk who previously
reported a very similar issue[0] also on an R410 -- Cris did you ever
get a resolution to your issue?

Vincent's full report is at:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=603632
I've also attached the boot log here of which the interesting part looks
to be:

[ 8.422639] xen: acpi sci 9
[ 8.434217] Console: colour VGA+ 80x25
[ 8.441350] console [hvc0] enabled, bootconsole disabled
[ 8.441350] console [hvc0] enabled, bootconsole disabled
[ 8.462694] Xen: using vcpuop timer interface
[ 8.471508] installing Xen timer for CPU 0
[ 8.479841] BUG: unable to handle kernel paging request at 0000000000005a08
[ 8.493868] IP: [<ffffffff810badce>] __alloc_pages_nodemask+0x8f/0x5f5
[ 8.507041] PGD 0
[ 8.511199] Thread overran stack, or stack corrupted
[ 8.521253] Oops: 0000 [#1] SMP
[ 8.527838] last sysfs file:
[ 8.533941] CPU 0
[ 8.538100] Modules linked in:
[ 8.544342] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-amd64 #1 PowerEdge R410
[ 8.559594] RIP: e030:[<ffffffff810badce>] [<ffffffff810badce>] __alloc_pages_nodemask+0x8f/0x5f5
[ 8.577620] RSP: e02b:ffffffff81443c88 EFLAGS: 00010046
[ 8.588366] RAX: 0000000000000000 RBX: 0000000000005220 RCX: 0000000000005a00
[ 8.602752] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000005220
[ 8.617139] RBP: 0000000000004020 R08: 0000000000000002 R09: ffff88003fc1c010
[ 8.631525] R10: ffffffff813c2700 R11: 00000000000186a0 R12: 0000000000005220
[ 8.645910] R13: 0000000000000002 R14: 0000000000000000 R15: ffff88000000da28
[ 8.660300] FS: 0000000000000000(0000) GS:ffff88000349b000(0000) knlGS:0000000000000000
[ 8.676591] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 8.688203] CR2: 0000000000005a08 CR3: 0000000001001000 CR4: 0000000000002660
[ 8.702589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.716975] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 8.731361] Process swapper (pid: 0, threadinfo ffffffff81442000, task ffffffff814771f0)
[ 8.747654] Stack:
[ 8.751813] ffff88000000da00 00000010813c2765 00000000000212d0 00000000000186a0
[ 8.766199] <0> ffff88000000ac10 ffffffff8100e5b5 ffffffff8100ec72 00000000000186a0
[ 8.781625] <0> 00000000000186a0 0000000000000000 0000000000005a00 0000000000000000
[ 8.797572] Call Trace:
[ 8.802603] [<ffffffff8100e5b5>] ? xen_force_evtchn_callback+0x9/0xa
[ 8.815600] [<ffffffff8100ec72>] ? check_events+0x12/0x20
[ 8.826695] [<ffffffff810e759d>] ? new_slab+0x42/0x1ca
[ 8.837267] [<ffffffff810e7915>] ? __slab_alloc+0x1f0/0x39b
[ 8.848707] [<ffffffff812f87d8>] ? irq_to_desc_alloc_node+0x96/0x195
[ 8.861704] [<ffffffff810e85cb>] ? __kmalloc_node+0xe8/0x146
[ 8.873317] [<ffffffff812f87d8>] ? irq_to_desc_alloc_node+0x96/0x195
[ 8.886316] [<ffffffff812f87d8>] ? irq_to_desc_alloc_node+0x96/0x195
[ 8.899317] [<ffffffff811f24df>] ? find_unbound_irq+0x67/0xae
[ 8.911103] [<ffffffff811f259e>] ? bind_virq_to_irq+0x78/0x126
[ 8.923062] [<ffffffff8100e5b5>] ? xen_force_evtchn_callback+0x9/0xa
[ 8.936063] [<ffffffff8100e8f6>] ? xen_timer_interrupt+0x0/0x18d
[ 8.948368] [<ffffffff811f29f6>] ? bind_virq_to_irqhandler+0x19/0x4a
[ 8.961368] [<ffffffff8100e884>] ? xen_setup_timer+0x55/0xaa
[ 8.972982] [<ffffffff81509a5e>] ? xen_time_init+0xaf/0xb5
[ 8.984247] [<ffffffff8150a491>] ? x86_late_time_init+0xa/0x10
[ 8.996206] [<ffffffff81506c3d>] ? start_kernel+0x348/0x3e8
[ 9.007646] [<ffffffff81508c7d>] ? xen_start_kernel+0x57c/0x581
[ 9.019777] Code: d8 c1 e8 13 83 e0 01 09 44 24 64 41 89 dc 44 23 25 28 01 43 00 44 89 e2 83 e2 10 89 54 24 5c 74 05 e8 16 03 25 00 48 8b 4c 24 50 <48> 83 79 08 00 0f 84 30 05 00 00 83 e3 0f 48 8b 44 24 50 41 bf
[ 9.057561] RIP [<ffffffff810badce>] __alloc_pages_nodemask+0x8f/0x5f5
[ 9.070909] RSP <ffffffff81443c88>
[ 9.078015] CR2: 0000000000005a08
[ 9.084780] ---[ end trace a7919e7f17c0a725 ]---
[ 9.094136] Kernel panic - not syncing: Attempted to kill the idle task!

It's worth noting that the Debian kernels are based on
e73f4955a821f850f5b88c32d12a81714523a95f (less the GPU fixes merged by
bcf16b6b4f34fb40a7aaf637947c7d3bce0be671, which the Debian kernel
maintainer chose to exclude).

The baseline is slightly old but Debian is now pretty deeply frozen so a
wholesale rebase is not possible. If either of you has run a more
recent kernel, the result would be interesting to know.

The actual crashing RIP corresponds to mm/page_alloc.c:1975 which is in
__alloc_pages_nodemask:

        /*
         * Check the zones suitable for the gfp_mask contain at least one
         * valid zone. It's possible to have an empty zonelist as a result
         * of GFP_THISNODE and a memoryless node
         */
        if (unlikely(!zonelist->_zonerefs->zone))
                return NULL;

zonelist->_zonerefs is an array but looking at the disassembly and the
register dump zonelist itself appears to be 0x5a00 which seems unlikely
to be valid.

The zonelist ultimately comes from node which is always passed as 0 in
the outermost caller in this stack trace (find_unbound_irq calling
irq_to_desc_alloc_node).

I'm not sure but looking at the complete bootlog it looks as if the
system may only have node==1, i.e. no node 0, which could plausibly lead
to this sort of issue:
[ 0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
[ 0.000000] NODE_DATA [0000000000008000 - 000000000000ffff]
[ 0.000000] bootmap [0000000000010000 - 0000000000017fff] pages 8
[ 0.000000] (8 early reservations) ==> bootmem [0000000000 - 0040000000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
[ 0.000000] #1 [0003446000 - 0003465000] XEN PAGETABLES ==> [0003446000 - 0003465000]
[ 0.000000] #2 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
[ 0.000000] #3 [0001000000 - 0001694994] TEXT DATA BSS ==> [0001000000 - 0001694994]
[ 0.000000] #4 [00016b5000 - 0003244e00] RAMDISK ==> [00016b5000 - 0003244e00]
[ 0.000000] #5 [0003245000 - 0003446000] XEN START INFO ==> [0003245000 - 0003446000]
[ 0.000000] #6 [0001695000 - 000169532d] BRK ==> [0001695000 - 000169532d]
[ 0.000000] #7 [0000100000 - 00002e0000] PGTABLE ==> [0000100000 - 00002e0000]
[ 0.000000] found SMP MP-table at [ffff8800000fe710] fe710
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000000 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x00100000
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 1: 0x00000000 -> 0x000000a0
[ 0.000000] 1: 0x00000100 -> 0x00040000
[ 0.000000] On node 1 totalpages: 262048
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 483 pages reserved
[ 0.000000] DMA zone: 3461 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 3528 pages used for memmap
[ 0.000000] DMA32 zone: 254520 pages, LIFO batch:31

Perhaps we should be passing numa_node_id() (e.g. current node) instead
of node 0? There doesn't seem to be another obvious alternative to
passing in an explicit node number to this callchain (some places cope
with -1 but not this path AFAICT).

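To make the address arithmetic concrete: the faulting bytes in the Code: line, <48> 83 79 08 00, decode to cmpq $0x0,0x8(%rcx), and with RCX = 0x5a00 that is a read from 0x5a08, which matches CR2. The sketch below is a userspace toy, not kernel code. It assumes that the per-node data pointer for node 0 is NULL (the thread suspects this but does not confirm it), that _zonerefs sits 8 bytes into the zonelist (as the disassembly suggests), and it pads the toy node structure so that its zonelist array lands at 0x5a00 purely to match the register dump. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures. Field sizes model a 64-bit
 * kernel; they are illustrative and not copied from kernel headers. */
struct toy_zoneref {
    uint64_t zone;       /* models "struct zone *zone" */
    int zone_idx;
};

struct toy_zonelist {
    uint64_t zlcache_ptr;            /* 8 bytes, so _zonerefs starts at +8 */
    struct toy_zoneref _zonerefs[4];
};

struct toy_pgdat {
    /* Padding chosen so node_zonelists lands at 0x5a00 (an assumption
     * made to match the register dump, not a measured kernel offset). */
    unsigned char node_zones[0x5a00];
    struct toy_zonelist node_zonelists[2];
};

/* Address of zonelist number zonelist_idx inside a node structure that
 * lives at pgdat_addr. Pure integer arithmetic: nothing is dereferenced. */
uintptr_t toy_zonelist_addr(uintptr_t pgdat_addr, unsigned zonelist_idx)
{
    assert(zonelist_idx < 2);
    return pgdat_addr
        + offsetof(struct toy_pgdat, node_zonelists)
        + zonelist_idx * sizeof(struct toy_zonelist);
}

/* Address that "zonelist->_zonerefs->zone" would read from. */
uintptr_t toy_first_zone_slot(uintptr_t pgdat_addr, unsigned zonelist_idx)
{
    return toy_zonelist_addr(pgdat_addr, zonelist_idx)
        + offsetof(struct toy_zonelist, _zonerefs)
        + offsetof(struct toy_zoneref, zone);
}
```

Calling toy_zonelist_addr(0, 0) models a NULL node pointer and yields the small bogus zonelist address seen in RCX, and toy_first_zone_slot(0, 0) yields the faulting address. Under these assumptions the oops would be a NULL-pointer dereference in disguise, offset by the size of the fields that precede the zonelists.
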
It's also not obvious if dom0 should be seeing the tables which describe
the host's nodes anyway or if we should be clobbering something. Given
that dom0 sees a pseudo-physical address map I'm not convinced seeing
the real SRAT is in any way beneficial. Perhaps we should simply be
clobbering NUMAness until actual PV understanding of NUMA is ready?

One thing I notice when googling R410 issues is that they apparently
have a "Cores per CPU" BIOS option which might be worth playing with,
since configuring a reduced number of cores might remove node 0 but not
node 1 (odd but not invalid?). Presumably it is also worth making sure
you have the latest BIOS etc.

It's very much an outside possibility but it is also worth trying the
packages at http://xenbits.xen.org/people/ianc/ which reinstate the
changesets from bcf16b6b4f34fb40a7aaf637947c7d3bce0be671

Ian.

[0] http://lists.xensource.com/archives/html/xen-devel/2010-06/msg01140.html

On Tue, 2010-11-16 at 00:32 +0100, Vincent CARON wrote:
> Package: linux-image-2.6.32-5-xen-amd64
> Version: 2.6.32-27
> Severity: important
>
> I just tried d-i 6beta1 and booted Squeeze and its 2.6.32 kernel for
> the first time on my usual server hardware (Dell R410).
>
> I opted for the xen-amd64 kernel, and it boots fine on bare metal. But
> as soon as I tried to boot it as dom0 over Xen hypervisor, it BUG's:
>
> [ 8.479841] BUG: unable to handle kernel paging request at 0000000000005a08
> [ 8.493868] IP: [<ffffffff810badce>] __alloc_pages_nodemask+0x8f/0x5f5
>
> Then quickly oopses and panics. I tried various flags:
> - upping dom0_mem from 256M to 1024M (I've been running Lenny/Xen 3.2
>   with 256M happily for several months on the same hw)
> - using Xen 'nommu'
> - using Linux nomodeset
>
> Then I followed instructions on a Xen wiki page to provide verbose
> traces (although they do not look much more verbose than the regular
> boot).
>
> I'm using an IPMI serial-over-lan console which appears as a regular
> UART to Xen.
>
> I'm attaching a boot log to this report.
>
> -- System Information:
> Debian Release: squeeze/sid
> APT prefers testing
> APT policy: (500, 'testing')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/bash

--
Ian Campbell
Current Noise: Wolf - Seize The Night

If you will practice being fictional for a while, you will understand that
fictional characters are sometimes more real than people with bodies and
heartbeats.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Cris Daniluk
2010-Nov-23 12:44 UTC
[Xen-devel] Re: PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
I was unable to, and this does look similar indeed. I tried a variety of
pvops kernels and kernel configs and was unable to get past this. I
never found a resolution and eventually fell back to 3.4.3 with a
xenlinux kernel. Much less sexy but very stable on the same hardware.

I also had related but different problems on IBM 3650 M2s and IBM 3500s
with pvops kernels. It seems very prone to crashing on any APIC/ACPI
bugs, of which there seem to be quite a few in both Dell and IBM. I was
toying with the idea of downgrading BIOSes based on the success someone
else on the xen-devel list reported with that, but I didn't have the
time to see that idea through.

On Tue, Nov 23, 2010 at 6:51 AM, Ian Campbell <ijc@hellion.org.uk> wrote:
> Thanks for the report Vincent.
>
> I've added xen-devel to the CC as well as Cris Daniluk who previously
> reported a very similar issue[0] also on an R410 -- Cris did you ever
> get a resolution to your issue?
> [...]
Jeremy Fitzhardinge
2010-Nov-23 18:24 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On 11/23/2010 03:51 AM, Ian Campbell wrote:
> I'm not sure but looking at the complete bootlog it looks as if the
> system may only have node==1 i.e. no 0 node which could plausibly lead
> to this sort of issue:
> [ 0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
> [...]
> [ 0.000000] early_node_map[2] active PFN ranges
> [ 0.000000] 1: 0x00000000 -> 0x000000a0
> [ 0.000000] 1: 0x00000100 -> 0x00040000
> [ 0.000000] On node 1 totalpages: 262048
> [...]
>
> Perhaps we should be passing numa_node_id() (e.g. current node) instead
> of node 0? There doesn't seem to be another obvious alternative to
> passing in an explicit node number to this callchain (some places cope
> with -1 but not this path AFAICT).

Does booting native get the same configuration?

> It's also not obvious if dom0 should be seeing the tables which describe
> the host's nodes anyway or if we should be clobbering something. Given
> that dom0 sees a pseudo-physical address map I'm not convinced seeing
> the real SRAT is in any way beneficial. Perhaps we should simply be
> clobbering NUMAness until actual PV understanding of NUMA is ready?

Yes, the host SRAT is meaningless in the domain and we really should
ignore it. I'm not sure what happens if you boot on a really NUMA system.

> One thing I notice when googling R410 issues is that they apparently
> have a "Cores per CPU" BIOS option which might be worth playing with,
> since configuring a reduced number of cores might remove node 0 but not
> node 1 (odd but not invalid?). Presumably it is also worth making sure
> you have the latest BIOS etc.

Also, what's the DIMM configuration? Are the slots fully populated?

    J
Ian Campbell
2010-Nov-23 18:52 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Tue, 2010-11-23 at 11:51 +0000, Ian Campbell wrote:
>
> Perhaps we should be passing numa_node_id() (e.g. current node)
> instead of node 0?

I've just kicked off a build of the 2.6.32-27 Debian kernel with the
following additional patch, I will hopefully post the binaries tomorrow.
If you already have the capability to build a custom kernel in place you
might like to try it before then.

Ian.

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 7b29ae1..868b172 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -418,7 +418,7 @@ static int find_unbound_irq(void)
 	if (irq == start)
 		goto no_irqs;
 
-	desc = irq_to_desc_alloc_node(irq, 0);
+	desc = irq_to_desc_alloc_node(irq, numa_node_id());
 	if (WARN_ON(desc == NULL))
 		return -1;
 
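The patch above replaces a hard-coded node 0 with numa_node_id(). A different way to express the same goal, never asking the allocator for a node that does not exist, is to fall back to the first online node when the preferred one is absent. The following is a minimal userspace sketch of that variation, not what the patch does and not kernel API: the bitmask representation, TOY_MAX_NODES and toy_pick_alloc_node are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_NODES 64   /* one bit per node in the toy online mask */
#define TOY_NO_NODE  (-1)

/* Non-zero if bit "node" is set in the toy online mask. */
static int toy_node_online(uint64_t online_mask, int node)
{
    return node >= 0 && node < TOY_MAX_NODES &&
           ((online_mask >> node) & 1u) != 0;
}

/* Return the preferred node if it is online, otherwise the lowest
 * numbered online node. The caller must pass a non-empty mask. */
int toy_pick_alloc_node(uint64_t online_mask, int preferred)
{
    assert(online_mask != 0);

    if (toy_node_online(online_mask, preferred))
        return preferred;

    for (int node = 0; node < TOY_MAX_NODES; node++) {
        if (toy_node_online(online_mask, node))
            return node;
    }
    return TOY_NO_NODE; /* not reached when the mask is non-empty */
}
```

With the layout suspected in this thread (only node 1 present, mask 0x2), toy_pick_alloc_node(0x2, 0) returns 1 rather than handing node 0 to the allocator; on a machine where node 0 is online the preferred node is returned unchanged.
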
Ian Campbell
2010-Nov-23 22:12 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Tue, 2010-11-23 at 18:52 +0000, Ian Campbell wrote:
> On Tue, 2010-11-23 at 11:51 +0000, Ian Campbell wrote:
> >
> > Perhaps we should be passing numa_node_id() (e.g. current node)
> > instead of node 0?
>
> I've just kicked off a build of the 2.6.32-27 Debian kernel with the
> following additional patch, I will hopefully post the binaries tomorrow.

Build was quicker than I thought... Vincent, Cris if you get a chance
please can you test the kernel from:
http://xenbits.xen.org/people/ianc/2.6.32-27+numa1/

Thanks,
Ian.
Ian Campbell
2010-Nov-23 22:18 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Tue, 2010-11-23 at 22:12 +0000, Ian Campbell wrote:
> On Tue, 2010-11-23 at 18:52 +0000, Ian Campbell wrote:
> > On Tue, 2010-11-23 at 11:51 +0000, Ian Campbell wrote:
> > >
> > > Perhaps we should be passing numa_node_id() (e.g. current node)
> > > instead of node 0?
> >
> > I've just kicked off a build of the 2.6.32-27 Debian kernel with the
> > following additional patch, I will hopefully post the binaries tomorrow.
>
> Build was quicker than I thought... Vincent, Cris if you get a chance
> please can you test the kernel from:
> http://xenbits.xen.org/people/ianc/2.6.32-27+numa1/

Also, please can you try adding "numa=noacpi" to your kernel command
line when running with the standard Debian kernel (not the one above).

Thanks!
Ian.
Vincent Caron
2010-Nov-25 12:51 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Tue, 2010-11-23 at 10:24 -0800, Jeremy Fitzhardinge wrote:
> On 11/23/2010 03:51 AM, Ian Campbell wrote:
> > I'm not sure but looking at the complete bootlog it looks as if the
> > system may only have node==1 i.e. no 0 node which could plausibly lead
> > to this sort of issue:
> > [ 0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
> > [...]
> > [ 0.000000] early_node_map[2] active PFN ranges
> > [ 0.000000] 1: 0x00000000 -> 0x000000a0
> > [ 0.000000] 1: 0x00000100 -> 0x00040000
> > [ 0.000000] On node 1 totalpages: 262048
> > [...]
> >
> > Perhaps we should be passing numa_node_id() (e.g. current node) instead
> > of node 0? There doesn't seem to be another obvious alternative to
> > passing in an explicit node number to this callchain (some places cope
> > with -1 but not this path AFAICT).
>
> Does booting native get the same configuration?

Booting native with the same Xen-enabled kernel gives:

[ 0.000000] Bootmem setup node 0 0000000130000000-0000000230000000
[ 0.000000] NODE_DATA [0000000130000000 - 0000000130007fff]
[ 0.000000] bootmap [0000000130008000 - 0000000130027fff] pages 20
[ 0.000000] (8 early reservations) ==> bootmem [0130000000 - 0230000000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE
[ 0.000000] #2 [0001000000 - 0001694994] TEXT DATA BSS
[ 0.000000] #3 [0037656000 - 0037fefb18] RAMDISK
[ 0.000000] #4 [000009ec00 - 0000100000] BIOS reserved
[ 0.000000] #5 [0001695000 - 000169532d] BRK
[ 0.000000] #6 [0000008000 - 000000c000] PGTABLE
[ 0.000000] #7 [000000c000 - 0000011000] PGTABLE
[ 0.000000] Bootmem setup node 1 0000000000000000-0000000130000000
[ 0.000000] NODE_DATA [0000000000011000 - 0000000000018fff]
[ 0.000000] bootmap [0000000000019000 - 000000000003efff] pages 26
[ 0.000000] (8 early reservations) ==> bootmem [0000000000 - 0130000000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
[ 0.000000] #2 [0001000000 - 0001694994] TEXT DATA BSS ==> [0001000000 - 0001694994]
[ 0.000000] #3 [0037656000 - 0037fefb18] RAMDISK ==> [0037656000 - 0037fefb18]
[ 0.000000] #4 [000009ec00 - 0000100000] BIOS reserved ==> [000009ec00 - 0000100000]
[ 0.000000] #5 [0001695000 - 000169532d] BRK ==> [0001695000 - 000169532d]
[ 0.000000] #6 [0000008000 - 000000c000] PGTABLE ==> [0000008000 - 000000c000]
[ 0.000000] #7 [000000c000 - 0000011000] PGTABLE ==> [000000c000 - 0000011000]
[ 0.000000] found SMP MP-table at [ffff8800000fe710] fe710
[ 0.000000] [ffffea0004280000-ffffea00043fffff] potential offnode page_structs
[ 0.000000] [ffffea0000000000-ffffea00043fffff] PMD -> [ffff880001800000-ffff8800051fffff] on node 1
[ 0.000000] [ffffea0004400000-ffffea0007bfffff] PMD -> [ffff880130200000-ffff8801339fffff] on node 0
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000000 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x00230000
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[4] active PFN ranges
[ 0.000000] 1: 0x00000000 -> 0x000000a0
[ 0.000000] 1: 0x00000100 -> 0x000cf679
[ 0.000000] 1: 0x00100000 -> 0x00130000
[ 0.000000] 0: 0x00130000 -> 0x00230000
[ 0.000000] On node 0 totalpages: 1048576
[ 0.000000] Normal zone: 14336 pages used for memmap
[ 0.000000] Normal zone: 1034240 pages, LIFO batch:31
[ 0.000000] On node 1 totalpages: 1046041
[ 0.000000] DMA zone: 56 pages used for memmap
[ 0.000000] DMA zone: 109 pages reserved
[ 0.000000] DMA zone: 3835 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 14280 pages used for memmap
[ 0.000000] DMA32 zone: 831153 pages, LIFO batch:31
[ 0.000000] Normal zone: 2688 pages used for memmap
[ 0.000000] Normal zone: 193920 pages, LIFO batch:31

> > It's also not obvious if dom0 should be seeing the tables which describe
> > the host's nodes anyway or if we should be clobbering something. Given
> > that dom0 sees a pseudo-physical address map I'm not convinced seeing
> > the real SRAT is in any way beneficial. Perhaps we should simply be
> > clobbering NUMAness until actual PV understanding of NUMA is ready?
>
> Yes, the host SRAT is meaningless in the domain and we really should
> ignore it. I'm not sure what happens if you boot on a really NUMA system.
>
> > One thing I notice when googling R410 issues is that they apparently
> > have a "Cores per CPU" BIOS option which might be worth playing with,
> > since configuring a reduced number of cores might remove node 0 but not
> > node 1 (odd but not invalid?). Presumably it is also worth making sure
> > you have the latest BIOS etc.
>
> Also, what's the DIMM configuration? Are the slots fully populated?

8 slots, 4 populated; slots #0, #1, #4 and #5 are populated with 2 GiB
DIMMs (according to lshw, set up by Dell).

I switched off hyperthreading in the BIOS settings (default is 'on'); I
had issues with Xen 3.2 on this topic (related to floating vcpus, which
I had to pin to fix random crashes). Also I don't think HT is
significant for my usage. I'm used to seeing strange bugs as soon as I
tweak Dell BIOSes, so I thought I'd mention that.
Vincent Caron
2010-Nov-25 13:29 UTC
[Xen-devel] Re: PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Tue, 2010-11-23 at 11:51 +0000, Ian Campbell wrote:
>
> One thing I notice when googling R410 issues is that they apparently
> have a "Cores per CPU" BIOS option which might be worth playing with,
> since configuring a reduced number of cores might remove node 0 but not
> node 1 (odd but not invalid?). Presumably it is also worth making sure
> you have the latest BIOS etc.

Indeed, I tweaked one BIOS setting, I disabled hyperthreading.

The R410 BIOS has a CPU menu with (among other options):

- Logical Processor: this is 'HT', default is enabled, I have disabled
  it.
- Virtualization Technology: default is disabled. Kept disabled.
- Number of Cores per Processor: default is 'All'. Options are
  single/dual/all. Kept 'all' default.

I just tried to boot Xen 4.0 hypervisor after reverting to the factory
default 'HT enabled' (aka Logical Processor: Yes), same bug.

Then I switched off HT again, and enabled 'Virtualization
Technology' (setting which I totally overlooked, but why is it disabled
as default ?). Same bug.

> It's very much an outside possibility but it is also worth trying the
> packages at http://xenbits.xen.org/people/ianc/ which reinstates the
> changesets from bcf16b6b4f34fb40a7aaf637947c7d3bce0be671

OK, should I pick the one in
http://xenbits.xen.org/people/ianc/squeeze/ ? (I'm running Debian
Squeeze on this machine)
Ian Campbell
2010-Nov-25 14:44 UTC
[Xen-devel] Re: PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Thu, 2010-11-25 at 14:29 +0100, Vincent Caron wrote:
> On Tue, 2010-11-23 at 11:51 +0000, Ian Campbell wrote:
> >
> > One thing I notice when googling R410 issues is that they apparently
> > have a "Cores per CPU" BIOS option which might be worth playing with,
> > since configuring a reduced number of cores might remove node 0 but not
> > node 1 (odd but not invalid?). Presumably it is also worth making sure
> > you have the latest BIOS etc.
>
> Indeed, I tweaked one BIOS setting, I disabled hyperthreading.
>
> The R410 BIOS has a CPU menu with (among other options):
>
> - Logical Processor: this is 'HT', default is enabled, I have disabled
> it.
> - Virtualization Technology: default is disabled. Kept disabled.
> - Number of Cores per Processor: default is 'All'. Options are
> single/dual/all. Kept 'all' default.
>
> I just tried to boot Xen 4.0 hypervisor after reverting to the factory
> default 'HT enabled' (aka Logical Processor: Yes), same bug.
>
> Then I switched off HT again, and enabled 'Virtualization
> Technology' (setting which I totally overlooked, but why is it disabled
> as default ?). Same bug.

Oh well, thanks for trying it!

> > It's very much an outside possibility but it is also worth trying the
> > packages at http://xenbits.xen.org/people/ianc/ which reinstates the
> > changesets from bcf16b6b4f34fb40a7aaf637947c7d3bce0be671
>
> OK, should I pick the one in
> http://xenbits.xen.org/people/ianc/squeeze/ ? (I'm running Debian
> Squeeze on this machine)

Yes but that's a bit of a long shot (those packages only put back some
GPU related fixes).

It would be more useful to test the packages at
http://xenbits.xen.org/people/ianc/2.6.32-27+numa1/
first.

Ian.

-- 
Ian Campbell
Current Noise: Candlemass - The Well Of Souls (Live)

"Pok pok pok, P'kok!"
-- Superchicken
Ian Campbell
2010-Nov-25 15:01 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Thu, 2010-11-25 at 13:51 +0100, Vincent Caron wrote:
> On Tue, 2010-11-23 at 10:24 -0800, Jeremy Fitzhardinge wrote:
> > On 11/23/2010 03:51 AM, Ian Campbell wrote:
> > > I'm not sure but looking at the complete bootlog it looks as if the
> > > system may only have node==1 i.e. no 0 node which could plausibly lead
> > > to this sort of issue:
> > > [...]
> > > Perhaps we should be passing numa_node_id() (e.g. current node) instead
> > > of node 0? There doesn't seem to be another obvious alternative to
> > > passing in an explicit node number to this callchain (some places cope
> > > with -1 but not this path AFAICT).
> >
> > Does booting native get the same configuration?
>
> Booting native with the same Xen-enabled kernel gives:
[...]

Which looks to be pretty much completely different (this was the exact
same system and BIOS settings as the original Xen log, right?).

The diff is below but it's so different as to be nearly unreadable.

I'm surprised at this but I've not been following NUMA on Xen. Does the
hypervisor deliberately munge the SRAT for PV guests in some way?

Ian.

--- /home/ianc/native.txt 2010-11-25 14:46:54.458656000 +0000
+++ /home/ianc/xen.txt 2010-11-25 14:46:31.745482000 +0000
@@ -1,51 +1,27 @@
-
-[ 0.000000] Bootmem setup node 0 0000000130000000-0000000230000000
-[ 0.000000] NODE_DATA [0000000130000000 - 0000000130007fff]
-[ 0.000000] bootmap [0000000130008000 - 0000000130027fff] pages 20
-[ 0.000000] (8 early reservations) ==> bootmem [0130000000 - 0230000000]
-[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page
-[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE
-[ 0.000000] #2 [0001000000 - 0001694994] TEXT DATA BSS
-[ 0.000000] #3 [0037656000 - 0037fefb18] RAMDISK
-[ 0.000000] #4 [000009ec00 - 0000100000] BIOS reserved
-[ 0.000000] #5 [0001695000 - 000169532d] BRK
-[ 0.000000] #6 [0000008000 - 000000c000] PGTABLE
-[ 0.000000] #7 [000000c000 - 0000011000] PGTABLE
-[ 0.000000] Bootmem setup node 1 0000000000000000-0000000130000000
-[ 0.000000] NODE_DATA [0000000000011000 - 0000000000018fff]
-[ 0.000000] bootmap [0000000000019000 - 000000000003efff] pages 26
-[ 0.000000] (8 early reservations) ==> bootmem [0000000000 - 0130000000]
+[ 0.000000] Bootmem setup node 1 0000000000000000-0000000040000000
+[ 0.000000] NODE_DATA [0000000000008000 - 000000000000ffff]
+[ 0.000000] bootmap [0000000000010000 - 0000000000017fff] pages 8
+[ 0.000000] (8 early reservations) ==> bootmem [0000000000 - 0040000000]
 [ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
-[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
-[ 0.000000] #2 [0001000000 - 0001694994] TEXT DATA BSS ==> [0001000000 - 0001694994]
-[ 0.000000] #3 [0037656000 - 0037fefb18] RAMDISK ==> [0037656000 - 0037fefb18]
-[ 0.000000] #4 [000009ec00 - 0000100000] BIOS reserved ==> [000009ec00 - 0000100000]
-[ 0.000000] #5 [0001695000 - 000169532d] BRK ==> [0001695000 - 000169532d]
-[ 0.000000] #6 [0000008000 - 000000c000] PGTABLE ==> [0000008000 - 000000c000]
-[ 0.000000] #7 [000000c000 - 0000011000] PGTABLE ==> [000000c000 - 0000011000]
+[ 0.000000] #1 [0003446000 - 0003465000] XEN PAGETABLES ==> [0003446000 - 0003465000]
+[ 0.000000] #2 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
+[ 0.000000] #3 [0001000000 - 0001694994] TEXT DATA BSS ==> [0001000000 - 0001694994]
+[ 0.000000] #4 [00016b5000 - 0003244e00] RAMDISK ==> [00016b5000 - 0003244e00]
+[ 0.000000] #5 [0003245000 - 0003446000] XEN START INFO ==> [0003245000 - 0003446000]
+[ 0.000000] #6 [0001695000 - 000169532d] BRK ==> [0001695000 - 000169532d]
+[ 0.000000] #7 [0000100000 - 00002e0000] PGTABLE ==> [0000100000 - 00002e0000]
 [ 0.000000] found SMP MP-table at [ffff8800000fe710] fe710
-[ 0.000000] [ffffea0004280000-ffffea00043fffff] potential offnode page_structs
-[ 0.000000] [ffffea0000000000-ffffea00043fffff] PMD -> [ffff880001800000-ffff8800051fffff] on node 1
-[ 0.000000] [ffffea0004400000-ffffea0007bfffff] PMD -> [ffff880130200000-ffff8801339fffff] on node 0
 [ 0.000000] Zone PFN ranges:
 [ 0.000000] DMA 0x00000000 -> 0x00001000
 [ 0.000000] DMA32 0x00001000 -> 0x00100000
-[ 0.000000] Normal 0x00100000 -> 0x00230000
+[ 0.000000] Normal 0x00100000 -> 0x00100000
 [ 0.000000] Movable zone start PFN for each node
-[ 0.000000] early_node_map[4] active PFN ranges
+[ 0.000000] early_node_map[2] active PFN ranges
 [ 0.000000] 1: 0x00000000 -> 0x000000a0
-[ 0.000000] 1: 0x00000100 -> 0x000cf679
-[ 0.000000] 1: 0x00100000 -> 0x00130000
-[ 0.000000] 0: 0x00130000 -> 0x00230000
-[ 0.000000] On node 0 totalpages: 1048576
-[ 0.000000] Normal zone: 14336 pages used for memmap
-[ 0.000000] Normal zone: 1034240 pages, LIFO batch:31
-[ 0.000000] On node 1 totalpages: 1046041
+[ 0.000000] 1: 0x00000100 -> 0x00040000
+[ 0.000000] On node 1 totalpages: 262048
 [ 0.000000] DMA zone: 56 pages used for memmap
-[ 0.000000] DMA zone: 109 pages reserved
-[ 0.000000] DMA zone: 3835 pages, LIFO batch:0
-[ 0.000000] DMA32 zone: 14280 pages used for memmap
-[ 0.000000] DMA32 zone: 831153 pages, LIFO batch:31
-[ 0.000000] Normal zone: 2688 pages used for memmap
-[ 0.000000] Normal zone: 193920 pages, LIFO batch:31
-
+[ 0.000000] DMA zone: 483 pages reserved
+[ 0.000000] DMA zone: 3461 pages, LIFO batch:0
+[ 0.000000] DMA32 zone: 3528 pages used for memmap
+[ 0.000000] DMA32 zone: 254520 pages, LIFO batch:31

-- 
Ian Campbell
Current Noise: Candlemass - At The Gallows End (Studio Outtake)

But it does move!
-- Galileo Galilei
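The concern in this message, that a hard-coded node 0 gets looked up on a dom0 where only node 1 was ever set up, can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `node_table`, `setup_nodes_like_dom0()`, `first_online_node()` and `pages_on_node()` are hypothetical stand-ins for the kernel's per-node data and lookups, and the page count is the node 1 total from the Xen side of the diff.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4  /* hypothetical bound; the kernel uses MAX_NUMNODES */

/* Stand-in for the kernel's per-node data structure. */
struct node_data {
    unsigned long total_pages;
};

/* Per-node table; entries for nodes that were never set up stay NULL. */
static struct node_data *node_table[MAX_NODES];

/* Mirrors the dom0 boot log: only node 1 is populated. */
static struct node_data node1 = { .total_pages = 262048 };

static void setup_nodes_like_dom0(void)
{
    node_table[0] = NULL;
    node_table[1] = &node1;
}

/* Return the lowest node id that has data, or -1 if none does. */
static int first_online_node(void)
{
    for (int nid = 0; nid < MAX_NODES; nid++)
        if (node_table[nid] != NULL)
            return nid;
    return -1;
}

/*
 * Checked lookup. An unchecked node_table[nid]->total_pages with nid == 0
 * would dereference NULL here, which is the kind of access the oops at a
 * small address is consistent with.
 */
static long pages_on_node(int nid)
{
    assert(nid >= 0 && nid < MAX_NODES);
    if (node_table[nid] == NULL)
        return -1;
    return (long)node_table[nid]->total_pages;
}
```

In this model, asking for node 0 fails while asking for whichever node is actually present succeeds, which is the motivation for passing the current node (or a "no preference" value) rather than a literal 0.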
Vincent Caron
2010-Nov-25 15:49 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Thu, 2010-11-25 at 15:01 +0000, Ian Campbell wrote:
> >
> > Booting native with the same Xen-enabled kernel gives:
> [...]
>
> Which looks to be pretty much completely different (this was the exact
> same system and BIOS settings as the original Xen log, right?).
>
> The diff is below but it's so different as to be nearly unreadable.

I've just rebooted my Squeeze with and without the hypervisor with no
BIOS tweaking at all to be sure. Enclosed are the full logs for both
cases.
Vincent Caron
2010-Nov-25 16:38 UTC
[Xen-devel] Re: PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Thu, 2010-11-25 at 14:44 +0000, Ian Campbell wrote:
>
> > OK, should I pick the one in
> > http://xenbits.xen.org/people/ianc/squeeze/ ? (I'm running Debian
> > Squeeze on this machine)
>
> Yes but that's a bit of a long shot (those packages only put back some
> GPU related fixes).
>
> It would be more useful to test the packages at
> http://xenbits.xen.org/people/ianc/2.6.32-27+numa1/
> first.

Sorry, you answered that already a few days ago. I had your
@hellion.org.uk email whitelisted, but not your @citrix.com (fixed !).

I booted this 2.6.32-27+numa1 on my Dell R410 and had a different crash.
At least it crashed later, right after the kernel message 'xen_balloon:
Initialising balloon driver with page order 0'. Actually Linux just
hangs with no information; it does not even respond to SysRq.

Since the hypervisor was still up, I switched to its console and asked
for all diagnostics with '*'. Full log enclosed.
Vincent Caron
2010-Dec-02 23:47 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Tue, 2010-11-23 at 22:18 +0000, Ian Campbell wrote:
> On Tue, 2010-11-23 at 22:12 +0000, Ian Campbell wrote:
> > On Tue, 2010-11-23 at 18:52 +0000, Ian Campbell wrote:
> > > On Tue, 2010-11-23 at 11:51 +0000, Ian Campbell wrote:
> > > >
> > > > Perhaps we should be passing numa_node_id() (e.g. current node)
> > > > instead of node 0?
> > >
> > > I've just kicked off a build of the 2.6.32-27 Debian kernel with the
> > > following additional patch, I will hopefully post the binaries tomorrow.
> >
> > Build was quicker than I thought... Vincent, Cris if you get a chance
> > please can you test the kernel from:
> > http://xenbits.xen.org/people/ianc/2.6.32-27+numa1/
>
> Also, please can you try adding "numa=noacpi" to your kernel command
> line when running with the standard Debian kernel (not the one above).
>
> Thanks!

It just happens that your kernel above (2.6.32-27+numa1) boots fine
under hypervisor _when_ passed 'numa=noacpi'. Yeah !

I then tried again with Debian Squeeze's latest 2.6.32-28, which
crashes as -27 under hypervisor (and changelog show no xen or
numa-related thingies). Then I added 'numa=noacpi', and it boots fine
too. I got my 8 cores, networking, etc.

Enclosed is the dmesg for the latter, Debian, kernel.

Is the 'numa=noacpi' a "production acceptable" workaround ?
Jeremy Fitzhardinge
2010-Dec-03 00:12 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On 12/02/2010 03:47 PM, Vincent Caron wrote:
> It just happens that your kernel above (2.6.32-27+numa1) boots fine
> under hypervisor _when_ passed 'numa=noacpi'. Yeah !
>
> I then tried again with Debian Squeeze's latest 2.6.32-28, which
> crashes as -27 under hypervisor (and changelog show no xen or
> numa-related thingies). Then I added 'numa=noacpi', and it boots fine
> too. I got my 8 cores, networking, etc.
>
> Enclosed is the dmesg for the latter, Debian, kernel.
>
> Is the 'numa=noacpi' a "production acceptable" workaround ?

What about "numa=fake=1"? I think that should force it to create a
single NUMA node.

IanC: it looks like passing a node id of "-1" is the correct way to
say "I don't care".

J
Vincent Caron
2010-Dec-03 00:27 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Thu, 2010-12-02 at 16:12 -0800, Jeremy Fitzhardinge wrote:
> On 12/02/2010 03:47 PM, Vincent Caron wrote:
> > It just happens that your kernel above (2.6.32-27+numa1) boots fine
> > under hypervisor _when_ passed 'numa=noacpi'. Yeah !
> >
> > I then tried again with Debian Squeeze's latest 2.6.32-28, which
> > crashes as -27 under hypervisor (and changelog show no xen or
> > numa-related thingies). Then I added 'numa=noacpi', and it boots fine
> > too. I got my 8 cores, networking, etc.
> >
> > Enclosed is the dmesg for the latter, Debian, kernel.
> >
> > Is the 'numa=noacpi' a "production acceptable" workaround ?
>
> What about "numa=fake=1"? I think that should force it to create a
> single NUMA node.

Debian AMD64 2.6.32-28 on hypervisor 4.0.1, option 'numa=fake=1',
boots fine too on my R410. Dmesg attached.

PS: in both "numa=noacpi" and "numa=fake=1" cases where I can actually
run with a 4.0 hypervisor, I can't reboot. Dmesg ends with 'Restarting
system.' but the machine never reboots. OTOH reboots works while booting
native with the same kernel.
Ian Campbell
2010-Dec-03 08:51 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Thu, 2010-12-02 at 23:47 +0000, Vincent Caron wrote:
> On Tue, 2010-11-23 at 22:18 +0000, Ian Campbell wrote:
> > On Tue, 2010-11-23 at 22:12 +0000, Ian Campbell wrote:
> > > On Tue, 2010-11-23 at 18:52 +0000, Ian Campbell wrote:
> > > > On Tue, 2010-11-23 at 11:51 +0000, Ian Campbell wrote:
> > > > >
> > > > > Perhaps we should be passing numa_node_id() (e.g. current node)
> > > > > instead of node 0?
> > > >
> > > > I've just kicked off a build of the 2.6.32-27 Debian kernel with the
> > > > following additional patch, I will hopefully post the binaries tomorrow.
> > >
> > > Build was quicker than I thought... Vincent, Cris if you get a chance
> > > please can you test the kernel from:
> > > http://xenbits.xen.org/people/ianc/2.6.32-27+numa1/
> >
> > Also, please can you try adding "numa=noacpi" to your kernel command
> > line when running with the standard Debian kernel (not the one above).
> >
> > Thanks!
>
> It just happens that your kernel above (2.6.32-27+numa1) boots fine
> under hypervisor _when_ passed 'numa=noacpi'. Yeah !
>
> I then tried again with Debian Squeeze's latest 2.6.32-28, which
> crashes as -27 under hypervisor (and changelog show no xen or
> numa-related thingies). Then I added 'numa=noacpi', and it boots fine
> too. I got my 8 cores, networking, etc.
>
> Enclosed is the dmesg for the latter, Debian, kernel.
>
> Is the 'numa=noacpi' a "production acceptable" workaround ?

Yes and in fact I think the actual fix is simply to have Xen fake out
the behaviour of numa=noacpi as below. I'll send this plus the other fix
out after I've given it a bit of proper testing.

Ian.

xen: disable ACPI NUMA for PV guests

Xen does not currently expose PV-NUMA information to PV
guests. Therefore disable NUMA for the time being.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 02c710b..5c55e1b 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1178,6 +1178,10 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_smp_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	acpi_numa = -1;
+#endif
+
 	pgd = (pgd_t *)xen_start_info->pt_base;
 
 	if (!xen_initial_domain())
Ian Campbell
2010-Dec-03 08:52 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Fri, 2010-12-03 at 00:12 +0000, Jeremy Fitzhardinge wrote:
> What about "numa=fake=1"? I think that should force it to create a
> single NUMA node.

Is there any advantage to this vs numa=noacpi? Do they effectively end
up doing the same thing?

> IanC: it looks like passing a node id of "-1" is the correct way to
> say "I don't care".

I thought so too but convinced myself from staring at the code that it
wouldn't work in this case -- I'll double check before I resubmit.

Ian.
Ian Campbell
2010-Dec-03 08:54 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Fri, 2010-12-03 at 00:27 +0000, Vincent Caron wrote:
> On Thu, 2010-12-02 at 16:12 -0800, Jeremy Fitzhardinge wrote:
> > On 12/02/2010 03:47 PM, Vincent Caron wrote:
> > > It just happens that your kernel above (2.6.32-27+numa1) boots fine
> > > under hypervisor _when_ passed 'numa=noacpi'. Yeah !
> > >
> > > I then tried again with Debian Squeeze's latest 2.6.32-28, which
> > > crashes as -27 under hypervisor (and changelog show no xen or
> > > numa-related thingies). Then I added 'numa=noacpi', and it boots fine
> > > too. I got my 8 cores, networking, etc.
> > >
> > > Enclosed is the dmesg for the latter, Debian, kernel.
> > >
> > > Is the 'numa=noacpi' a "production acceptable" workaround ?
> >
> > What about "numa=fake=1"? I think that should force it to create a
> > single NUMA node.
>
> Debian AMD64 2.6.32-28 on hypervisor 4.0.1, option 'numa=fake=1',
> boots fine too on my R410. Dmesg attached.

Thanks.

> PS: in both "numa=noacpi" and "numa=fake=1" cases where I can actually
> run with a 4.0 hypervisor, I can't reboot. Dmesg ends with 'Restarting
> system.' but the machine never reboots. OTOH reboots works while booting
> native with the same kernel.

That was a known bug in 2.6.32.<something-recent>. AIUI a revert is in
the pipeline for the next 2.6.32.y and Jeremy has a proper fix queued
too, which I presume will make its way to stable in due course.

Ian.
Ian Campbell
2010-Dec-03 09:20 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Fri, 2010-12-03 at 08:52 +0000, Ian Campbell wrote:
> On Fri, 2010-12-03 at 00:12 +0000, Jeremy Fitzhardinge wrote:
> > What about "numa=fake=1"? I think that should force it to create a
> > single NUMA node.
>
> Is there any advantage to this vs numa=noacpi? Do they effectively end
> up doing the same thing?

The help text for CONFIG_NUMA_EMU (which enables numa=fake=N) says:

    Enable NUMA emulation. A flat machine will be split into virtual
    nodes when booted with "numa=fake=N", where N is the number of
    nodes. This is only useful for debugging.

I don't think this is the option we want.

Ian.
Ian Campbell
2010-Dec-03 09:49 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Fri, 2010-12-03 at 08:52 +0000, Ian Campbell wrote:
>
> > IanC: it looks like passing a node id of "-1" is the correct way to
> > say "I don't care".
>
> I thought so too but convinced myself from staring at the code that it
> wouldn't work in this case -- I'll double check before I resubmit.

I was misled because node gets placed verbatim into desc->node, which
at first glance seemed invalid, but on second look it seems to be fine:
everything which reads that value is prepared to accept the -1.

There is precedent for using -1 in hpet_assign_irq, although most other
places seem to use numa_node_id().

I've tested with -1 and it seems ok.

I see (too late) that you've already made the change so I guess you
knew this ;-) Are you going to put this change into next-2.6.32 as well
as 37?

Ian.
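The "-1 means I don't care" convention described above can be sketched in user space as follows. This is a minimal illustration under assumptions, not the kernel's implementation: `NODE_ANY`, `struct irq_desc_model` and `effective_node()` are hypothetical names, and the only thing taken from the thread is the idea that the node id is stored verbatim (as in desc->node) and interpreted by whoever reads it.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_ANY (-1)  /* hypothetical name for the "no preference" node id */

/* Hypothetical model of a descriptor that stores the node id verbatim. */
struct irq_desc_model {
    int node;
};

/*
 * Resolve the stored node at the point of use. A negative value means the
 * caller had no preference, so fall back to the node the reader is
 * currently running on; otherwise honour the explicit request.
 */
static int effective_node(const struct irq_desc_model *desc, int current_node)
{
    assert(desc != NULL);
    return desc->node < 0 ? current_node : desc->node;
}
```

The design point is that the writer never has to guess a valid node number; only readers, which know where they are running, turn the sentinel into a real node.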
Ian Campbell
2010-Dec-03 09:54 UTC
[Xen-devel] [PATCH] xen: disable ACPI NUMA for PV guests
Xen does not currently expose PV-NUMA information to PV
guests. Therefore disable NUMA for the time being to prevent the
kernel picking up on any host-level NUMA information which it might
come across in the firmware.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 arch/x86/xen/enlighten.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fbb35cd..7cdf2f3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1195,6 +1195,10 @@ asmlinkage void __init xen_start_kernel(void)
 
 	xen_smp_init();
 
+#ifdef CONFIG_ACPI_NUMA
+	acpi_numa = -1;
+#endif
+
 	pgd = (pgd_t *)xen_start_info->pt_base;
 
 	__supported_pte_mask |= _PAGE_IOMAP;
-- 
1.5.6.5
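The effect of the patch can be modelled in a few lines of user-space C. This is a simplified sketch under assumptions: only the variable name `acpi_numa` and the value -1 come from the patch; `running_as_xen_pv`, `xen_pv_early_setup()` and `scan_firmware_nodes()` are hypothetical, and the treatment of a negative flag as "ignore the firmware NUMA tables" is an assumed reading of how numa=noacpi behaves, not a copy of the kernel's SRAT code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same name as the kernel variable. Assumed semantics for this model:
 * negative = ACPI NUMA disabled, 0 = undecided, positive = tables accepted.
 */
static int acpi_numa;

/* Hypothetical stand-in for "we booted through the Xen PV entry point". */
static bool running_as_xen_pv;

/* Models the hunk added to xen_start_kernel(). */
static void xen_pv_early_setup(void)
{
    if (running_as_xen_pv)
        acpi_numa = -1;
}

/*
 * Hypothetical consumer of the firmware NUMA tables. Returns how many
 * nodes were taken from firmware, or 0 when the tables are ignored and
 * the system is treated as a single flat node.
 */
static int scan_firmware_nodes(int nodes_in_firmware)
{
    assert(nodes_in_firmware >= 0);
    if (acpi_numa < 0)
        return 0;
    acpi_numa = 1;
    return nodes_in_firmware;
}
```

Under this model a native boot still sees both firmware nodes, while a PV boot never looks at them, which matches the intent stated in the commit message.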
Vincent Caron
2010-Dec-03 10:30 UTC
Re: [Xen-devel] PVops domain 0 crash on NUMA system only Node==1 present (Was: Re: Bug#603632: linux-image-2.6.32-5-xen-amd64: Linux kernel 2.6.32/xen/amd64 booting fine on bare metal, but not as dom0 with Xen 4.0.1 (Dell R410))
On Fri, 2010-12-03 at 08:51 +0000, Ian Campbell wrote:
> On Thu, 2010-12-02 at 23:47 +0000, Vincent Caron wrote:
> > On Tue, 2010-11-23 at 22:18 +0000, Ian Campbell wrote:
> > > On Tue, 2010-11-23 at 22:12 +0000, Ian Campbell wrote:
> > > > On Tue, 2010-11-23 at 18:52 +0000, Ian Campbell wrote:
> > > > > On Tue, 2010-11-23 at 11:51 +0000, Ian Campbell wrote:
> > > > > >
> > > > > > Perhaps we should be passing numa_node_id() (e.g. current node)
> > > > > > instead of node 0?
> > > > >
> > > > > I've just kicked off a build of the 2.6.32-27 Debian kernel with the
> > > > > following additional patch, I will hopefully post the binaries tomorrow.
> > > >
> > > > Build was quicker than I thought... Vincent, Cris if you get a chance
> > > > please can you test the kernel from:
> > > > http://xenbits.xen.org/people/ianc/2.6.32-27+numa1/
> > >
> > > Also, please can you try adding "numa=noacpi" to your kernel command
> > > line when running with the standard Debian kernel (not the one above).
> > >
> > > Thanks!
> >
> > It just happens that your kernel above (2.6.32-27+numa1) boots fine
> > under hypervisor _when_ passed 'numa=noacpi'. Yeah !
> >
> > I then tried again with Debian Squeeze's latest 2.6.32-28, which
> > crashes as -27 under hypervisor (and changelog show no xen or
> > numa-related thingies). Then I added 'numa=noacpi', and it boots fine
> > too. I got my 8 cores, networking, etc.
> >
> > Enclosed is the dmesg for the latter, Debian, kernel.
> >
> > Is the 'numa=noacpi' a "production acceptable" workaround ?
>
> Yes and in fact I think the actual fix is simply to have Xen fake out
> the behaviour of numa=noacpi as below. I'll send this plus the other fix
> out after I've given it a bit of proper testing.

OK, I'll follow the patch until it makes it to Squeeze's kernel.

Thanks a lot for your help and fix ! Now going to stress some Xen 4.0
domains...
Vincent Caron
2010-Dec-13 22:17 UTC
[Xen-devel] Re: [PATCH] xen: disable ACPI NUMA for PV guests
On Fri, 2010-12-03 at 09:54 +0000, Ian Campbell wrote:
> Xen does not currently expose PV-NUMA information to PV
> guests. Therefore disable NUMA for the time being to prevent the
> kernel picking up on any host-level NUMA information which it might
> come across in the firmware.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
> arch/x86/xen/enlighten.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index fbb35cd..7cdf2f3 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1195,6 +1195,10 @@ asmlinkage void __init xen_start_kernel(void)
>
> xen_smp_init();
>
> +#ifdef CONFIG_ACPI_NUMA
> + acpi_numa = -1;
> +#endif
> +
> pgd = (pgd_t *)xen_start_info->pt_base;
>
> __supported_pte_mask |= _PAGE_IOMAP;

This made it into Debian's 2.6.32-29 on 10 Dec and works like a charm:
the Dell R410 boots properly with the Xen 4.0.1 hypervisor.

Thanks a lot for your help and responsiveness.