Hi,

we see Dom0 crashes due to the kernel detecting the NUMA topology not by
ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).

This will detect the actual NUMA config of the physical machine, but
will crash because of the mismatch with Dom0's virtual memory. Variation of
the theme: Dom0 sees what it's not supposed to see.

This happens with the said config option enabled and on a machine where
this scanning is still enabled (K8 and Fam10h, not Bulldozer class).

We have this dump then:
[    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of physical nodes 4
[    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
[    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
[    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
[    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
[    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
[    0.000000] NODE_DATA [000000003ffd9000 - 000000003fffffff]
[    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
[    0.000000] NODE_DATA [0000000137fd9000 - 0000000137ffffff]
[    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
[    0.000000] NODE_DATA [00000001f095e000 - 00000001f0984fff]
[    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
[    0.000000] Cannot find 159744 bytes in node 3
[    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] PGD 0
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] CPU 0
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
[    0.000000] RIP: e030:[<ffffffff81d220e6>] [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] RSP: e02b:ffffffff81c01de8 EFLAGS: 00010046
[    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
[    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 0000000000000000
[    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 0000000000000000
[    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 0000000000000000
[    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000003
[    0.000000] FS: 0000000000000000(0000) GS:ffffffff81ced000(0000) knlGS:0000000000000000
[    0.000000] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 0000000000000660
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
[    0.000000] Stack:
[    0.000000] 00000000000000c0 0000000000000003 0000000000000000 000000000000003f
[    0.000000] ffffffff81c01e68 ffffffff81d23024 0000000000400000 0000000000000002
[    0.000000] 0000000000080000 ffff8801f055e000 ffff8801f055e1f8 0000000000000000
[    0.000000] Call Trace:
[    0.000000] [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
[    0.000000] [<ffffffff81d23348>] sparse_init+0xe4/0x25a
[    0.000000] [<ffffffff81d16840>] paging_init+0x13/0x22
[    0.000000] [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
[    0.000000] [<ffffffff81683954>] ? printk+0x3c/0x3e
[    0.000000] [<ffffffff81d01a38>] start_kernel+0xe5/0x468
[    0.000000] [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
[    0.000000] [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
[    0.000000] [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
[    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 be 2a 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f <41> 8b bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
[    0.000000] RIP [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] RSP <ffffffff81c01de8>
[    0.000000] CR2: 0000000000000000
[    0.000000] ---[ end trace a7919e7f17c0a725 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

The obvious solution would be to explicitly deny northbridge scanning
when running as Dom0, though I am not sure how to implement this without
upsetting the other kernel folks about "that crappy Xen thing" again ;-)

Could someone propose a fix for this? (I am OoO for the next two weeks.)

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
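A rough way to see why node 3 fails in the dump above: the northbridge reports the host's node ranges, but Dom0 only owns part of that address space. The sketch below is a user-space model, not kernel code. The node ranges and the NODE_DATA size are taken from the dump; the end of Dom0's RAM (`ASSUMED_DOM0_RAM_END`) is an assumption, guessed from where node 2's NODE_DATA block ended up, and RAM is treated as one contiguous range starting at 0.

```c
#include <assert.h>
#include <stdint.h>

/* A half-open physical address range [base, limit). */
struct range {
	uint64_t base;
	uint64_t limit;
};

/* Node ranges as printed by the northbridge scan in the dump. */
const struct range nb_nodes[4] = {
	{ 0x0000000000000000ULL, 0x0000000040000000ULL },
	{ 0x0000000040000000ULL, 0x0000000138000000ULL },
	{ 0x0000000138000000ULL, 0x00000001f8000000ULL },
	{ 0x00000001f8000000ULL, 0x0000000238000000ULL },
};

/*
 * ASSUMPTION: Dom0's RAM ends right after node 2's NODE_DATA block
 * (0x1f0984fff + 1). The real value is not shown in the thread.
 */
#define ASSUMED_DOM0_RAM_END 0x00000001f0985000ULL

/* NODE_DATA size, computed from the node 0 allocation in the dump. */
#define NODE_DATA_SIZE (0x3fffffffULL - 0x3ffd9000ULL + 1)

/* Bytes of a node's range that fall below ram_end (RAM assumed to start at 0). */
uint64_t backed_bytes(struct range node, uint64_t ram_end)
{
	uint64_t end;

	assert(node.base <= node.limit);
	if (ram_end <= node.base)
		return 0;
	end = ram_end < node.limit ? ram_end : node.limit;
	return end - node.base;
}

/* Nonzero if a NODE_DATA block could be placed inside this node. */
int node_data_fits(struct range node, uint64_t ram_end)
{
	return backed_bytes(node, ram_end) >= NODE_DATA_SIZE;
}
```

Under these assumptions node 3 has no backing memory at all, which would match the "Cannot find 159744 bytes in node 3" line: 159744 is exactly the NODE_DATA size seen for nodes 0 to 2.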
Konrad Rzeszutek Wilk
2012-Aug-03 12:36 UTC
Re: Dom0 crash with old style AMD NUMA detection
On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:
> Hi,
>
> we see Dom0 crashes due to the kernel detecting the NUMA topology not by
> ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
>
> This will detect the actual NUMA config of the physical machine, but
> will crash because of the mismatch with Dom0's virtual memory. Variation of
> the theme: Dom0 sees what it's not supposed to see.
>
> This happens with the said config option enabled and on a machine where
> this scanning is still enabled (K8 and Fam10h, not Bulldozer class).
>
> We have this dump then:
> [...]
>
> The obvious solution would be to explicitly deny northbridge scanning
> when running as Dom0, though I am not sure how to implement this without
> upsetting the other kernel folks about "that crappy Xen thing" again ;-)

Heh.
Is there a numa=0 option that could be used to override it to turn it
off?

>
> Could someone propose a fix for this? (I am OoO for the next two weeks.)
>
> Regards,
> Andre.
>
> --
> Andre Przywara
> AMD-Operating System Research Center (OSRC), Dresden, Germany
Konrad Rzeszutek Wilk
2012-Aug-17 14:22 UTC
Re: Dom0 crash with old style AMD NUMA detection
On Fri, Aug 03, 2012 at 08:36:28AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:
> > Hi,
> >
> > we see Dom0 crashes due to the kernel detecting the NUMA topology not by
> > ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
> >
> > This will detect the actual NUMA config of the physical machine, but
> > will crash because of the mismatch with Dom0's virtual memory. Variation of
> > the theme: Dom0 sees what it's not supposed to see.
> >
> > This happens with the said config option enabled and on a machine where
> > this scanning is still enabled (K8 and Fam10h, not Bulldozer class).
> >
> > We have this dump then:
> > [...]
> >
> > The obvious solution would be to explicitly deny northbridge scanning
> > when running as Dom0, though I am not sure how to implement this without
> > upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>
> Heh.
> Is there a numa=0 option that could be used to override it to turn it
> off?

Not compile tested.. but was thinking something like this:

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 43fd630..838cc1f 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,6 +17,7 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/acpi.h>
+#include <asm/numa.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
 	disable_cpufreq();
 	WARN_ON(set_pm_idle_to_default());
 	fiddle_vdso();
+#ifdef CONFIG_NUMA
+	numa_off = 1;
+#endif
 }
Konrad Rzeszutek Wilk
2012-Sep-14 18:58 UTC
Re: Dom0 crash with old style AMD NUMA detection
> > > [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > > (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> > >
> > >
> > >
> > > The obvious solution would be to explicitly deny northbridge scanning
> > > when running as Dom0, though I am not sure how to implement this without
> > > upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >
> > Heh.
> > Is there a numa=0 option that could be used to override it to turn it
> > off?
>
> Not compile tested.. but was thinking something like this:

ping?

>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 43fd630..838cc1f 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>  #include <asm/e820.h>
>  #include <asm/setup.h>
>  #include <asm/acpi.h>
> +#include <asm/numa.h>
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>
> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>  	disable_cpufreq();
>  	WARN_ON(set_pm_idle_to_default());
>  	fiddle_vdso();
> +#ifdef CONFIG_NUMA
> +	numa_off = 1;
> +#endif
>  }
On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>
>>>>
>>>>
>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>> when running as Dom0, though I am not sure how to implement this without
>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>
>>> Heh.
>>> Is there a numa=0 option that could be used to override it to turn it
>>> off?
>>
>> Not compile tested.. but was thinking something like this:
>
> ping?

That looks good to me - at least for the time being.
I just want to check how this interacts with upcoming Dom0 NUMA
support. It wouldn't be too clever if we deliberately disable NUMA
and future Xen version will allow us to use it. So let me check if I
can confine this turn-off to the fallback K8 northbridge reading.

Thanks,
Andre.

>>
>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>> index 43fd630..838cc1f 100644
>> --- a/arch/x86/xen/setup.c
>> +++ b/arch/x86/xen/setup.c
>> @@ -17,6 +17,7 @@
>>  #include <asm/e820.h>
>>  #include <asm/setup.h>
>>  #include <asm/acpi.h>
>> +#include <asm/numa.h>
>>  #include <asm/xen/hypervisor.h>
>>  #include <asm/xen/hypercall.h>
>>
>> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>>  	disable_cpufreq();
>>  	WARN_ON(set_pm_idle_to_default());
>>  	fiddle_vdso();
>> +#ifdef CONFIG_NUMA
>> +	numa_off = 1;
>> +#endif
>>  }
>

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Konrad Rzeszutek Wilk
2012-Sep-17 19:14 UTC
Re: Dom0 crash with old style AMD NUMA detection
On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>
> >>>>
> >>>>
> >>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>when running as Dom0, though I am not sure how to implement this without
> >>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>
> >>>Heh.
> >>>Is there a numa=0 option that could be used to override it to turn it
> >>>off?
> >>
> >>Not compile tested.. but was thinking something like this:
> >
> >ping?
>
> That looks good to me - at least for the time being.

OK, can I've your Tested-by/Acked-by on it pls?

> I just want to check how this interacts with upcoming Dom0 NUMA
> support. It wouldn't be too clever if we deliberately disable NUMA

We can always revert this patch in future versions of Linux.

> and future Xen version will allow us to use it. So let me check if I
> can confine this turn-off to the fallback K8 northbridge reading.

This potentially could work, but I would prefer to not do it for 3.6.

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index a4790bf..b4edce4 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,6 +17,7 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/acpi.h>
+#include <asm/numa.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -483,7 +484,32 @@ void __cpuinit xen_enable_sysenter(void)
 	if(ret != 0)
 		setup_clear_cpu_cap(sysenter_feature);
 }
+#ifdef CONFIG_AMD_NUMA
+int __cpuinit xen_amd_k8(void)
+{
+	int num;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+		return -ENOENT;
+
+	for (num = 0; num < 32; num++) {
+		u32 header;
+
+		header = read_pci_config(0, num, 0, 0x00);
+		if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
+			continue;
 
+		header = read_pci_config(0, num, 1, 0x00);
+		if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
+			continue;
+		return num;
+	}
+	return -ENOENT;
+#endif
 void __cpuinit xen_enable_syscall(void)
 {
 #ifdef CONFIG_X86_64
@@ -542,4 +568,8 @@ void __init xen_arch_setup(void)
 	disable_cpufreq();
 	WARN_ON(set_pm_idle_to_default());
 	fiddle_vdso();
+#ifdef CONFIG_AMD_NUMA
+	if (xen_amd_k8() >= 0)
+		numa_off=1;
+#endif
 }
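For readers unfamiliar with the header comparison in the draft above: the dword at PCI config offset 0x00 holds the vendor ID in its low 16 bits and the device ID in its high 16 bits, so `PCI_VENDOR_ID_AMD | (0x1200<<16)` matches one specific AMD device. The following is a user-space sketch of that matching logic only. `read_cfg_fn` and the two `fake_*` readers are hypothetical stand-ins for `read_pci_config(0, slot, func, 0x00)`, the device ID 0x1600 is picked purely as an example of an ID the draft does not list, and -1 replaces the kernel's -ENOENT. Note also that the draft as posted appears to lack the closing brace of xen_amd_k8().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD 0x1022u	/* AMD's PCI vendor ID */

/* Pack vendor and device ID the way config dword 0x00 stores them. */
uint32_t pci_id_dword(uint16_t vendor, uint16_t device)
{
	return (uint32_t)vendor | ((uint32_t)device << 16);
}

/* True if header is one of the function-0 IDs accepted by the draft. */
bool is_nb_func0(uint32_t header)
{
	return header == pci_id_dword(PCI_VENDOR_ID_AMD, 0x1100) ||
	       header == pci_id_dword(PCI_VENDOR_ID_AMD, 0x1200) ||
	       header == pci_id_dword(PCI_VENDOR_ID_AMD, 0x1300);
}

/* True if header is one of the function-1 IDs accepted by the draft. */
bool is_nb_func1(uint32_t header)
{
	return header == pci_id_dword(PCI_VENDOR_ID_AMD, 0x1101) ||
	       header == pci_id_dword(PCI_VENDOR_ID_AMD, 0x1201) ||
	       header == pci_id_dword(PCI_VENDOR_ID_AMD, 0x1301);
}

/* HYPOTHETICAL reader type: returns config dword 0x00 of bus 0, slot, func. */
typedef uint32_t (*read_cfg_fn)(int slot, int func);

/* Same scan order as the draft: first slot whose func 0 and func 1 both match. */
int find_old_style_nb(read_cfg_fn read_cfg)
{
	int slot;

	assert(read_cfg);
	for (slot = 0; slot < 32; slot++) {
		if (!is_nb_func0(read_cfg(slot, 0)))
			continue;
		if (!is_nb_func1(read_cfg(slot, 1)))
			continue;
		return slot;
	}
	return -1;	/* the draft returns -ENOENT here */
}

/* Fake bus with a listed northbridge at slot 24 (illustration only). */
uint32_t fake_listed_cfg(int slot, int func)
{
	if (slot == 24 && func == 0)
		return pci_id_dword(PCI_VENDOR_ID_AMD, 0x1200);
	if (slot == 24 && func == 1)
		return pci_id_dword(PCI_VENDOR_ID_AMD, 0x1201);
	return 0xffffffffu;	/* no device present */
}

/* Fake bus whose slot 24 carries IDs that are not in the draft's list. */
uint32_t fake_unlisted_cfg(int slot, int func)
{
	if (slot == 24 && (func == 0 || func == 1))
		return pci_id_dword(PCI_VENDOR_ID_AMD, (uint16_t)(0x1600 + func));
	return 0xffffffffu;
}
```

With the first fake bus the scan returns slot 24; with the second it finds nothing, which is the behaviour the thread relies on for newer CPUs whose northbridge IDs were never added to the list.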
On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
>> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>>>> when running as Dom0, though I am not sure how to implement this without
>>>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>>>
>>>>> Heh.
>>>>> Is there a numa=0 option that could be used to override it to turn it
>>>>> off?
>>>>
>>>> Not compile tested.. but was thinking something like this:
>>>
>>> ping?
>>
>> That looks good to me - at least for the time being.
>
> OK, can I've your Tested-by/Acked-by on it pls?
>
>> I just want to check how this interacts with upcoming Dom0 NUMA
>> support. It wouldn't be too clever if we deliberately disable NUMA
>
> We can always revert this patch in future versions of Linux.

I don't like this idea. Then we have Linux kernel up to 3.5 working
and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
would be pretty unfortunate.

I haven't checked back with Dario, but I'd suspect that we use ACPI
for injecting NUMA topology into Dom0. Even if not, a general
"numa=off" for Dom0 is too much of a sledgehammer for me.

>> and future Xen version will allow us to use it. So let me check if I
>> can confine this turn-off to the fallback K8 northbridge reading.
>
> This potentially could work, but I would prefer to not do it for 3.6.

Mmh, I don't get the idea of your patch below. One can always read
the NUMA topology from the AMD northbridge, but this is deprecated
in favor of ACPI. The amdtopology.c stuff was only there to enable
NUMA for very early Opterons, where BIOSes didn't provide (sane)
SRAT tables.
Though we disallow ACPI for NUMA on Dom0, this northbridge scanning
unfortunately "shines through" the virtualization, actually
revealing the system's NUMA topology, which is usually much
different from Dom0's one.

So instead I want to do more something like this:

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index bfacd2c..7811c0d 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -20,6 +20,8 @@
 
 extern int numa_off;
 
+extern bool deny_amd_nb_numa_scan;
+
 /*
  * __apicid_to_node[] stores the raw mapping between physical apicid and
  * node and is used to initialize cpu_to_node mapping.
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 5247d01..f223a67 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -29,6 +29,8 @@
 
 static unsigned char __initdata nodeids[8];
 
+bool deny_amd_nb_numa_scan = 0;
+
 static __init int find_northbridge(void)
 {
 	int num;
@@ -78,6 +80,9 @@ int __init amd_numa_init(void)
 	u32 nodeid, reg;
 	unsigned int bits, cores, apicid_base;
 
+	if (deny_amd_nb_numa_scan)
+		return -ENOENT;
+
 	if (!early_pci_allowed())
 		return -EINVAL;
 
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index d11ca11..6db63c0 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -532,6 +532,8 @@ void __init xen_arch_setup(void)
 	}
 #endif
 
+	deny_amd_nb_numa_scan = 1;
+
 	memcpy(boot_command_line, xen_start_info->cmd_line,
 	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
 	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);

This would just turn off this one kind of NUMA discovery for Dom0.
The patch is admittedly a bit rough (not sure about the proper
placement into #ifdef's, for instance) and not well tested yet.
Also one could think about using a more general variable name to
cover other hardware things in the future that Dom0 shouldn't use.
So this isn't something still for 3.6, probably not even for 3.7.

What about if we drop the patch for this problem at all for 3.6 and
recommend "numa=off" as a workaround? This is much less sticky than
a kernel patch and could appear in the Xen wiki, for instance.

After all this isn't a strict regression (appears with every 3.x
kernel, AFAICT).
Most of the time the northbridge scanning will yield bogus results,
so the kernel eventually discards it, but sometimes it seems to slip
through and causes trouble.
Also it does not trigger on newer (Bulldozer) class CPUs, since we
deliberately avoided adding the new northbridge PCI-ID for this
routine.

Regards,
Andre.

>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index a4790bf..b4edce4 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>  #include <asm/e820.h>
>  #include <asm/setup.h>
>  #include <asm/acpi.h>
> +#include <asm/numa.h>
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>
> @@ -483,7 +484,32 @@ void __cpuinit xen_enable_sysenter(void)
>  	if(ret != 0)
>  		setup_clear_cpu_cap(sysenter_feature);
>  }
> +#ifdef CONFIG_AMD_NUMA
> +int __cpuinit xen_amd_k8(void)
> +{
> +	int num;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> +		return -ENOENT;
> +
> +	for (num = 0; num < 32; num++) {
> +		u32 header;
> +
> +		header = read_pci_config(0, num, 0, 0x00);
> +		if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
> +			continue;
>
> +		header = read_pci_config(0, num, 1, 0x00);
> +		if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
> +			continue;
> +		return num;
> +	}
> +	return -ENOENT;
> +#endif
>  void __cpuinit xen_enable_syscall(void)
>  {
>  #ifdef CONFIG_X86_64
> @@ -542,4 +568,8 @@ void __init xen_arch_setup(void)
>  	disable_cpufreq();
>  	WARN_ON(set_pm_idle_to_default());
>  	fiddle_vdso();
> +#ifdef CONFIG_AMD_NUMA
> +	if (xen_amd_k8() >= 0)
> +		numa_off=1;
> +#endif
>  }
>

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
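The difference between the two approaches discussed so far can be modelled in a few lines. This is a simplified user-space sketch, assuming the x86 NUMA init of this kernel era tries ACPI first, then the AMD northbridge scan, and finally falls back to a single dummy node. `struct boot_env`, `pick_numa_source()` and the two `*_usable`/`*_succeeds` fields are hypothetical names made up for the illustration; only `numa_off` and `deny_amd_nb_numa_scan` come from the patches in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Which discovery method ends up providing the NUMA layout. */
enum numa_source { SRC_ACPI, SRC_AMD_NB, SRC_DUMMY };

struct boot_env {
	bool numa_off;			/* models the kernel's numa_off */
	bool deny_amd_nb_numa_scan;	/* models the flag from the patch above */
	bool acpi_srat_usable;		/* HYPOTHETICAL: ACPI offers a usable topology */
	bool nb_scan_succeeds;		/* HYPOTHETICAL: northbridge scan finds nodes */
};

/* Simplified fallback chain: ACPI, then northbridge scan, then dummy node. */
enum numa_source pick_numa_source(const struct boot_env *env)
{
	assert(env);
	if (!env->numa_off) {
		if (env->acpi_srat_usable)
			return SRC_ACPI;
		/* The proposed flag gates only this one method. */
		if (!env->deny_amd_nb_numa_scan && env->nb_scan_succeeds)
			return SRC_AMD_NB;
	}
	return SRC_DUMMY;
}
```

In this model, setting `numa_off` sends every configuration to the dummy node, including a hypothetical future Dom0 that does get a usable ACPI topology, whereas `deny_amd_nb_numa_scan` only removes the northbridge path and leaves ACPI untouched.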
Konrad Rzeszutek Wilk
2012-Sep-18 13:44 UTC
Re: Dom0 crash with old style AMD NUMA detection
On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
> On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> >On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> >>On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>>[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>>>when running as Dom0, though I am not sure how to implement this without
> >>>>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>>>
> >>>>>Heh.
> >>>>>Is there a numa=0 option that could be used to override it to turn it
> >>>>>off?
> >>>>
> >>>>Not compile tested.. but was thinking something like this:
> >>>
> >>>ping?
> >>
> >>That looks good to me - at least for the time being.
> >
> >OK, can I've your Tested-by/Acked-by on it pls?
> >
> >>I just want to check how this interacts with upcoming Dom0 NUMA
> >>support. It wouldn't be too clever if we deliberately disable NUMA
> >
> >We can always revert this patch in future versions of Linux.
>
> I don't like this idea. Then we have Linux kernel up to 3.5 working
> and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
> would be pretty unfortunate.

Huh? v3.5 working? But it never worked? I would say turn off the NUMA
detection (keep in mind it still will set up the dummy NUMA stuff)
until there are some PV NUMA capability and then we can revert it.

>
> I haven't checked back with Dario, but I'd suspect that we use ACPI
> for injecting NUMA topology into Dom0. Even if not, a general
> "numa=off" for Dom0 is too much of a sledgehammer for me.

How would you inject it in Dom0? It's a PV guest so the hypervisor
would have to tweak the SRAT/SLIT tables. That is not going to happen
in the very short term.. And I don't recall seeing any patches, so the
dom0 NUMA support is right now non-existent?

>
> >>and future Xen version will allow us to use it. So let me check if I
> >>can confine this turn-off to the fallback K8 northbridge reading.
> >
> >This potentially could work, but I would prefer to not do it for 3.6.
>
> Mmh, I don't get the idea of your patch below. One can always read
> the NUMA topology from the AMD northbridge, but this is deprecated
> in favor of ACPI. The amdtopology.c stuff was only there to enable
> NUMA for very early Opterons, where BIOSes didn't provide (sane)
> SRAT tables.
> Though we disallow ACPI for NUMA on Dom0, this northbridge scanning
> unfortunately "shines through" the virtualization, actually
> revealing the system's NUMA topology, which is usually much
> different from Dom0's one.

Right, but isn't that what you found broke? It wasn't ACPI NUMA but
the old-style K8 northbridge information? That is what we are trying
to fix.

>
> So instead I want to do more something like this:
>
> diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
> index bfacd2c..7811c0d 100644
> --- a/arch/x86/include/asm/numa.h
> +++ b/arch/x86/include/asm/numa.h
> @@ -20,6 +20,8 @@
>
>  extern int numa_off;
>
> +extern bool deny_amd_nb_numa_scan;
> +
>  /*
>   * __apicid_to_node[] stores the raw mapping between physical apicid and
>   * node and is used to initialize cpu_to_node mapping.
> diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
> index 5247d01..f223a67 100644
> --- a/arch/x86/mm/amdtopology.c
> +++ b/arch/x86/mm/amdtopology.c
> @@ -29,6 +29,8 @@
>
>  static unsigned char __initdata nodeids[8];
>
> +bool deny_amd_nb_numa_scan = 0;
> +
>  static __init int find_northbridge(void)
>  {
>  	int num;
> @@ -78,6 +80,9 @@ int __init amd_numa_init(void)
>  	u32 nodeid, reg;
>  	unsigned int bits, cores, apicid_base;
>
> +	if (deny_amd_nb_numa_scan)
> +		return -ENOENT;
> +
>  	if (!early_pci_allowed())
>  		return -EINVAL;
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index d11ca11..6db63c0 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -532,6 +532,8 @@ void __init xen_arch_setup(void)
>  	}
>  #endif
>
> +	deny_amd_nb_numa_scan = 1;
> +
>  	memcpy(boot_command_line, xen_start_info->cmd_line,
>  	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
>  	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);
>
> This would just turn off this one kind of NUMA discovery for Dom0.
> The patch is admittedly a bit rough (not sure about the proper
> placement into #ifdef's, for instance) and not well tested yet.
> Also one could think about using a more general variable name to
> cover other hardware things in the future that Dom0 shouldn't use.
> So this isn't something still for 3.6, probably not even for 3.7.
>
> What about if we drop the patch for this problem at all for 3.6 and
> recommend "numa=off" as a workaround? This is much less sticky than
> a kernel patch and could appear in the Xen wiki, for instance.

I hate workarounds. People end up using them forever and they get
codified.

> After all this isn't a strict regression (appears with every 3.x
> kernel, AFAICT).
> Most of the time the northbridge scanning will yield bogus results,
> so the kernel eventually discards it, but sometimes it seems to slip
> through and causes trouble.
> Also it does not trigger on newer (Bulldozer) class CPUs, since we
> deliberately avoided adding the new northbridge PCI-ID for this
> routine.

Right, you end up using the ACPI NUMA in them.
Konrad Rzeszutek Wilk
2012-Sep-18 14:55 UTC
Re: Dom0 crash with old style AMD NUMA detection
On Tue, Sep 18, 2012 at 06:50:14PM +0200, Andre Przywara wrote:
> On 09/18/2012 03:44 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
> >>On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> >>>On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> >>>>On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>>>>[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>>>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>>>>>when running as Dom0, though I am not sure how to implement this without
> >>>>>>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>>>>>
> >>>>>>>Heh.
> >>>>>>>Is there a numa=0 option that could be used to override it to turn it
> >>>>>>>off?
> >>>>>>
> >>>>>>Not compile tested.. but was thinking something like this:
> >>>>>
> >>>>>ping?
> >>>>
> >>>>That looks good to me - at least for the time being.
> >>>
> >>>OK, can I've your Tested-by/Acked-by on it pls?
> >>>
> >>>>I just want to check how this interacts with upcoming Dom0 NUMA
> >>>>support. It wouldn't be too clever if we deliberately disable NUMA
> >>>
> >>>We can always revert this patch in future versions of Linux.
> >>
> >>I don't like this idea. Then we have Linux kernel up to 3.5 working
> >>and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
> >>would be pretty unfortunate.
> >
> >Huh? v3.5 working? But it never worked? I would say turn off the NUMA
> >detection (keep in mind it still will set up the dummy NUMA stuff)
> >until there are some PV NUMA capability and then we can revert it.
>
> I was under the impression that somehow the Dom0 NUMA would be made
> compatible, using some of the existing discovery mechanisms. So we
> would enable the hypervisor, and Dom0 would just magically start
> working. I am probably rooted too much in the HVM world ;-)
>
> >>
> >>I haven't checked back with Dario, but I'd suspect that we use ACPI
> >>for injecting NUMA topology into Dom0. Even if not, a general
> >>"numa=off" for Dom0 is too much of a sledgehammer for me.
> >
> >How would you inject it in Dom0? It's a PV guest so the hypervisor would
> >have to tweak the SRAT/SLIT tables. That is not going to happen
> >in the very short term.. And I don't recall seeing any patches, so
> >the dom0 NUMA support is right now non-existent?
>
> Right, I just don't wanted to slam the door deliberately. Thinking
> more about this, we probably need some kind of PV enablement in
> Dom0, even if we could somehow use the ACPI tables (and thus the
> ACPI parsing code).
> If this is the case, we could at the same time remove this "force
> numa off" patch.
>
> I am almost convinced by now.
> Just waiting for Dario's opinion for a few more hours and will send
> my final opinion later today. If you cannot wait, tell me.

Couple of days is OK with me. My deadline is Friday as I would like to send a git pull to Linus and include this patch if it makes sense.
On 09/18/2012 03:44 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
>> On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
>>>> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>>>>>> when running as Dom0, though I am not sure how to implement this without
>>>>>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>>>>>
>>>>>>> Heh.
>>>>>>> Is there a numa=0 option that could be used to override it to turn it
>>>>>>> off?
>>>>>>
>>>>>> Not compile tested.. but was thinking something like this:
>>>>>
>>>>> ping?
>>>>
>>>> That looks good to me - at least for the time being.
>>>
>>> OK, can I've your Tested-by/Acked-by on it pls?
>>>
>>>> I just want to check how this interacts with upcoming Dom0 NUMA
>>>> support. It wouldn't be too clever if we deliberately disable NUMA
>>>
>>> We can always revert this patch in future versions of Linux.
>>
>> I don't like this idea. Then we have Linux kernel up to 3.5 working
>> and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
>> would be pretty unfortunate.
>
> Huh? v3.5 working? But it never worked? I would say turn off the NUMA
> detection (keep in mind it still will set up the dummy NUMA stuff)
> until there are some PV NUMA capability and then we can revert it.

I was under the impression that somehow the Dom0 NUMA would be made compatible, using some of the existing discovery mechanisms. So we would enable the hypervisor, and Dom0 would just magically start working. I am probably rooted too much in the HVM world ;-)

>>
>> I haven't checked back with Dario, but I'd suspect that we use ACPI
>> for injecting NUMA topology into Dom0. Even if not, a general
>> "numa=off" for Dom0 is too much of a sledgehammer for me.
>
> How would you inject it in Dom0? It's a PV guest so the hypervisor would
> have to tweak the SRAT/SLIT tables. That is not going to happen
> in the very short term.. And I don't recall seeing any patches, so
> the dom0 NUMA support is right now non-existent?

Right, I just don't wanted to slam the door deliberately. Thinking more about this, we probably need some kind of PV enablement in Dom0, even if we could somehow use the ACPI tables (and thus the ACPI parsing code).
If this is the case, we could at the same time remove this "force numa off" patch.

I am almost convinced by now.
Just waiting for Dario's opinion for a few more hours and will send my final opinion later today. If you cannot wait, tell me.

Andre.
Konrad Rzeszutek Wilk
2012-Sep-21 17:48 UTC
Re: Dom0 crash with old style AMD NUMA detection
> Acked-by: Andre Przywara <andre.przywara@amd.com>
>
> I compiled and boot-tested this on my (single node ;-) test box.
> First bare-metal, dmesg: No NUMA configuration found
> Then again, but with numa=off on the cmd-line: NUMA turned off
> Then under Xen as Dom0 kernel: NUMA turned off
>
> So the code behaves under Xen as one would have explicitly specified
> numa=off, which is what we want.

Right.

> I couldn't get hold of the test machine (old K8 server) that the bug
> was once triggered, that's why I'm reluctant to give my Tested-by.
> Will try this ASAP.

OK, will wait with this - it would be a bit silly if the patch did not fix the issue :-)
On 08/17/2012 04:22 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 08:36:28AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:

Sorry Konrad, almost forgot. Comment (and Ack) below...

>>> we see Dom0 crashes due to the kernel detecting the NUMA topology not by
>>> ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
>>>
>>> This will detect the actual NUMA config of the physical machine, but
>>> will crash about the mismatch with Dom0's virtual memory. Variation of
>>> the theme: Dom0 sees what it's not supposed to see.
>>>
>>> This happens with the said config option enabled and on a machine where
>>> this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
>>>
>>> We have this dump then:
>>> [ 0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
>>> [ 0.000000] Scanning NUMA topology in Northbridge 24
>>> [ 0.000000] Number of physical nodes 4
>>> [ 0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
>>> [ 0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
>>> [ 0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
>>> [ 0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
>>> [ 0.000000] Initmem setup node 0 0000000000000000-0000000040000000
>>> [ 0.000000] NODE_DATA [000000003ffd9000 - 000000003fffffff]
>>> [ 0.000000] Initmem setup node 1 0000000040000000-0000000138000000
>>> [ 0.000000] NODE_DATA [0000000137fd9000 - 0000000137ffffff]
>>> [ 0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
>>> [ 0.000000] NODE_DATA [00000001f095e000 - 00000001f0984fff]
>>> [ 0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
>>> [ 0.000000] Cannot find 159744 bytes in node 3
>>> [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
>>> [ 0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
>>> [ 0.000000] PGD 0
>>> [ 0.000000] Oops: 0000 [#1] SMP
>>> [ 0.000000] CPU 0
>>> [ 0.000000] Modules linked in:
>>> [ 0.000000]
>>> [ 0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
>>> [ 0.000000] RIP: e030:[<ffffffff81d220e6>] [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
>>> [ 0.000000] RSP: e02b:ffffffff81c01de8 EFLAGS: 00010046
>>> [ 0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
>>> [ 0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 0000000000000000
>>> [ 0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 0000000000000000
>>> [ 0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 0000000000000000
>>> [ 0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000003
>>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81ced000(0000) knlGS:0000000000000000
>>> [ 0.000000] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 0000000000000660
>>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
>>> [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
>>> [ 0.000000] Stack:
>>> [ 0.000000] 00000000000000c0 0000000000000003 0000000000000000 000000000000003f
>>> [ 0.000000] ffffffff81c01e68 ffffffff81d23024 0000000000400000 0000000000000002
>>> [ 0.000000] 0000000000080000 ffff8801f055e000 ffff8801f055e1f8 0000000000000000
>>> [ 0.000000] Call Trace:
>>> [ 0.000000] [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
>>> [ 0.000000] [<ffffffff81d23348>] sparse_init+0xe4/0x25a
>>> [ 0.000000] [<ffffffff81d16840>] paging_init+0x13/0x22
>>> [ 0.000000] [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
>>> [ 0.000000] [<ffffffff81683954>] ? printk+0x3c/0x3e
>>> [ 0.000000] [<ffffffff81d01a38>] start_kernel+0xe5/0x468
>>> [ 0.000000] [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
>>> [ 0.000000] [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
>>> [ 0.000000] [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
>>> [ 0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 be 2a 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f <41> 8b bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
>>> [ 0.000000] RIP [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
>>> [ 0.000000] RSP <ffffffff81c01de8>
>>> [ 0.000000] CR2: 0000000000000000
>>> [ 0.000000] ---[ end trace a7919e7f17c0a725 ]---
>>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>
>>>
>>>
>>> The obvious solution would be to explicitly deny northbridge scanning
>>> when running as Dom0, though I am not sure how to implement this without
>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>
>> Heh.
>> Is there a numa=0 option that could be used to override it to turn it
>> off?
>
> Not compile tested.. but was thinking something like this:
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 43fd630..838cc1f 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>  #include <asm/e820.h>
>  #include <asm/setup.h>
>  #include <asm/acpi.h>
> +#include <asm/numa.h>
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>
> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>          disable_cpufreq();
>          WARN_ON(set_pm_idle_to_default());
>          fiddle_vdso();
> +#ifdef CONFIG_NUMA
> +        numa_off = 1;
> +#endif
>  }
>

Acked-by: Andre Przywara <andre.przywara@amd.com>

I compiled and boot-tested this on my (single node ;-) test box.
First bare-metal, dmesg: No NUMA configuration found
Then again, but with numa=off on the cmd-line: NUMA turned off
Then under Xen as Dom0 kernel: NUMA turned off

So the code behaves under Xen as one would have explicitly specified numa=off, which is what we want.

I couldn't get hold of the test machine (old K8 server) that the bug was once triggered, that's why I'm reluctant to give my Tested-by. Will try this ASAP.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
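The three dmesg results in the message above can be read as the outcome of a simple fallback: discovery methods are only tried when NUMA is not switched off, and otherwise the kernel sets up a single dummy node. Below is a minimal userspace sketch of that decision, assuming a simplified model with only two discovery methods (ACPI first, then the AMD northbridge scan). The names `numa_result`, `numa_init_model` and `numa_result_message` are hypothetical and are not kernel symbols; only the two "NUMA turned off" / "No NUMA configuration found" strings are taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for this model; not kernel definitions. */
enum numa_result {
    NUMA_FROM_ACPI,
    NUMA_FROM_AMD_NB,
    NUMA_NOT_FOUND,
    NUMA_OFF
};

/*
 * Simplified model of the fallback: if NUMA is not disabled, take the
 * first discovery method that succeeds; otherwise (or if none succeeds)
 * fall back to a single dummy node.
 */
enum numa_result numa_init_model(bool numa_off, bool acpi_ok, bool amd_nb_ok)
{
    if (!numa_off) {
        if (acpi_ok)
            return NUMA_FROM_ACPI;
        if (amd_nb_ok)
            return NUMA_FROM_AMD_NB;
    }
    return numa_off ? NUMA_OFF : NUMA_NOT_FOUND;
}

/* Map an outcome to the kind of line one would look for in dmesg. */
const char *numa_result_message(enum numa_result r)
{
    switch (r) {
    case NUMA_FROM_ACPI:
        return "topology taken from ACPI";            /* illustrative text only */
    case NUMA_FROM_AMD_NB:
        return "topology taken from northbridge scan"; /* illustrative text only */
    case NUMA_NOT_FOUND:
        return "No NUMA configuration found";          /* as quoted in the thread */
    case NUMA_OFF:
        return "NUMA turned off";                      /* as quoted in the thread */
    }
    assert(0 && "unknown numa_result");
    return "";
}
```

In this model, setting `numa_off` (as the patch does under Xen) means the northbridge result is never consulted, even on a machine where the scan itself would succeed.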
On 09/21/2012 07:48 PM, Konrad Rzeszutek Wilk wrote:
>> Acked-by: Andre Przywara <andre.przywara@amd.com>
>>
>> I compiled and boot-tested this on my (single node ;-) test box.
>> First bare-metal, dmesg: No NUMA configuration found
>> Then again, but with numa=off on the cmd-line: NUMA turned off
>> Then under Xen as Dom0 kernel: NUMA turned off
>>
>> So the code behaves under Xen as one would have explicitly specified
>> numa=off, which is what we want.
>
> Right.
>> I couldn't get hold of the test machine (old K8 server) that the bug
>> was once triggered, that's why I'm reluctant to give my Tested-by.
>> Will try this ASAP.
>
> OK, will wait with this - it would be a bit silly if the patch did not
> fix the issue :-)

Thanks for your patience. I tried some machines, it not only affects K8s, but also Barcelonas and Magny-Cours.
Boot those with a Xen HV and restrict Dom0's memory to something well below the first node's size (say dom0_mem=512M). If the 3.x Dom0 kernel has CONFIG_AMD_NUMA compiled in, the box will crash, because the hardware's NUMA info read from the northbridge does not fit to Dom0's understanding of its memory.
With your fix the box booted fine, NUMA is turned off and everyone is happy.
Double checked by commenting the numa_off=1 line in your patch: crash again. So this line definitely fixes this.

Tested-by: Andre Przywara <andre.przywara@amd.com>

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
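The mismatch described in the message above comes down to range arithmetic: the northbridge reports node ranges for the whole machine, while Dom0 only owns a small part of that address space. Below is a minimal sketch, assuming Dom0's memory is a single contiguous range starting at address 0 (a simplification; a real Dom0 memory map has holes) and using the node ranges printed in the crash log earlier in the thread. `node_range`, `nb_nodes`, `node_bytes_below` and `count_empty_nodes` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [base, limit) of one NUMA node. */
struct node_range {
    uint64_t base;
    uint64_t limit;
};

/* Node ranges as printed by the northbridge scan in the crash log. */
const struct node_range nb_nodes[4] = {
    { 0x0000000000000000ULL, 0x0000000040000000ULL },
    { 0x0000000040000000ULL, 0x0000000138000000ULL },
    { 0x0000000138000000ULL, 0x00000001f8000000ULL },
    { 0x00000001f8000000ULL, 0x0000000238000000ULL },
};

/* Bytes of the node's range that lie below mem_end (assumed end of Dom0 memory). */
uint64_t node_bytes_below(const struct node_range *n, uint64_t mem_end)
{
    assert(n->base <= n->limit);
    if (mem_end <= n->base)
        return 0;
    return (mem_end < n->limit ? mem_end : n->limit) - n->base;
}

/* Number of nodes that end up with no memory at all under this model. */
size_t count_empty_nodes(const struct node_range *nodes, size_t count,
                         uint64_t mem_end)
{
    size_t empty = 0;

    for (size_t i = 0; i < count; i++)
        if (node_bytes_below(&nodes[i], mem_end) == 0)
            empty++;
    return empty;
}
```

Under this model, `count_empty_nodes(nb_nodes, 4, 512ULL << 20)` reports that with dom0_mem=512M only node 0 has any memory, so the kernel would be asked to set up three nodes that Dom0 has nothing to back. A memory end at or below 0x1f8000000 leaves node 3 empty, which would be consistent with the "Cannot find 159744 bytes in node 3" line in the log.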
Konrad Rzeszutek Wilk
2012-Sep-24 13:48 UTC
Re: Dom0 crash with old style AMD NUMA detection
On Sat, Sep 22, 2012 at 01:46:57AM +0200, Andre Przywara wrote:
> On 09/21/2012 07:48 PM, Konrad Rzeszutek Wilk wrote:
> >>Acked-by: Andre Przywara <andre.przywara@amd.com>
> >>
> >>I compiled and boot-tested this on my (single node ;-) test box.
> >>First bare-metal, dmesg: No NUMA configuration found
> >>Then again, but with numa=off on the cmd-line: NUMA turned off
> >>Then under Xen as Dom0 kernel: NUMA turned off
> >>
> >>So the code behaves under Xen as one would have explicitly specified
> >>numa=off, which is what we want.
> >
> >Right.
> >>I couldn't get hold of the test machine (old K8 server) that the bug
> >>was once triggered, that's why I'm reluctant to give my Tested-by.
> >>Will try this ASAP.
> >
> >OK, will wait with this - it would be a bit silly if the patch did not
> >fix the issue :-)
>
> Thanks for your patience. I tried some machines, it not only affects
> K8s, but also Barcelonas and Magny-Cours.
> Boot those with a Xen HV and restrict Dom0's memory to something
> well below the first node's size (say dom0_mem=512M). If the 3.x
> Dom0 kernel has CONFIG_AMD_NUMA compiled in, the box will crash,
> because the hardware's NUMA info read from the northbridge does not
> fit to Dom0's understanding of its memory.
> With your fix the box booted fine, NUMA is turned off and everyone is happy.
> Double checked by commenting the numa_off=1 line in your patch:
> crash again. So this line definitely fixes this.
>
> Tested-by: Andre Przywara <andre.przywara@amd.com>

OK, send out a git pull for it today. If Linus doesn't take it, I will just have to do it in v3.7 time-frame and do the stable kernel backport.

Thanks again for testing and reporting this!

>
> Regards,
> Andre.
>
> --
> Andre Przywara
> AMD-Operating System Research Center (OSRC), Dresden, Germany
>