thr3ads.net - Xen devel - Dom0 crash with old style AMD NUMA detection [Aug 2012]

Andre Przywara

2012-Aug-03 12:20 UTC

Dom0 crash with old style AMD NUMA detection

Hi,

we see Dom0 crashes caused by the kernel detecting the NUMA topology not via
ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).

This detects the actual NUMA configuration of the physical machine, but the
kernel then crashes because that layout does not match Dom0's virtual memory.
A variation on the theme: Dom0 sees what it's not supposed to see.

This happens with said config option enabled, on a machine where this
scanning is still active (K8 and Fam10h, not Bulldozer class).

We have this dump then:
[    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of physical nodes 4
[    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
[    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
[    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
[    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
[    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
[    0.000000]   NODE_DATA [000000003ffd9000 - 000000003fffffff]
[    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
[    0.000000]   NODE_DATA [0000000137fd9000 - 0000000137ffffff]
[    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
[    0.000000]   NODE_DATA [00000001f095e000 - 00000001f0984fff]
[    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
[    0.000000] Cannot find 159744 bytes in node 3
[    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] PGD 0
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] CPU 0
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
[    0.000000] RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] RSP: e02b:ffffffff81c01de8  EFLAGS: 00010046
[    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
[    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 0000000000000000
[    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 0000000000000000
[    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 0000000000000000
[    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000003
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff81ced000(0000) knlGS:0000000000000000
[    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 0000000000000660
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
[    0.000000] Stack:
[    0.000000]  00000000000000c0 0000000000000003 0000000000000000 000000000000003f
[    0.000000]  ffffffff81c01e68 ffffffff81d23024 0000000000400000 0000000000000002
[    0.000000]  0000000000080000 ffff8801f055e000 ffff8801f055e1f8 0000000000000000
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
[    0.000000]  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
[    0.000000]  [<ffffffff81d16840>] paging_init+0x13/0x22
[    0.000000]  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
[    0.000000]  [<ffffffff81683954>] ? printk+0x3c/0x3e
[    0.000000]  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
[    0.000000]  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
[    0.000000]  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
[    0.000000]  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
[    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 be 2a 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f <41> 8b bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
[    0.000000] RIP  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000]  RSP <ffffffff81c01de8>
[    0.000000] CR2: 0000000000000000
[    0.000000] ---[ end trace a7919e7f17c0a725 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
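
One possible reading of the dump: node 3 begins at 0x1f8000000, but the last
NODE_DATA block the kernel managed to place (for node 2) ends at 0x1f0984fff,
which suggests that Dom0's memory stops below the start of node 3. The C
sketch below illustrates that reading only. The node ranges are copied from
the dump; the struct and function names are made up for illustration, and the
memory end used in the note after the code is an assumption inferred from the
NODE_DATA line, not a value reported by Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One physical NUMA node as reported by the northbridge: [base, limit). */
struct node_range {
	uint64_t base;
	uint64_t limit;
};

/* Node ranges copied from the "MemBase ... Limit ..." lines of the dump. */
static const struct node_range nodes[] = {
	{ 0x0000000000000000ULL, 0x0000000040000000ULL },
	{ 0x0000000040000000ULL, 0x0000000138000000ULL },
	{ 0x0000000138000000ULL, 0x00000001f8000000ULL },
	{ 0x00000001f8000000ULL, 0x0000000238000000ULL },
};

/* Bytes of the node that lie below mem_end, i.e. that the guest can back. */
static uint64_t usable_bytes(struct node_range n, uint64_t mem_end)
{
	assert(n.base <= n.limit);
	if (mem_end <= n.base)
		return 0;
	return (mem_end < n.limit ? mem_end : n.limit) - n.base;
}

/* Index of the first node with no usable memory, or -1 if all have some. */
static int first_empty_node(uint64_t mem_end)
{
	for (size_t i = 0; i < sizeof(nodes) / sizeof(nodes[0]); i++)
		if (usable_bytes(nodes[i], mem_end) == 0)
			return (int)i;
	return -1;
}
```

With an assumed memory end of 0x1f0985000 (one byte past the node 2 NODE_DATA
block), first_empty_node() returns 3, the same node the kernel reports it
cannot allocate from.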



The obvious solution would be to explicitly deny northbridge scanning
when running as Dom0, though I am not sure how to implement this without
upsetting the other kernel folks about "that crappy Xen thing" again ;-)

Could someone propose a fix for this? (I am out of office for the next two weeks.)

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

Konrad Rzeszutek Wilk

2012-Aug-03 12:36 UTC

Re: Dom0 crash with old style AMD NUMA detection

On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:
> Hi,
> 
> The obvious solution would be to explicitly deny northbridge scanning
> when running as Dom0, though I am not sure how to implement this without
> upsetting the other kernel folks about "that crappy Xen thing" again ;-)

Heh.
Is there a numa=0 option that could be used to override it to turn it
off?

> 
> Could someone propose a fix for this? (I am out of office for the next two weeks.)

Konrad Rzeszutek Wilk

2012-Aug-17 14:22 UTC

Re: Dom0 crash with old style AMD NUMA detection

On Fri, Aug 03, 2012 at 08:36:28AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:
> > Hi,
> > 
> > The obvious solution would be to explicitly deny northbridge scanning
> > when running as Dom0, though I am not sure how to implement this without
> > upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> 
> Heh.
> Is there a numa=0 option that could be used to override it to turn it
> off?

Not compile tested.. but was thinking something like this:

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 43fd630..838cc1f 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,6 +17,7 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/acpi.h>
+#include <asm/numa.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
 	disable_cpufreq();
 	WARN_ON(set_pm_idle_to_default());
 	fiddle_vdso();
+#ifdef CONFIG_NUMA
+	numa_off = 1;
+#endif
 }

Konrad Rzeszutek Wilk

2012-Sep-14 18:58 UTC

Re: Dom0 crash with old style AMD NUMA detection

> > > [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > > (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> > > 
> > > 
> > > 
> > > The obvious solution would be to explicitly deny northbridge scanning
> > > when running as Dom0, though I am not sure how to implement this without
> > > upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> > 
> > Heh.
> > Is there a numa=0 option that could be used to override it to turn it
> > off?
> 
> Not compile tested.. but was thinking something like this:

ping?

> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 43fd630..838cc1f 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>  #include <asm/e820.h>
>  #include <asm/setup.h>
>  #include <asm/acpi.h>
> +#include <asm/numa.h>
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>  
> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>  	disable_cpufreq();
>  	WARN_ON(set_pm_idle_to_default());
>  	fiddle_vdso();
> +#ifdef CONFIG_NUMA
> +	numa_off = 1;
> +#endif
>  }

Andre Przywara

2012-Sep-17 07:29 UTC

Re: Dom0 crash with old style AMD NUMA detection

On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>
>>>>
>>>>
>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>> when running as Dom0, though I am not sure how to implement this without
>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>
>>> Heh.
>>> Is there a numa=0 option that could be used to override it to turn it
>>> off?
>>
>> Not compile tested.. but was thinking something like this:
>
> ping?

That looks good to me - at least for the time being.
I just want to check how this interacts with upcoming Dom0 NUMA support.
It wouldn't be too clever if we deliberately disable NUMA and future Xen
version will allow us to use it. So let me check if I can confine this
turn-off to the fallback K8 northbridge reading.

Thanks,
Andre.
>>
>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>> index 43fd630..838cc1f 100644
>> --- a/arch/x86/xen/setup.c
>> +++ b/arch/x86/xen/setup.c
>> @@ -17,6 +17,7 @@
>>   #include <asm/e820.h>
>>   #include <asm/setup.h>
>>   #include <asm/acpi.h>
>> +#include <asm/numa.h>
>>   #include <asm/xen/hypervisor.h>
>>   #include <asm/xen/hypercall.h>
>>
>> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>>   	disable_cpufreq();
>>   	WARN_ON(set_pm_idle_to_default());
>>   	fiddle_vdso();
>> +#ifdef CONFIG_NUMA
>> +	numa_off = 1;
>> +#endif
>>   }
>


-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712

Konrad Rzeszutek Wilk

2012-Sep-17 19:14 UTC

Re: Dom0 crash with old style AMD NUMA detection

On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>
> >>>>
> >>>>
> >>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>when running as Dom0, though I am not sure how to implement this without
> >>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>
> >>>Heh.
> >>>Is there a numa=0 option that could be used to override it to turn it
> >>>off?
> >>
> >>Not compile tested.. but was thinking something like this:
> >
> >ping?
> 
> That looks good to me - at least for the time being.

OK, can I've your Tested-by/Acked-by on it pls?

> I just want to check how this interacts with upcoming Dom0 NUMA
> support. It wouldn't be too clever if we deliberately disable NUMA

We can always revert this patch in future versions of Linux.

> and future Xen version will allow us to use it. So let me check if I
> can confine this turn-off to the fallback K8 northbridge reading.

This potentially could work, but I would prefer to not do it for 3.6.

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index a4790bf..b4edce4 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,6 +17,7 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/acpi.h>
+#include <asm/numa.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -483,7 +484,32 @@ void __cpuinit xen_enable_sysenter(void)
 	if(ret != 0)
 		setup_clear_cpu_cap(sysenter_feature);
 }
+#ifdef CONFIG_AMD_NUMA
+int __cpuinit xen_amd_k8(void)
+{
+	int num;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+		return -ENOENT;
+
+	for (num = 0; num < 32; num++) {
+		u32 header;
+
+		header = read_pci_config(0, num, 0, 0x00);
+		if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
+			continue;
 
+		header = read_pci_config(0, num, 1, 0x00);
+		if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
+			continue;
+		return num;
+	}
+	return -ENOENT;
+#endif
 void __cpuinit xen_enable_syscall(void)
 {
 #ifdef CONFIG_X86_64
@@ -542,4 +568,8 @@ void __init xen_arch_setup(void)
 	disable_cpufreq();
 	WARN_ON(set_pm_idle_to_default());
 	fiddle_vdso();
+#ifdef CONFIG_AMD_NUMA
+	if (xen_amd_k8() >= 0)
+		numa_off=1;
+#endif
 }
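
The header comparisons in the patch above rely on the layout of the first
dword of PCI configuration space: the vendor ID sits in the low 16 bits and
the device ID in the high 16 bits. The userspace C sketch below reproduces
just that comparison, assuming the usual value 0x1022 for PCI_VENDOR_ID_AMD.
The helper names are hypothetical and nothing here touches real hardware; the
header values are passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD 0x1022u

/* Device IDs checked by the patch above, for PCI function 0 and function 1. */
static const uint16_t fn0_ids[] = { 0x1100, 0x1200, 0x1300 };
static const uint16_t fn1_ids[] = { 0x1101, 0x1201, 0x1301 };

/* Build the dword found at config offset 0x00: vendor low, device high. */
static uint32_t pci_id_dword(uint16_t vendor, uint16_t device)
{
	return (uint32_t)vendor | ((uint32_t)device << 16);
}

/* True if header is an AMD device whose device ID appears in ids[0..n). */
static bool header_in_list(uint32_t header, const uint16_t *ids, size_t n)
{
	assert(ids != NULL);
	for (size_t i = 0; i < n; i++)
		if (header == pci_id_dword(PCI_VENDOR_ID_AMD, ids[i]))
			return true;
	return false;
}

/* Hypothetical helper: both functions of the slot must match, as in the patch. */
static bool slot_is_scanned_northbridge(uint32_t fn0_header, uint32_t fn1_header)
{
	return header_in_list(fn0_header, fn0_ids,
			      sizeof(fn0_ids) / sizeof(fn0_ids[0])) &&
	       header_in_list(fn1_header, fn1_ids,
			      sizeof(fn1_ids) / sizeof(fn1_ids[0]));
}
```

For example, the header pair 0x12001022 / 0x12011022 is accepted, while a pair
with a device ID outside the two lists, or with a different vendor ID, is
rejected.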

Andre Przywara

2012-Sep-18 09:57 UTC

Re: Dom0 crash with old style AMD NUMA detection

On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
>> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>>>> when running as Dom0, though I am not sure how to implement this without
>>>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>>>
>>>>> Heh.
>>>>> Is there a numa=0 option that could be used to override it to turn it
>>>>> off?
>>>>
>>>> Not compile tested.. but was thinking something like this:
>>>
>>> ping?
>>
>> That looks good to me - at least for the time being.
>
> OK, can I've your Tested-by/Acked-by on it pls?
>
>> I just want to check how this interacts with upcoming Dom0 NUMA
>> support. It wouldn't be too clever if we deliberately disable NUMA
>
> We can always revert this patch in future versions of Linux.

I don't like this idea. Then we have Linux kernel up to 3.5 working and
say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That would be
pretty unfortunate.

I haven't checked back with Dario, but I'd suspect that we use ACPI for
injecting NUMA topology into Dom0. Even if not, a general "numa=off" for
Dom0 is too much of a sledgehammer for me.

>> and future Xen version will allow us to use it. So let me check if I
>> can confine this turn-off to the fallback K8 northbridge reading.
>
> This potentially could work, but I would prefer to not do it for 3.6.

Mmh, I don't get the idea of your patch below. One can always read the
NUMA topology from the AMD northbridge, but this is deprecated in favor
of ACPI. The amdtopology.c stuff was only there to enable NUMA for very
early Opterons, where BIOSes didn't provide (sane) SRAT tables.
Though we disallow ACPI for NUMA on Dom0, this northbridge scanning
unfortunately "shines through" the virtualization, actually revealing
the system's NUMA topology, which is usually much different from
Dom0's one.

So instead I want to do something more like this:

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index bfacd2c..7811c0d 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -20,6 +20,8 @@

  extern int numa_off;

+extern bool deny_amd_nb_numa_scan;
+
  /*
   * __apicid_to_node[] stores the raw mapping between physical apicid and
   * node and is used to initialize cpu_to_node mapping.
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 5247d01..f223a67 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -29,6 +29,8 @@

  static unsigned char __initdata nodeids[8];

+bool deny_amd_nb_numa_scan = 0;
+
  static __init int find_northbridge(void)
  {
  	int num;
@@ -78,6 +80,9 @@ int __init amd_numa_init(void)
  	u32 nodeid, reg;
  	unsigned int bits, cores, apicid_base;

+	if (deny_amd_nb_numa_scan)
+		return -ENOENT;
+
  	if (!early_pci_allowed())
  		return -EINVAL;

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index d11ca11..6db63c0 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -532,6 +532,8 @@ void __init xen_arch_setup(void)
  	}
  #endif

+	deny_amd_nb_numa_scan = 1;
+
  	memcpy(boot_command_line, xen_start_info->cmd_line,
  	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
  	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);

This would just turn off this one kind of NUMA discovery for Dom0.
The patch is admittedly a bit rough (not sure about the proper placement
into #ifdefs, for instance) and not well tested yet.
Also one could think about using a more general variable name to cover
other hardware things in the future that Dom0 shouldn't use.
So this isn't something for 3.6 anymore, probably not even for 3.7.
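
To make the difference between the two switches concrete, here is a small
userspace C model. It assumes that the kernel tries its NUMA detection
methods in order (ACPI first, then the AMD northbridge scan) and falls back
to a single dummy node when none of them succeeds or when NUMA is switched
off. All probe functions, the acpi_numa_usable flag and the enum are
hypothetical stand-ins, not kernel code; only the names numa_off and
deny_amd_nb_numa_scan are taken from the patches in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum numa_source { SRC_ACPI, SRC_AMD_NB, SRC_DUMMY };

static bool numa_off;              /* models a global numa=off switch */
static bool deny_amd_nb_numa_scan; /* models the narrower flag from the patch */
static bool acpi_numa_usable;      /* hypothetical: false models Dom0 without ACPI NUMA */

static int acpi_probe(void)   { return acpi_numa_usable ? 0 : -EINVAL; }
static int amd_nb_probe(void) { return deny_amd_nb_numa_scan ? -ENOENT : 0; }
static int dummy_probe(void)  { return 0; }

/* Walk the detection chain and report which method supplied the topology. */
static enum numa_source pick_numa_source(void)
{
	static const struct {
		int (*probe)(void);
		enum numa_source src;
	} chain[] = {
		{ acpi_probe,   SRC_ACPI },
		{ amd_nb_probe, SRC_AMD_NB },
	};

	if (!numa_off) {
		for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
			if (chain[i].probe() == 0)
				return chain[i].src;
	}

	/* Nothing else applied: fall back to a single flat node. */
	int ret = dummy_probe();
	assert(ret == 0);
	(void)ret;
	return SRC_DUMMY;
}
```

In this model both switches avoid the northbridge scan when ACPI NUMA is not
usable, but only deny_amd_nb_numa_scan leaves the ACPI path available should
it become usable later; numa_off skips every method.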

What if we drop the patch for this problem altogether for 3.6 and
recommend "numa=off" as a workaround? This is much less sticky than a
kernel patch and could appear in the Xen wiki, for instance.
After all, this isn't a strict regression (it appears with every 3.x
kernel, AFAICT).
Most of the time the northbridge scanning will yield bogus results, so
the kernel eventually discards it, but sometimes it seems to slip
through and causes trouble.
Also it does not trigger on newer (Bulldozer) class CPUs, since we
deliberately avoided adding the new northbridge PCI-ID for this routine.

Regards,
Andre.
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index a4790bf..b4edce4 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>   #include <asm/e820.h>
>   #include <asm/setup.h>
>   #include <asm/acpi.h>
> +#include <asm/numa.h>
>   #include <asm/xen/hypervisor.h>
>   #include <asm/xen/hypercall.h>
>
> @@ -483,7 +484,32 @@ void __cpuinit xen_enable_sysenter(void)
>   	if(ret != 0)
>   		setup_clear_cpu_cap(sysenter_feature);
>   }
> +#ifdef CONFIG_AMD_NUMA
> +int __cpuinit xen_amd_k8(void)
> +{
> +	int num;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> +		return -ENOENT;
> +
> +	for (num = 0; num < 32; num++) {
> +		u32 header;
> +
> +		header = read_pci_config(0, num, 0, 0x00);
> +		if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
> +			continue;
>
> +		header = read_pci_config(0, num, 1, 0x00);
> +		if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
> +			continue;
> +		return num;
> +	}
> +	return -ENOENT;
> +#endif
>   void __cpuinit xen_enable_syscall(void)
>   {
>   #ifdef CONFIG_X86_64
> @@ -542,4 +568,8 @@ void __init xen_arch_setup(void)
>   	disable_cpufreq();
>   	WARN_ON(set_pm_idle_to_default());
>   	fiddle_vdso();
> +#ifdef CONFIG_AMD_NUMA
> +	if (xen_amd_k8() >= 0)
> +		numa_off=1;
> +#endif
>   }
>


-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

Konrad Rzeszutek Wilk

2012-Sep-18 13:44 UTC

Re: Dom0 crash with old style AMD NUMA detection

On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
> On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> >On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> >>On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>>[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>>>when running as Dom0, though I am not sure how to implement this without
> >>>>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>>>
> >>>>>Heh.
> >>>>>Is there a numa=0 option that could be used to override it to turn it
> >>>>>off?
> >>>>
> >>>>Not compile tested.. but was thinking something like this:
> >>>
> >>>ping?
> >>
> >>That looks good to me - at least for the time being.
> >
> >OK, can I've your Tested-by/Acked-by on it pls?
> >
> >>I just want to check how this interacts with upcoming Dom0 NUMA
> >>support. It wouldn't be too clever if we deliberately disable NUMA
> >
> >We can always revert this patch in future versions of Linux.
> 
> I don't like this idea. Then we have Linux kernel up to 3.5 working
> and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
> would be pretty unfortunate.

Huh? v3.5 working? But it never worked? I would say turn off the NUMA
detection (keep in mind it will still set up the dummy NUMA stuff)
until there is some PV NUMA capability, and then we can revert it.

> 
> I haven't checked back with Dario, but I'd suspect that we use ACPI
> for injecting NUMA topology into Dom0. Even if not, a general
> "numa=off" for Dom0 is too much of a sledgehammer for me.

How would you inject it in Dom0? It's a PV guest, so the hypervisor would
have to tweak the SRAT/SLIT tables. That is not going to happen
in the very short term.. And I don't recall seeing any patches, so
the dom0 NUMA support is right now non-existent?

> 
> >>and future Xen version will allow us to use it. So let me check if I
> >>can confine this turn-off to the fallback K8 northbridge reading.
> >
> >This potentially could work, but I would prefer to not do it for 3.6.
> 
> Mmh, I don't get the idea of your patch below. One can always read
> the NUMA topology from the AMD northbridge, but this is deprecated
> in favor of ACPI. The amdtopology.c stuff was only there to enable
> NUMA for very early Opterons, where BIOSes didn't provide (sane)
> SRAT tables.
> Though we disallow ACPI for NUMA on Dom0, this northbridge scanning
> unfortunately "shines through" the virtualization, actually
> revealing the system's NUMA topology, which is usually much
> different from Dom0's one.

Right, but isn't that what you found broke? It wasn't ACPI NUMA
but the old-style K8 northbridge information? That is what we are
trying to fix.

> 
> So instead I want to do something more like this:
> 
> diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
> index bfacd2c..7811c0d 100644
> --- a/arch/x86/include/asm/numa.h
> +++ b/arch/x86/include/asm/numa.h
> @@ -20,6 +20,8 @@
> 
>  extern int numa_off;
> 
> +extern bool deny_amd_nb_numa_scan;
> +
>  /*
>   * __apicid_to_node[] stores the raw mapping between physical apicid and
>   * node and is used to initialize cpu_to_node mapping.
> diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
> index 5247d01..f223a67 100644
> --- a/arch/x86/mm/amdtopology.c
> +++ b/arch/x86/mm/amdtopology.c
> @@ -29,6 +29,8 @@
> 
>  static unsigned char __initdata nodeids[8];
> 
> +bool deny_amd_nb_numa_scan = 0;
> +
>  static __init int find_northbridge(void)
>  {
>  	int num;
> @@ -78,6 +80,9 @@ int __init amd_numa_init(void)
>  	u32 nodeid, reg;
>  	unsigned int bits, cores, apicid_base;
> 
> +	if (deny_amd_nb_numa_scan)
> +		return -ENOENT;
> +
>  	if (!early_pci_allowed())
>  		return -EINVAL;
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index d11ca11..6db63c0 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -532,6 +532,8 @@ void __init xen_arch_setup(void)
>  	}
>  #endif
> 
> +	deny_amd_nb_numa_scan = 1;
> +
>  	memcpy(boot_command_line, xen_start_info->cmd_line,
>  	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
>  	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);
> 
> This would just turn off this one kind of NUMA discovery for Dom0.
> The patch is admittedly a bit rough (not sure about the proper
> placement into #ifdefs, for instance) and not well tested yet.
> Also one could think about using a more general variable name to
> cover other hardware things in the future that Dom0 shouldn't use.
> So this still isn't something for 3.6, probably not even for 3.7.
> 
> What if we drop the patch for this problem altogether for 3.6 and
> recommend "numa=off" as a workaround? This is much less sticky than
> a kernel patch and could appear in the Xen wiki, for instance.
I hate workarounds. People end up using them forever and they get
codified.
> After all this isn't a strict regression (appears with every 3.x
> kernel, AFAICT).
> Most of the time the northbridge scanning will yield bogus results,
> so the kernel eventually discards it, but sometimes it seems to slip
> through and causes trouble.
> Also it does not trigger on newer (Bulldozer) class CPUs, since we
> deliberately avoided adding the new northbridge PCI-ID for this
> routine.
Right, you end up using the ACPI NUMA in them.
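
For readers following the patch discussion, here is a minimal user-space C model of the detection order being argued about. It assumes, as a simplification, that the kernel tries ACPI first, then the AMD northbridge scan, and otherwise falls back to a single dummy node. Only numa_off and deny_amd_nb_numa_scan come from the patches in this thread; struct numa_env, the enum and the model_* functions are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for where the topology ended up coming from. */
enum numa_source { NUMA_SRC_ACPI, NUMA_SRC_AMD_NB, NUMA_SRC_DUMMY };

/* Hypothetical bundle of the inputs that decide the outcome. */
struct numa_env {
	bool numa_off;              /* global switch, as set by "numa=off" */
	bool deny_amd_nb_numa_scan; /* flag proposed in the patch quoted above */
	bool acpi_srat_usable;      /* assumption: ACPI delivers a sane SRAT */
	bool amd_nb_present;        /* assumption: a scannable northbridge exists */
};

/* Stand-in for the ACPI path: 0 on success, negative on failure. */
static int model_acpi_numa_init(const struct numa_env *env)
{
	return env->acpi_srat_usable ? 0 : -1;
}

/* Stand-in for the northbridge scan, with the early bail-out from the patch. */
static int model_amd_numa_init(const struct numa_env *env)
{
	if (env->deny_amd_nb_numa_scan)
		return -2; /* plays the role of -ENOENT */
	return env->amd_nb_present ? 0 : -1;
}

/* Try each source in order; the dummy single-node setup is the last resort. */
enum numa_source model_numa_init(const struct numa_env *env)
{
	assert(env != NULL);
	if (!env->numa_off) {
		if (model_acpi_numa_init(env) == 0)
			return NUMA_SRC_ACPI;
		if (model_amd_numa_init(env) == 0)
			return NUMA_SRC_AMD_NB;
	}
	return NUMA_SRC_DUMMY;
}
```

In this model, setting deny_amd_nb_numa_scan skips only the northbridge scan and leaves the ACPI path open, whereas numa_off skips both. That is the trade-off between the two patches discussed in this thread.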

Konrad Rzeszutek Wilk

2012-Sep-18 14:55 UTC

Re: Dom0 crash with old style AMD NUMA detection

On Tue, Sep 18, 2012 at 06:50:14PM +0200, Andre Przywara wrote:
> On 09/18/2012 03:44 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
> >>On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> >>>On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> >>>>On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>>>>[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>>>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>>>>>when running as Dom0, though I am not sure how to implement this without
> >>>>>>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>>>>>
> >>>>>>>Heh.
> >>>>>>>Is there a numa=0 option that could be used to override it to turn it
> >>>>>>>off?
> >>>>>>
> >>>>>>Not compile tested.. but was thinking something like this:
> >>>>>
> >>>>>ping?
> >>>>
> >>>>That looks good to me - at least for the time being.
> >>>
> >>>OK, can I've your Tested-by/Acked-by on it pls?
> >>>
> >>>>I just want to check how this interacts with upcoming Dom0 NUMA
> >>>>support. It wouldn't be too clever if we deliberately disable NUMA
> >>>
> >>>We can always revert this patch in future versions of Linux.
> >>
> >>I don't like this idea. Then we have Linux kernel up to 3.5 working
> >>and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
> >>would be pretty unfortunate.
> >
> >Huh? v3.5 working? But it never worked? I would say turn off the NUMA
> >detection (keep in mind it still will set up the dummy NUMA stuff)
> >until there are some PV NUMA capability and then we can revert it.
> 
> I was under the impression that somehow the Dom0 NUMA would be made
> compatible, using some of the existing discovery mechanisms. So we
> would enable the hypervisor, and Dom0 would just magically start
> working. I am probably rooted too much in the HVM world ;-)
> 
> >>
> >>I haven't checked back with Dario, but I'd suspect that we use ACPI
> >>for injecting NUMA topology into Dom0. Even if not, a general
> >>"numa=off" for Dom0 is too much of a sledgehammer for me.
> >
> >How would you inject it in Dom0? It's a PV guest so the hypervisor would
> >have to tweak the SRAT/SLIT tables. That is not going to happen
> >in the very short term.. And I don't recall seeing any patches, so
> >the dom0 NUMA support is right now non-existent?
> 
> Right, I just didn't want to slam the door deliberately. Thinking
> more about this, we probably need some kind of PV enablement in
> Dom0, even if we could somehow use the ACPI tables (and thus the
> ACPI parsing code).
> If this is the case, we could at the same time remove this "force
> numa off" patch.
> 
> I am almost convinced by now.
> Just waiting for Dario's opinion for a few more hours and will send
> my final opinion later today. If you cannot wait, tell me.
Couple of days is OK with me. My deadline is Friday as I would like
to send a git pull to Linus and include this patch if it makes sense.

Andre Przywara

2012-Sep-18 16:50 UTC

Re: Dom0 crash with old style AMD NUMA detection

On 09/18/2012 03:44 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
>> On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
>>>> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>>>>>> when running as Dom0, though I am not sure how to implement this without
>>>>>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>>>>>
>>>>>>> Heh.
>>>>>>> Is there a numa=0 option that could be used to override it to turn it
>>>>>>> off?
>>>>>>
>>>>>> Not compile tested.. but was thinking something like this:
>>>>>
>>>>> ping?
>>>>
>>>> That looks good to me - at least for the time being.
>>>
>>> OK, can I've your Tested-by/Acked-by on it pls?
>>>
>>>> I just want to check how this interacts with upcoming Dom0 NUMA
>>>> support. It wouldn't be too clever if we deliberately disable NUMA
>>>
>>> We can always revert this patch in future versions of Linux.
>>
>> I don't like this idea. Then we have Linux kernel up to 3.5 working
>> and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
>> would be pretty unfortunate.
>
> Huh? v3.5 working? But it never worked? I would say turn off the NUMA
> detection (keep in mind it still will set up the dummy NUMA stuff)
> until there are some PV NUMA capability and then we can revert it.

I was under the impression that somehow the Dom0 NUMA would be made
compatible, using some of the existing discovery mechanisms. So we would
enable the hypervisor, and Dom0 would just magically start working. I am
probably rooted too much in the HVM world ;-)

>>
>> I haven't checked back with Dario, but I'd suspect that we use ACPI
>> for injecting NUMA topology into Dom0. Even if not, a general
>> "numa=off" for Dom0 is too much of a sledgehammer for me.
>
> How would you inject it in Dom0? It's a PV guest so the hypervisor would
> have to tweak the SRAT/SLIT tables. That is not going to happen
> in the very short term.. And I don't recall seeing any patches, so
> the dom0 NUMA support is right now non-existent?

Right, I just didn't want to slam the door deliberately. Thinking more
about this, we probably need some kind of PV enablement in Dom0, even if
we could somehow use the ACPI tables (and thus the ACPI parsing code).
If this is the case, we could at the same time remove this "force numa
off" patch.

I am almost convinced by now.
Just waiting for Dario's opinion for a few more hours and will send my
final opinion later today. If you cannot wait, tell me.


Andre.

Konrad Rzeszutek Wilk

2012-Sep-21 17:48 UTC

Re: Dom0 crash with old style AMD NUMA detection

> Acked-by: Andre Przywara <andre.przywara@amd.com>
> 
> I compiled and boot-tested this on my (single node ;-) test box.
> First bare-metal, dmesg: No NUMA configuration found
> Then again, but with numa=off on the cmd-line: NUMA turned off
> Then under Xen as Dom0 kernel: NUMA turned off
> 
> So the code behaves under Xen as one would have explicitly specified
> numa=off, which is what we want.
Right.
> I couldn't get hold of the test machine (old K8 server) on which the bug
> was once triggered, that's why I'm reluctant to give my Tested-by.
> Will try this ASAP.
OK, will wait with this - it would be a bit silly if the patch did not
fix the issue :-)

Andre Przywara

2012-Sep-21 17:49 UTC

Re: Dom0 crash with old style AMD NUMA detection

On 08/17/2012 04:22 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 08:36:28AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:

Sorry Konrad, almost forgot.
Comment (and Ack) below...

>>> we see Dom0 crashes due to the kernel detecting the NUMA topology not by
>>> ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
>>>
>>> This will detect the actual NUMA config of the physical machine, but
>>> will crash about the mismatch with Dom0's virtual memory. Variation of
>>> the theme: Dom0 sees what it's not supposed to see.
>>>
>>> This happens with the said config option enabled and on a machine where
>>> this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
>>>
>>> We have this dump then:
>>> [    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
>>> [    0.000000] Scanning NUMA topology in Northbridge 24
>>> [    0.000000] Number of physical nodes 4
>>> [    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
>>> [    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
>>> [    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
>>> [    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
>>> [    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
>>> [    0.000000]   NODE_DATA [000000003ffd9000 - 000000003fffffff]
>>> [    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
>>> [    0.000000]   NODE_DATA [0000000137fd9000 - 0000000137ffffff]
>>> [    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
>>> [    0.000000]   NODE_DATA [00000001f095e000 - 00000001f0984fff]
>>> [    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
>>> [    0.000000] Cannot find 159744 bytes in node 3
>>> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
>>> [    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
>>> [    0.000000] PGD 0
>>> [    0.000000] Oops: 0000 [#1] SMP
>>> [    0.000000] CPU 0
>>> [    0.000000] Modules linked in:
>>> [    0.000000]
>>> [    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
>>> [    0.000000] RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
>>> [    0.000000] RSP: e02b:ffffffff81c01de8  EFLAGS: 00010046
>>> [    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
>>> [    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 0000000000000000
>>> [    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 0000000000000000
>>> [    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 0000000000000000
>>> [    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000003
>>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81ced000(0000) knlGS:0000000000000000
>>> [    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 0000000000000660
>>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
>>> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
>>> [    0.000000] Stack:
>>> [    0.000000]  00000000000000c0 0000000000000003 0000000000000000 000000000000003f
>>> [    0.000000]  ffffffff81c01e68 ffffffff81d23024 0000000000400000 0000000000000002
>>> [    0.000000]  0000000000080000 ffff8801f055e000 ffff8801f055e1f8 0000000000000000
>>> [    0.000000] Call Trace:
>>> [    0.000000]  [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
>>> [    0.000000]  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
>>> [    0.000000]  [<ffffffff81d16840>] paging_init+0x13/0x22
>>> [    0.000000]  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
>>> [    0.000000]  [<ffffffff81683954>] ? printk+0x3c/0x3e
>>> [    0.000000]  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
>>> [    0.000000]  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
>>> [    0.000000]  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
>>> [    0.000000]  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
>>> [    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 be 2a 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f <41> 8b bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
>>> [    0.000000] RIP  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
>>> [    0.000000]  RSP <ffffffff81c01de8>
>>> [    0.000000] CR2: 0000000000000000
>>> [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>
>>>
>>>
>>> The obvious solution would be to explicitly deny northbridge scanning
>>> when running as Dom0, though I am not sure how to implement this without
>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>
>> Heh.
>> Is there a numa=0 option that could be used to override it to turn it
>> off?
>
> Not compile tested.. but was thinking something like this:
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 43fd630..838cc1f 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>   #include <asm/e820.h>
>   #include <asm/setup.h>
>   #include <asm/acpi.h>
> +#include <asm/numa.h>
>   #include <asm/xen/hypervisor.h>
>   #include <asm/xen/hypercall.h>
>
> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>   	disable_cpufreq();
>   	WARN_ON(set_pm_idle_to_default());
>   	fiddle_vdso();
> +#ifdef CONFIG_NUMA
> +	numa_off = 1;
> +#endif
>   }
>
Acked-by: Andre Przywara <andre.przywara@amd.com>

I compiled and boot-tested this on my (single node ;-) test box.
First bare-metal, dmesg: No NUMA configuration found
Then again, but with numa=off on the cmd-line: NUMA turned off
Then under Xen as Dom0 kernel: NUMA turned off

So the code behaves under Xen as one would have explicitly specified 
numa=off, which is what we want.
I couldn't get hold of the test machine (old K8 server) on which the bug
was once triggered, that's why I'm reluctant to give my Tested-by.
Will try this ASAP.
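
The three boot results above can be reproduced with a small user-space C sketch. It assumes a box where no real topology is found (like the single-node test machine), so the boot always ends in the dummy setup, which reports one of the two dmesg strings quoted above depending on numa_off. The helper names numa_opt_is_off, dummy_numa_message and boot_message are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the value given to "numa=" ask for NUMA off? */
bool numa_opt_is_off(const char *opt)
{
	assert(opt != NULL);
	return strncmp(opt, "off", 3) == 0;
}

/* Message printed by the dummy fallback, depending on the numa_off switch. */
const char *dummy_numa_message(bool numa_off)
{
	return numa_off ? "NUMA turned off" : "No NUMA configuration found";
}

/*
 * Model of one boot on a machine without usable NUMA information.
 * numa_opt is the value of a "numa=" command-line option, or NULL if absent.
 * xen_dom0 models the patch above, which forces numa_off = 1 under Xen.
 */
const char *boot_message(const char *numa_opt, bool xen_dom0)
{
	bool numa_off = false;

	if (numa_opt != NULL && numa_opt_is_off(numa_opt))
		numa_off = true;
	if (xen_dom0)
		numa_off = true;

	return dummy_numa_message(numa_off);
}
```

Calling boot_message(NULL, false), boot_message("off", false) and boot_message(NULL, true) corresponds to the bare-metal, numa=off and Dom0 runs described above.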

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

Andre Przywara

2012-Sep-21 23:46 UTC

Re: Dom0 crash with old style AMD NUMA detection

On 09/21/2012 07:48 PM, Konrad Rzeszutek Wilk wrote:
>> Acked-by: Andre Przywara <andre.przywara@amd.com>
>>
>> I compiled and boot-tested this on my (single node ;-) test box.
>> First bare-metal, dmesg: No NUMA configuration found
>> Then again, but with numa=off on the cmd-line: NUMA turned off
>> Then under Xen as Dom0 kernel: NUMA turned off
>>
>> So the code behaves under Xen as one would have explicitly specified
>> numa=off, which is what we want.
>
> Right.
>> I couldn't get hold of the test machine (old K8 server) on which the bug
>> was once triggered, that's why I'm reluctant to give my Tested-by.
>> Will try this ASAP.
>
> OK, will wait with this - it would be a bit silly if the patch did not
> fix the issue :-)

Thanks for your patience. I tried some machines; it affects not only K8s,
but also Barcelonas and Magny-Cours.
Boot those with a Xen HV and restrict Dom0's memory to something well
below the first node's size (say dom0_mem=512M). If the 3.x Dom0 kernel
has CONFIG_AMD_NUMA compiled in, the box will crash, because the
hardware's NUMA info read from the northbridge does not match Dom0's
understanding of its memory.
With your fix the box booted fine, NUMA is turned off and everyone is happy.
Double checked by commenting out the numa_off=1 line in your patch: crash
again. So this line definitely fixes this.
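
The mismatch can be illustrated with a small user-space C sketch. It is a deliberately simplified model: it assumes Dom0's memory is one contiguous range starting at address 0, which is not how a PV guest's memory map really looks, and it only checks which physical node ranges would end up with no backing memory at all. The node table is copied from the MemBase/Limit lines of the dump in this thread; struct mem_range, backed_bytes and first_empty_node are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Node ranges as reported by the northbridge scan in the dump. */
static const struct mem_range dump_nodes[4] = {
	{ 0x0000000000000000ULL, 0x0000000040000000ULL },
	{ 0x0000000040000000ULL, 0x0000000138000000ULL },
	{ 0x0000000138000000ULL, 0x00000001f8000000ULL },
	{ 0x00000001f8000000ULL, 0x0000000238000000ULL },
};

/* Bytes of a node's range that fall inside the domain's memory [0, dom_mem_end). */
uint64_t backed_bytes(struct mem_range node, uint64_t dom_mem_end)
{
	assert(node.start <= node.end);
	uint64_t end = node.end < dom_mem_end ? node.end : dom_mem_end;
	return end > node.start ? end - node.start : 0;
}

/* Index of the first node with no backing memory, or -1 if every node has some. */
int first_empty_node(const struct mem_range *nodes, int count, uint64_t dom_mem_end)
{
	for (int i = 0; i < count; i++) {
		if (backed_bytes(nodes[i], dom_mem_end) == 0)
			return i;
	}
	return -1;
}
```

With dom0_mem=512M (0x20000000 bytes) in this model, node 0 is only half backed and nodes 1 to 3 have no memory behind them, so any per-node boot-time allocation for those nodes has nothing to draw from.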

Tested-by: Andre Przywara <andre.przywara@amd.com>

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

Konrad Rzeszutek Wilk

2012-Sep-24 13:48 UTC

Re: Dom0 crash with old style AMD NUMA detection

On Sat, Sep 22, 2012 at 01:46:57AM +0200, Andre Przywara wrote:
> On 09/21/2012 07:48 PM, Konrad Rzeszutek Wilk wrote:
> >>Acked-by: Andre Przywara<andre.przywara@amd.com>
> >>
> >>I compiled and boot-tested this on my (single node ;-) test box.
> >>First bare-metal, dmesg: No NUMA configuration found
> >>Then again, but with numa=off on the cmd-line: NUMA turned off
> >>Then under Xen as Dom0 kernel: NUMA turned off
> >>
> >>So the code behaves under Xen as one would have explicitly specified
> >>numa=off, which is what we want.
> >
> >Right.
> >>I couldn't get hold of the test machine (old K8 server) on which the bug
> >>was once triggered, that's why I'm reluctant to give my Tested-by.
> >>Will try this ASAP.
> >
> >OK, will wait with this - it would be a bit silly if the patch did not
> >fix the issue :-)
> 
> Thanks for your patience. I tried some machines; it affects not only
> K8s, but also Barcelonas and Magny-Cours.
> Boot those with a Xen HV and restrict Dom0's memory to something
> well below the first node's size (say dom0_mem=512M). If the 3.x
> Dom0 kernel has CONFIG_AMD_NUMA compiled in, the box will crash,
> because the hardware's NUMA info read from the northbridge does not
> match Dom0's understanding of its memory.
> With your fix the box booted fine, NUMA is turned off and everyone is happy.
> Double checked by commenting out the numa_off=1 line in your patch:
> crash again. So this line definitely fixes this.
> 
> Tested-by: Andre Przywara <andre.przywara@amd.com>

OK, send out a git pull for it today. If Linus doesn't take it, I will just have
to do it in the v3.7 time-frame and do the stable kernel backport.

Thanks again for testing and reporting this!
> 
> Regards,
> Andre.
> 
> -- 
> Andre Przywara
> AMD-Operating System Research Center (OSRC), Dresden, Germany
>
