thr3ads.net - Xen devel - [Xen-devel] kernel panic when enable x2apic [Nov 2010]


Zhang, Yang Z

2010-Nov-17 08:59 UTC

[Xen-devel] kernel panic when enable x2apic

Hi Jan,
Did you test your patch with x2apic enabled? We always see a kernel panic when
x2apic is enabled since changeset 22375. Without your patch, or with x2apic=0 on
the Xen command line, it works fine.
Any suggestions?

(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
(XEN) RFLAGS: 0000000000010207   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff83007f034000   rcx: 0000000000000000
(XEN) rdx: 00000000ffffffff   rsi: 00000000ffffffff   rdi: ffff83007f034000
(XEN) rbp: ffff82c480297d38   rsp: ffff82c480297d38   r8:  0000000000000000
(XEN) r9:  ffff82c4802525b0   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: ffff83007f034000   r13: 0000000000000004   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000007f29c000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d38:
(XEN)    ffff82c480297d78 ffff82c48013cec5 ffff82c480297d58 ffff83007f034000
(XEN)    ffff83007f034000 0000000000000004 0000000000000000 0000000000000000
(XEN)    ffff82c480297d98 ffff82c480152fd4 ffff83007f034000 000000000000003f
(XEN)    ffff82c480297dd8 ffff82c480104e1d ffff82c48028a298 0000000000000080
(XEN)    0000000000000080 0000000000000007 0000000000000008 0000000000000007
(XEN)    ffff82c480297f08 ffff82c48027802b 0000000000000000 0000000000000000
(XEN)    ffff82c4802596a5 0000000000259640 00f1400000000000 0000000000000000
(XEN)    ffff83000007bc50 ffff83000007bfb0 ffff83000007bef0 0000000000f14000
(XEN)    0000000000000000 0000000000000000 0000000020000000 0000000000000000
(XEN)    0000000000000000 ffffffffffffffff ffff83000007bef0 000000000007bef0
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff82c480287cc8 0000000001000000 ffffffff00000000 ffff82c480259640
(XEN)    0000000800000000 000000010000006e 0000000000000003 00000000000002f8
(XEN)    0000000000000000 0000000000000000 000000007c223900 000000007de55018
(XEN)    0000000000000000 0000000000000001 0000000000067ebc ffff82c4801000b5
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
(XEN)    [<ffff82c48013cec5>] pci_release_devices+0x200/0x230
(XEN)    [<ffff82c480152fd4>] arch_domain_destroy+0x28/0x2c9
(XEN)    [<ffff82c480104e1d>] domain_create+0x3cb/0x46e
(XEN)    [<ffff82c48027802b>] __start_xen+0x5660/0x5935
(XEN)    
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000007f2d4063 5555555555555555
(XEN)  L3[0x000] = 000000007f0fb063 5555555555555555
(XEN)  L2[0x000] = 000000007f0fa063 5555555555555555 
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

best regards
yang



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jan Beulich

2010-Nov-17 09:41 UTC


[Xen-devel] Re: kernel panic when enable x2apic

>>> On 17.11.10 at 09:59, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> Did your test your patch with x2apic enabled? We always see kernel panic

No, due to there not being a suitable box accessible to me.

> when enable x2apic since 22375? And without your patch or set x2apic=0 in xen
> command line, it works fine.
> Any suggestions?

Not immediately.

> (XEN) Xen call trace:
> (XEN)    [<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
> (XEN)    [<ffff82c48013cec5>] pci_release_devices+0x200/0x230
> (XEN)    [<ffff82c480152fd4>] arch_domain_destroy+0x28/0x2c9
> (XEN)    [<ffff82c480104e1d>] domain_create+0x3cb/0x46e
> (XEN)    [<ffff82c48027802b>] __start_xen+0x5660/0x5935

With arch_domain_destroy() on the call stack, things went wrong
earlier (and the crash likely is in some error path, where I'd expect
the problem has always existed, just that it was never hit). Was
there no other relevant output prior to the actual crash?

Jan



Zhang, Yang Z

2010-Nov-17 13:16 UTC


[Xen-devel] RE: kernel panic when enable x2apic

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Wednesday, November 17, 2010 5:42 PM
> To: Zhang, Yang Z
> Cc: Han, Weidong; xen-devel@lists.xensource.com
> Subject: Re: kernel panic when enable x2apic
> 
> >>> On 17.11.10 at 09:59, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> > Did your test your patch with x2apic enabled? We always see kernel panic
> 
> No, due to there not being a suitable box accessible to me.
> 
> > when enable x2apic since 22375? And without your patch or set x2apic=0 in xen
> > command line, it works fine.
> > Any suggestions?
> 
> Not immediately.
> 
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
> > (XEN)    [<ffff82c48013cec5>] pci_release_devices+0x200/0x230
> > (XEN)    [<ffff82c480152fd4>] arch_domain_destroy+0x28/0x2c9
> > (XEN)    [<ffff82c480104e1d>] domain_create+0x3cb/0x46e
> > (XEN)    [<ffff82c48027802b>] __start_xen+0x5660/0x5935
> 
> With arch_domain_destroy() on the call stack, things went wrong
> earlier (and the crash likely is in some error path, where I'd expect
> the problem has always existed, just that it was never hit). Was
> there no other relevant output prior to the actual crash?
> 
In fact, there is other error output before the crash that I didn't notice
before:
(XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
(XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
(XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
(XEN) MTRR: CPU 0: Writing MSR 201 to f00000010 failed

best regards
yang


Jan Beulich

2010-Nov-17 16:23 UTC


[Xen-devel] RE: kernel panic when enable x2apic

>>> On 17.11.10 at 14:16, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> In fact, there have other error info before the crash and I didn't see it
> before:
> (XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
> (XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
> (XEN) MTRR: CPU 0: Writing MSR 201 to f00000010 failed

Hmm, these values are totally bogus (and hence it is quite clear
that the CPU would fault on them being written to the actual MSRs).
The question is where these bogus values originate, and how this is
connected to said patch (I can't see any relation between the two).
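
To make "bogus" concrete, here is a rough sketch of the reserved-bit rules for the variable-range MTRR pair, reading the log lines the way the thread does (a value being written to MSR 0x200, IA32_MTRR_PHYSBASE0, and to MSR 0x201, IA32_MTRR_PHYSMASK0). The function names and the `maxphyaddr` parameter are illustrative only and are not taken from Xen; the physical address width of the affected machine is not stated in the thread, so it is left as an argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory types accepted in IA32_MTRR_PHYSBASEn[7:0]: UC, WC, WT, WP, WB. */
bool mtrr_type_valid(uint8_t type)
{
    return type == 0 || type == 1 || type == 4 || type == 5 || type == 6;
}

/* Bits at or above the physical address width are reserved.
 * Assumes 0 < maxphyaddr < 64. */
uint64_t mtrr_reserved_high_bits(unsigned int maxphyaddr)
{
    return ~((UINT64_C(1) << maxphyaddr) - 1);
}

/* IA32_MTRR_PHYSBASEn: type in [7:0], [11:8] reserved, base from bit 12. */
bool mtrr_physbase_valid(uint64_t val, unsigned int maxphyaddr)
{
    if (val & mtrr_reserved_high_bits(maxphyaddr))
        return false;
    if (val & UINT64_C(0xf00))
        return false;
    return mtrr_type_valid((uint8_t)(val & 0xff));
}

/* IA32_MTRR_PHYSMASKn: [10:0] reserved, valid flag in bit 11, mask from bit 12. */
bool mtrr_physmask_valid(uint64_t val, unsigned int maxphyaddr)
{
    if (val & mtrr_reserved_high_bits(maxphyaddr))
        return false;
    return (val & UINT64_C(0x7ff)) == 0;
}
```

Under these rules ffff83007f0f7670 fails as a PHYSBASE value for any address width (upper bits set, bits 11:8 non-zero, type 0x70 undefined), and f00000010 fails as a PHYSMASK value because bit 4 lies in the reserved low range. A write that sets reserved MSR bits raises #GP, which would be consistent with the GPF lines in the log.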

Wouldn't it be possible that you simply send the whole log?

Would you be able to do some more debugging on this to at
least narrow where things start going wrong?

Jan



Zhang, Yang Z

2010-Nov-18 04:53 UTC


[Xen-devel] RE: kernel panic when enable x2apic

After some investigation, it seems the heap was broken (not sure, just a guess).
From the call trace, arch_domain_destroy was called, and I wanted to see why
domain creation failed. Looking at the code, the cpupool returned by
cpupool_find_by_id() is NULL.
I then added some debug output to cpupool_find_by_id() to see why the creation
of dom0 fails:
for_each_cpupool(q){
	printk("cpupool_id=%x\n", (*q)->cpupool_id);
}
Unfortunately, it raised another panic:

(XEN) cpupool_id = 7f034000
(XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c4801012c9>] cpupool_find_by_id+0x39/0xcd
(XEN) RFLAGS: 0000000000010282   CONTEXT: hypervisor
(XEN) rax: ffffffff0fff0001   rbx: ffff83007f0f7fa8   rcx: ffff82c4802d2390
(XEN) rdx: 000000000000000a   rsi: 000000000000000a   rdi: ffff82c48024e2e8
(XEN) rbp: ffff82c480297d78   rsp: ffff82c480297d58   r8:  0000000000000000
(XEN) r9:  0000000000000004   r10: 0000000000000008   r11: 0000000000000008
(XEN) r12: 0000000000000000   r13: 0000000000000001   r14: 0000000000000000
(XEN) r15: 000000000000003f   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000007f29c000   cr2: ffffffff0fff0001
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d58:
(XEN)    0000000000000213 0000000000000000 ffff83007f034000 0000000000000004
(XEN)    ffff82c480297d98 ffff82c4801017dc ffff83007f034000 0000000000000000
(XEN)    ffff82c480297dd8 ffff82c480104f41 ffff82c480289d38 0000000000000080
(XEN)    0000000000000080 0000000000000007 0000000000000008 0000000000000007
(XEN)    ffff82c480297f08 ffff82c480277afb 0000000000000000 0000000000000000
(XEN)    ffff82c4802596a5 0000000000259640 00f1400000000000 0000000000000000
(XEN)    ffff83000007bc50 ffff83000007bfb0 ffff83000007bef0 0000000000f14000
(XEN)    0000000000000000 0000000000000000 0000000020000000 0000000000000000
(XEN)    0000000000000000 ffffffffffffffff ffff83000007bef0 000000000007bef0
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    ffff82c480287788 0000000001000000 ffffffff00000000 ffff82c480259640
(XEN)    0000000800000000 000000010000006e 0000000000000003 00000000000002f8
(XEN)    0000000000000000 0000000000000000 000000007c223900 000000007de39018
(XEN)    0000000000000000 0000000000000001 0000000000067ebc ffff82c4801000b5
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4801012c9>] cpupool_find_by_id+0x39/0xcd
(XEN)    [<ffff82c4801017dc>] cpupool_add_domain+0x52/0xb9
(XEN)    [<ffff82c480104f41>] domain_create+0x41f/0x59e
(XEN)    [<ffff82c480277afb>] __start_xen+0x5660/0x5935
(XEN)
(XEN) Pagetable walk from ffffffff0fff0001:
From this output, cpupool_id = 7f034000, and I don't know
why it was 7f034000. I think the first cpupool_id should be 0. Am I right?
Also, for the failed write to the MTRR MSR, the value is very strange:
ffff83007f0f7670 is totally different from what the SDM says.
(XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed

So I am thinking that maybe the heap is broken?

best regards
yang


Jan Beulich

2010-Nov-19 10:17 UTC


[Xen-devel] RE: kernel panic when enable x2apic

>>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> From this output, it shows the cpupool_id = 7f034000, I don't know why it
> was 7f034000. I think the first cpupool_id should be 0?Am I right?

Yes, it ought to be zero.

> Also the fail with write mtrr MSR, the value also is very strange:
> ffff83007f0f7670, it totally different with the SDM says.
> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed

Yes, I had indicated so in an earlier reply.

> So, I am think that maybe the heap is broken?

General memory corruption is more likely. The question is when it
starts.

>> Wouldn't it be possible that you simple send the whole log?

Besides the above, I can only repeat this request of mine. I'm
afraid I can't be of much help here without being able to
reproduce the problem and without having much data to work
from.

Since no-one else complained so far - did you do a full rebuild
of Xen after you pulled in the changeset in question?

Jan



Sander Eikelenboom

2010-Nov-19 10:40 UTC


Re: [Xen-devel] RE: kernel panic when enable x2apic

Hello Jan,

Friday, November 19, 2010, 11:17:21 AM, you wrote:

>>>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>> From this output, it shows the cpupool_id = 7f034000, I don't know why it
>> was 7f034000. I think the first cpupool_id should be 0?Am I right?

> Yes, it ought to be zero.

>> Also the fail with write mtrr MSR, the value also is very strange:
>> ffff83007f0f7670, it totally different with the SDM says.
>> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed

> Yes, I had indicated so in an earlier reply.

>> So, I am think that maybe the heap is broken?

> General memory corruption is more likely. The question is when it
> starts.

General memory corruption could also be hardware related (bad DIMM)?

>>> Wouldn't it be possible that you simple send the whole log?

> Besides the above, I can only repeat this request of mine. I'm
> afraid I can't be of much help here without being able to
> reproduce the problem and without having much data to work
> from.

> Since no-one else complained so far - did you do a full rebuild
> of Xen after you pulled in the changeset in question?

> Jan





-- 
Best regards,
 Sander                            mailto:linux@eikelenboom.it



Jan Beulich

2010-Nov-19 11:08 UTC


Re: [Xen-devel] RE: kernel panic when enable x2apic

>>> On 19.11.10 at 11:40, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Hello Jan,
> 
> Friday, November 19, 2010, 11:17:21 AM, you wrote:
> 
>>>>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>> From this output, it shows the cpupool_id = 7f034000, I don't know why it
>>> was 7f034000. I think the first cpupool_id should be 0?Am I right?
> 
>> Yes, it ought to be zero.
> 
>>> Also the fail with write mtrr MSR, the value also is very strange:
>>> ffff83007f0f7670, it totally different with the SDM says.
>>> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
> 
>> Yes, I had indicated so in an earlier reply.
> 
>>> So, I am think that maybe the heap is broken?
> 
>> General memory corruption is more likely. The question is when it
>> starts.
> 
> General memory corruption could also be hardware related (bad dimm) ?

In general, yes, but this wouldn't normally lead to patterns that look
like valid (albeit misplaced) addresses, I would think.
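
One way to put the "looks like an address" observation to work is sketched below. The helper names are made up for illustration, and the 48-bit canonical-address check assumes ordinary 4-level x86-64 paging. The values come from the logs earlier in the thread: ffff83007f0f7670 is the value from the failed MSR 200 write, and the reported cpupool_id of 7f034000 equals the low 32 bits of ffff83007f034000, which shows up in rbx, rdi and r12 of the first register dump.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, bits 63..47 must be all zeros or all ones. */
bool is_canonical_48(uint64_t va)
{
    uint64_t top = va >> 47;   /* the 17 uppermost bits */
    return top == 0 || top == UINT64_C(0x1ffff);
}

/* True if a 32-bit field holds exactly the low half of a 64-bit pointer. */
bool low32_matches(uint32_t field, uint64_t ptr)
{
    return field == (uint32_t)ptr;
}
```

Here `is_canonical_48(0xffff83007f0f7670)` and `low32_matches(0x7f034000, 0xffff83007f034000)` both hold, which fits pointer-sized data being written over neighbouring fields better than it fits random bit flips from failing memory.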

Jan



Zhang, Yang Z

2010-Nov-30 00:58 UTC


RE: [Xen-devel] RE: kernel panic when enable x2apic

Hi Jan, Weidong is looking into this issue. Interestingly, he can reproduce
it with an older changeset on his box, so I am not sure whether there is
another issue on my box. We will let you know as soon as we have a result.

best regards
yang



Weidong Han

2010-Nov-30 08:50 UTC


Re: [Xen-devel] RE: kernel panic when enable x2apic

Jan Beulich wrote:
>>>> On 19.11.10 at 11:40, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> Hello Jan,
>>
>> Friday, November 19, 2010, 11:17:21 AM, you wrote:
>>
>>>>>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>>> From this output, it shows the cpupool_id = 7f034000, I don't know why it
>>>> was 7f034000. I think the first cpupool_id should be 0?Am I right?
>>>
>>> Yes, it ought to be zero.
>>>
>>>> Also the fail with write mtrr MSR, the value also is very strange:
>>>> ffff83007f0f7670, it totally different with the SDM says.
>>>> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
>>>
>>> Yes, I had indicated so in an earlier reply.
>>>
>>>> So, I am think that maybe the heap is broken?
>>>
>>> General memory corruption is more likely. The question is when it
>>> starts.
>>
>> General memory corruption could also be hardware related (bad dimm) ?
>
> In general, yes, but this wouldn't normally lead to patterns that look
> like valid (albeit misplaced) addresses, I would think.
>
> Jan

We root-caused this issue. It is actually not related to x2APIC or c/s
22375; it's caused by boot_cpu_data.x86_capability being set incorrectly.
boot_cpu_data.x86_capability is set in identify_cpu, but I found that
boot_cpu_data.x86_capability[4] is also set in start_vmx, which may
overwrite the previous values. This panic is caused by overwriting the
X86_FEATURE_XSAVE bit in boot_cpu_data.x86_capability. Yang's platform
supports xsave, and xsave is not enabled (by default), so the
X86_FEATURE_XSAVE bit is cleared in boot_cpu_data.x86_capability in
init_intel, which means cpu_has_xsave is 0. But later, start_vmx sets that
bit again (cpu_has_xsave becomes true). This causes Xen to allocate an xsave
area in vcpu_initialise; we observed that it may allocate an address that is
already in use, which causes the panic. The obvious solution is to remove
boot_cpu_data.x86_capability[4] = cpuid_ecx(1) in start_vmx. It does indeed
work with that change. I will send out the patch after more tests.
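
The sequence described above can be sketched roughly as follows. This is only a model: `struct cpuinfo_sketch`, `NCAPINTS_SKETCH` and the `*_sketch` functions are hypothetical stand-ins, not Xen's real definitions. The one hardware fact relied on is that the XSAVE flag is bit 26 of CPUID.1:ECX, which is the register that the removed assignment copies into capability word 4.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCAPINTS_SKETCH 8                 /* hypothetical number of words */
#define FEATURE_XSAVE   (4 * 32 + 26)     /* word 4, CPUID.1:ECX bit 26 */

struct cpuinfo_sketch {
    uint32_t x86_capability[NCAPINTS_SKETCH];
};

bool has_cap(const struct cpuinfo_sketch *c, unsigned int feature)
{
    return (c->x86_capability[feature / 32] >> (feature % 32)) & 1;
}

void clear_cap(struct cpuinfo_sketch *c, unsigned int feature)
{
    c->x86_capability[feature / 32] &= ~(UINT32_C(1) << (feature % 32));
}

/* Early identification: load the raw CPUID word, then mask out features
 * that the hypervisor has decided not to use. */
void identify_cpu_sketch(struct cpuinfo_sketch *c, uint32_t raw_ecx,
                         bool xsave_enabled)
{
    c->x86_capability[4] = raw_ecx;
    if (!xsave_enabled)
        clear_cap(c, FEATURE_XSAVE);
}

/* The later raw reload: every bit cleared above is silently restored. */
void reload_word4_sketch(struct cpuinfo_sketch *c, uint32_t raw_ecx)
{
    c->x86_capability[4] = raw_ecx;
}
```

Calling `identify_cpu_sketch()` with `xsave_enabled` false leaves the XSAVE bit clear; a subsequent `reload_word4_sketch()` with the same raw value sets it again, so any code that sized its buffers while the bit was clear now disagrees with code that tests the bit afterwards.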

Regards,
Weidong




Keir Fraser

2010-Nov-30 09:23 UTC


Re: [Xen-devel] RE: kernel panic when enable x2apic

On 30/11/2010 08:50, "Weidong Han" <weidong.han@intel.com> wrote:
> This results in Xen to allocate xsave
> area in vcpu_initialise, we observed it may allocate a used address for
> it, therefore cause the panic.

Actually you xmalloc a zero-sized area, and then immediately write past the
end of it, corrupting neighbouring data, including possibly xmalloc metadata.

> The obvious solution is to remove
> boot_cpu_data.x86_capability[4] = cpuid_ecx(1) in start_vmx. It indeed
> works with the change. I will send out the patch after more tests.

Yes, the write to x86_capability is totally unnecessary. There is a similar
pointless one in SVM code -- in fact they don't even manage to write to the
correct array element of x86_capability[]!

Removing both writes to x86_capability[] would be an appropriate fix for 4.0
branch as well.
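
The zero-sized allocation effect can be modelled without touching a real heap. The sketch below uses a small bump allocator over a byte array as a stand-in for xmalloc; the arena, the sizes and every name in it are hypothetical. It only shows why a block whose recorded size is 0 shares its address with whatever is allocated next, so that initialising it with the size the feature flag implies lands on the neighbour.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE      128
#define NEIGHBOUR_VALUE UINT32_C(0x12345678)

struct arena {
    uint8_t bytes[ARENA_SIZE];
    size_t used;
};

/* Bump allocation: a zero-sized request consumes no space at all. */
size_t arena_alloc(struct arena *a, size_t size)
{
    size_t off = a->used;
    a->used += size;
    return off;
}

/* Allocate a save area of allocated_size bytes followed by a neighbouring
 * 32-bit field, then initialise written_size bytes of the save area.
 * Returns what the neighbouring field holds afterwards. */
uint32_t neighbour_after_init(size_t allocated_size, size_t written_size)
{
    struct arena a = { .used = 0 };
    uint32_t value = NEIGHBOUR_VALUE;
    size_t area, neighbour;

    /* Keep both blocks inside the model arena. */
    if (allocated_size > ARENA_SIZE - sizeof(value))
        allocated_size = ARENA_SIZE - sizeof(value);

    area = arena_alloc(&a, allocated_size);
    neighbour = arena_alloc(&a, sizeof(value));
    memcpy(&a.bytes[neighbour], &value, sizeof(value));

    /* The write trusts written_size, not what was actually allocated. */
    if (written_size > ARENA_SIZE - area)
        written_size = ARENA_SIZE - area;
    memset(&a.bytes[area], 0xff, written_size);

    memcpy(&value, &a.bytes[neighbour], sizeof(value));
    return value;
}
```

With matching sizes, `neighbour_after_init(64, 64)` leaves the neighbour intact; with a zero-sized allocation, `neighbour_after_init(0, 64)` returns a clobbered value. In a real allocator the bytes past the end may also be allocator metadata, which is the possibility raised above.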

 -- Keir




Keir Fraser

2010-Nov-30 11:40 UTC


Re: [Xen-devel] RE: kernel panic when enable x2apic

On 30/11/2010 09:23, "Keir Fraser" <keir@xen.org> wrote:
>> The obvious solution is to remove
>> boot_cpu_data.x86_capability[4] = cpuid_ecx(1) in start_vmx. It indeed
>> works with the change. I will send out the patch after more tests.
> 
> Yes, the write to x86_capability is totally unnecessary. There is a similar
> pointless one in SVM code -- in fact they don't even manage to write to the
> correct array element of x86_capability[]!
> 
> Removing both writes to x86_capability[] would be an appropriate fix for 4.0
> branch as well.

I applied a fix to xen-unstable and xen-4.0-testing. Eyeballing plus a quick
test convinces me it is absolutely fine. I credited you in the changeset
comment, I hope that's okay.

 Thanks,
 Keir




Weidong Han

2010-Dec-01 00:42 UTC


Re: [Xen-devel] RE: kernel panic when enable x2apic

Keir Fraser wrote:
> On 30/11/2010 09:23, "Keir Fraser" <keir@xen.org> wrote:
>
>>> The obvious solution is to remove
>>> boot_cpu_data.x86_capability[4] = cpuid_ecx(1) in start_vmx. It indeed
>>> works with the change. I will send out the patch after more tests.
>>
>> Yes, the write to x86_capability is totally unnecessary. There is a similar
>> pointless one in SVM code -- in fact they don't even manage to write to the
>> correct array element of x86_capability[]!
>>
>> Removing both writes to x86_capability[] would be an appropriate fix for 4.0
>> branch as well.
>
> I applied a fix to xen-unstable and xen-4.0-testing. Eyeballing plus a quick
> test convinces me it is absolutely fine. I credited you in the changeset
> comment, I hope that's okay.

Great!

Regards,
Weidong


