Hi Jan,

Did you test your patch with x2apic enabled? Since changeset 22375 we always see a kernel panic when x2apic is enabled. Without your patch, or with x2apic=0 on the Xen command line, it works fine.
Any suggestions?

(XEN) ----[ Xen-4.1-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
(XEN) RFLAGS: 0000000000010207 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff83007f034000 rcx: 0000000000000000
(XEN) rdx: 00000000ffffffff rsi: 00000000ffffffff rdi: ffff83007f034000
(XEN) rbp: ffff82c480297d38 rsp: ffff82c480297d38 r8: 0000000000000000
(XEN) r9: ffff82c4802525b0 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff83007f034000 r13: 0000000000000004 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000007f29c000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d38:
(XEN) ffff82c480297d78 ffff82c48013cec5 ffff82c480297d58 ffff83007f034000
(XEN) ffff83007f034000 0000000000000004 0000000000000000 0000000000000000
(XEN) ffff82c480297d98 ffff82c480152fd4 ffff83007f034000 000000000000003f
(XEN) ffff82c480297dd8 ffff82c480104e1d ffff82c48028a298 0000000000000080
(XEN) 0000000000000080 0000000000000007 0000000000000008 0000000000000007
(XEN) ffff82c480297f08 ffff82c48027802b 0000000000000000 0000000000000000
(XEN) ffff82c4802596a5 0000000000259640 00f1400000000000 0000000000000000
(XEN) ffff83000007bc50 ffff83000007bfb0 ffff83000007bef0 0000000000f14000
(XEN) 0000000000000000 0000000000000000 0000000020000000 0000000000000000
(XEN) 0000000000000000 ffffffffffffffff ffff83000007bef0 000000000007bef0
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) ffff82c480287cc8 0000000001000000 ffffffff00000000 ffff82c480259640
(XEN) 0000000800000000 000000010000006e 0000000000000003 00000000000002f8
(XEN) 0000000000000000 0000000000000000 000000007c223900 000000007de55018
(XEN) 0000000000000000 0000000000000001 0000000000067ebc ffff82c4801000b5
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) [<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
(XEN) [<ffff82c48013cec5>] pci_release_devices+0x200/0x230
(XEN) [<ffff82c480152fd4>] arch_domain_destroy+0x28/0x2c9
(XEN) [<ffff82c480104e1d>] domain_create+0x3cb/0x46e
(XEN) [<ffff82c48027802b>] __start_xen+0x5660/0x5935
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN) L4[0x000] = 000000007f2d4063 5555555555555555
(XEN) L3[0x000] = 000000007f0fb063 5555555555555555
(XEN) L2[0x000] = 000000007f0fa063 5555555555555555
(XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

best regards
yang

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
>>> On 17.11.10 at 09:59, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> Did your test your patch with x2apic enabled? We always see kernel panic

No, due to there not being a suitable box accessible to me.

> when enable x2apic since 22375? And without your patch or set x2apic=0 in xen
> command line, it works fine.
> Any suggestions?

Not immediately.

> (XEN) ----[ Xen-4.1-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
> (XEN) Xen call trace:
> (XEN) [<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
> (XEN) [<ffff82c48013cec5>] pci_release_devices+0x200/0x230
> (XEN) [<ffff82c480152fd4>] arch_domain_destroy+0x28/0x2c9
> (XEN) [<ffff82c480104e1d>] domain_create+0x3cb/0x46e
> (XEN) [<ffff82c48027802b>] __start_xen+0x5660/0x5935

With arch_domain_destroy() on the call stack, things went wrong
earlier (and the crash likely is in some error path, where I'd expect
the problem has always existed, just that it was never hit). Was
there no other relevant output prior to the actual crash?

Jan
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Wednesday, November 17, 2010 5:42 PM
> To: Zhang, Yang Z
> Cc: Han, Weidong; xen-devel@lists.xensource.com
> Subject: Re: kernel panic when enable x2apic
>
> >>> On 17.11.10 at 09:59, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> > Did your test your patch with x2apic enabled? We always see kernel panic
>
> No, due to there not being a suitable box accessible to me.
>
> > when enable x2apic since 22375? And without your patch or set x2apic=0 in xen
> > command line, it works fine.
> > Any suggestions?
>
> Not immediately.
>
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c48013cac9>] pci_get_pdev_by_domain+0x49/0x5b
> > (XEN) [<ffff82c48013cec5>] pci_release_devices+0x200/0x230
> > (XEN) [<ffff82c480152fd4>] arch_domain_destroy+0x28/0x2c9
> > (XEN) [<ffff82c480104e1d>] domain_create+0x3cb/0x46e
> > (XEN) [<ffff82c48027802b>] __start_xen+0x5660/0x5935
>
> With arch_domain_destroy() on the call stack, things went wrong
> earlier (and the crash likely is in some error path, where I'd expect
> the problem has always existed, just that it was never hit). Was
> there no other relevant output prior to the actual crash?
>
> Jan

In fact, there was other error output before the crash that I had not noticed earlier:

(XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
(XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
(XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
(XEN) MTRR: CPU 0: Writing MSR 201 to f00000010 failed

best regards
yang
>>> On 17.11.10 at 14:16, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> In fact, there have other error info before the crash and I didn't see it
> before:
> (XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
> (XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
> (XEN) MTRR: CPU 0: Writing MSR 201 to f00000010 failed

Hmm, these values are totally bogus (and hence it is quite clear
that the CPU would fault on them being written to the actual MSRs).
The question is where these bogus values originate, and how this is
connected to said patch (I can't see any relation between the two).

Wouldn't it be possible for you to simply send the whole log?

Would you be able to do some more debugging on this to at
least narrow down where things start going wrong?

Jan
After some investigation, it seems the heap is broken (not sure, this is just a guess).

From the call trace, arch_domain_destroy() was called, and I wanted to see why domain creation failed. Looking at the code, the cpupool returned by cpupool_find_by_id() is NULL.
I then added some debug output in cpupool_find_by_id() to see why the creation of dom0 fails:

    for_each_cpupool(q) {
        printk("cpupool_id=%x\n", (*q)->cpupool_id);
    }

Unfortunately, this raises another panic:

(XEN) cpupool_id = 7f034000
(XEN) ----[ Xen-4.1-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c4801012c9>] cpupool_find_by_id+0x39/0xcd
(XEN) RFLAGS: 0000000000010282 CONTEXT: hypervisor
(XEN) rax: ffffffff0fff0001 rbx: ffff83007f0f7fa8 rcx: ffff82c4802d2390
(XEN) rdx: 000000000000000a rsi: 000000000000000a rdi: ffff82c48024e2e8
(XEN) rbp: ffff82c480297d78 rsp: ffff82c480297d58 r8: 0000000000000000
(XEN) r9: 0000000000000004 r10: 0000000000000008 r11: 0000000000000008
(XEN) r12: 0000000000000000 r13: 0000000000000001 r14: 0000000000000000
(XEN) r15: 000000000000003f cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000007f29c000 cr2: ffffffff0fff0001
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d58:
(XEN) 0000000000000213 0000000000000000 ffff83007f034000 0000000000000004
(XEN) ffff82c480297d98 ffff82c4801017dc ffff83007f034000 0000000000000000
(XEN) ffff82c480297dd8 ffff82c480104f41 ffff82c480289d38 0000000000000080
(XEN) 0000000000000080 0000000000000007 0000000000000008 0000000000000007
(XEN) ffff82c480297f08 ffff82c480277afb 0000000000000000 0000000000000000
(XEN) ffff82c4802596a5 0000000000259640 00f1400000000000 0000000000000000
(XEN) ffff83000007bc50 ffff83000007bfb0 ffff83000007bef0 0000000000f14000
(XEN) 0000000000000000 0000000000000000 0000000020000000 0000000000000000
(XEN) 0000000000000000 ffffffffffffffff ffff83000007bef0 000000000007bef0
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) ffff82c480287788 0000000001000000 ffffffff00000000 ffff82c480259640
(XEN) 0000000800000000 000000010000006e 0000000000000003 00000000000002f8
(XEN) 0000000000000000 0000000000000000 000000007c223900 000000007de39018
(XEN) 0000000000000000 0000000000000001 0000000000067ebc ffff82c4801000b5
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) [<ffff82c4801012c9>] cpupool_find_by_id+0x39/0xcd
(XEN) [<ffff82c4801017dc>] cpupool_add_domain+0x52/0xb9
(XEN) [<ffff82c480104f41>] domain_create+0x41f/0x59e
(XEN) [<ffff82c480277afb>] __start_xen+0x5660/0x5935
(XEN)
(XEN) Pagetable walk from ffffffff0fff0001:

This output shows cpupool_id = 7f034000, and I don't know why it is 7f034000. I think the first cpupool_id should be 0. Am I right?
Also, for the failed write to the MTRR MSR, the value is very strange: ffff83007f0f7670 is totally different from what the SDM specifies.

(XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed

So I think that maybe the heap is broken?

best regards
yang

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Thursday, November 18, 2010 12:24 AM
> To: Zhang, Yang Z
> Cc: Han, Weidong; xen-devel@lists.xensource.com
> Subject: RE: kernel panic when enable x2apic
>
> >>> On 17.11.10 at 14:16, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> > In fact, there have other error info before the crash and I didn't see it
> > before:
> > (XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
> > (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
> > (XEN) traps.c:2938: GPF (0000): ffff82c4801a0a73 -> ffff82c48020f0d2
> > (XEN) MTRR: CPU 0: Writing MSR 201 to f00000010 failed
>
> Hmm, these values are totally bogus (and hence it is quite clear
> that the CPU would fault on them being written to the actual MSRs).
> The question is where these bogus values originate, and how this is
> connected to said patch (I can't see any relation between the two).
>
> Wouldn't it be possible that you simple send the whole log?
>
> Would you be able to do some more debugging on this to at
> least narrow where things start going wrong?
>
> Jan
>>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> From this output, it shows the cpupool_id = 7f034000, I don't know why it
> was 7f034000. I think the first cpupool_id should be 0?Am I right?

Yes, it ought to be zero.

> Also the fail with write mtrr MSR, the value also is very strange:
> ffff83007f0f7670, it totally different with the SDM says.
> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed

Yes, I had indicated so in an earlier reply.

> So, I am think that maybe the heap is broken?

General memory corruption is more likely. The question is when it
starts.

>> Wouldn't it be possible that you simple send the whole log?

Besides the above, I can only repeat this request of mine. I'm
afraid I can't be of much help here without being able to
reproduce the problem and without having much data to work
from.

Since no-one else complained so far - did you do a full rebuild
of Xen after you pulled in the changeset in question?

Jan
Sander Eikelenboom
2010-Nov-19 10:40 UTC
Re: [Xen-devel] RE: kernel panic when enable x2apic
Hello Jan,

Friday, November 19, 2010, 11:17:21 AM, you wrote:

> >>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>> From this output, it shows the cpupool_id = 7f034000, I don't know why it
>> was 7f034000. I think the first cpupool_id should be 0?Am I right?

> Yes, it ought to be zero.

>> Also the fail with write mtrr MSR, the value also is very strange:
>> ffff83007f0f7670, it totally different with the SDM says.
>> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed

> Yes, I had indicated so in an earlier reply.

>> So, I am think that maybe the heap is broken?

> General memory corruption is more likely. The question is when it
> starts.

Could general memory corruption also be hardware related (bad DIMM)?

>>> Wouldn't it be possible that you simple send the whole log?

> Besides the above, I can only repeat this request of mine. I'm
> afraid I can't be of much help here without being able to
> reproduce the problem and without having much data to work
> from.

> Since no-one else complained so far - did you do a full rebuild
> of Xen after you pulled in the changeset in question?

> Jan

--
Best regards,
Sander
mailto:linux@eikelenboom.it
>>> On 19.11.10 at 11:40, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Hello Jan,
>
> Friday, November 19, 2010, 11:17:21 AM, you wrote:
>
>>>>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>> From this output, it shows the cpupool_id = 7f034000, I don't know why it
>>> was 7f034000. I think the first cpupool_id should be 0?Am I right?
>
>> Yes, it ought to be zero.
>
>>> Also the fail with write mtrr MSR, the value also is very strange:
>>> ffff83007f0f7670, it totally different with the SDM says.
>>> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
>
>> Yes, I had indicated so in an earlier reply.
>
>>> So, I am think that maybe the heap is broken?
>
>> General memory corruption is more likely. The question is when it
>> starts.
>
> General memory corruption could also be hardware related (bad dimm) ?

In general, yes, but this wouldn't normally lead to patterns that look
like valid (albeit misplaced) addresses, I would think.

Jan
Hi Jan,

Weidong is looking into this issue. Interestingly, he can reproduce it with an older changeset on his box, so I am not sure whether there is a separate issue on my box. We will let you know as soon as we have a result.

best regards
yang

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Friday, November 19, 2010 7:09 PM
> To: Sander Eikelenboom
> Cc: Han, Weidong; Zhang, Yang Z; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] RE: kernel panic when enable x2apic
>
> >>> On 19.11.10 at 11:40, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> > Hello Jan,
> >
> > Friday, November 19, 2010, 11:17:21 AM, you wrote:
> >
> >>>>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> >>> From this output, it shows the cpupool_id = 7f034000, I don't know why it
> >>> was 7f034000. I think the first cpupool_id should be 0?Am I right?
> >
> >> Yes, it ought to be zero.
> >
> >>> Also the fail with write mtrr MSR, the value also is very strange:
> >>> ffff83007f0f7670, it totally different with the SDM says.
> >>> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
> >
> >> Yes, I had indicated so in an earlier reply.
> >
> >>> So, I am think that maybe the heap is broken?
> >
> >> General memory corruption is more likely. The question is when it
> >> starts.
> >
> > General memory corruption could also be hardware related (bad dimm) ?
>
> In general, yes, but this wouldn't normally lead to patterns that look
> like valid (albeit misplaced) addresses, I would think.
>
> Jan
Jan Beulich wrote:
>>>> On 19.11.10 at 11:40, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> Hello Jan,
>>
>> Friday, November 19, 2010, 11:17:21 AM, you wrote:
>>
>>>>>> On 18.11.10 at 05:53, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>>>> From this output, it shows the cpupool_id = 7f034000, I don't know why it
>>>> was 7f034000. I think the first cpupool_id should be 0?Am I right?
>>>
>>> Yes, it ought to be zero.
>>>
>>>> Also the fail with write mtrr MSR, the value also is very strange:
>>>> ffff83007f0f7670, it totally different with the SDM says.
>>>> (XEN) MTRR: CPU 0: Writing MSR 200 to ffff83007f0f7670 failed
>>>
>>> Yes, I had indicated so in an earlier reply.
>>>
>>>> So, I am think that maybe the heap is broken?
>>>
>>> General memory corruption is more likely. The question is when it
>>> starts.
>>
>> General memory corruption could also be hardware related (bad dimm) ?
>
> In general, yes, but this wouldn't normally lead to patterns that look
> like valid (albeit misplaced) addresses, I would think.
>
> Jan

We root-caused this issue. It is actually not related to x2APIC or c/s 22375; it is caused by boot_cpu_data.x86_capability being set incorrectly. boot_cpu_data.x86_capability is set in identify_cpu, but I found that boot_cpu_data.x86_capability[4] is also set in start_vmx, which may overwrite the previous values. This panic is caused by overwriting the X86_FEATURE_XSAVE bit in boot_cpu_data.x86_capability.

Yang's platform supports xsave, but xsave is not enabled (the default), so init_intel clears the X86_FEATURE_XSAVE bit in boot_cpu_data.x86_capability, which means cpu_has_xsave is 0. Later, however, start_vmx sets that bit again (cpu_has_xsave becomes true). As a result, Xen allocates an xsave area in vcpu_initialise; we observed that it may allocate an address that is already in use, which causes the panic.

The obvious solution is to remove boot_cpu_data.x86_capability[4] = cpuid_ecx(1) in start_vmx. It does work with that change. I will send out the patch after more tests.

Regards,
Weidong
On 30/11/2010 08:50, "Weidong Han" <weidong.han@intel.com> wrote:

> This results in Xen to allocate xsave
> area in vcpu_initialise, we observed it may allocate a used address for
> it, therefore cause the panic.

Actually you xmalloc a zero-sized area, and then immediately write past the
end of it, corrupting neighbouring data, including possibly xmalloc metadata.

> The obvious solution is to remove
> boot_cpu_data.x86_capability[4] = cpuid_ecx(1) in start_vmx. It indeed
> works with the change. I will send out the patch after more tests.

Yes, the write to x86_capability is totally unnecessary. There is a similar
pointless one in SVM code -- in fact they don't even manage to write to the
correct array element of x86_capability[]!

Removing both writes to x86_capability[] would be an appropriate fix for 4.0
branch as well.

 -- Keir
On 30/11/2010 09:23, "Keir Fraser" <keir@xen.org> wrote:

>> The obvious solution is to remove
>> boot_cpu_data.x86_capability[4] = cpuid_ecx(1) in start_vmx. It indeed
>> works with the change. I will send out the patch after more tests.
>
> Yes, the write to x86_capability is totally unnecessary. There is a similar
> pointless one in SVM code -- in fact they don't even manage to write to the
> correct array element of x86_capability[]!
>
> Removing both writes to x86_capability[] would be an appropriate fix for 4.0
> branch as well.

I applied a fix to xen-unstable and xen-4.0-testing. Eyeballing plus a quick
test convinces me it is absolutely fine. I credited you in the changeset
comment, I hope that's okay.

 Thanks,
 Keir
Keir Fraser wrote:
> On 30/11/2010 09:23, "Keir Fraser" <keir@xen.org> wrote:
>
>>> The obvious solution is to remove
>>> boot_cpu_data.x86_capability[4] = cpuid_ecx(1) in start_vmx. It indeed
>>> works with the change. I will send out the patch after more tests.
>>
>> Yes, the write to x86_capability is totally unnecessary. There is a similar
>> pointless one in SVM code -- in fact they don't even manage to write to the
>> correct array element of x86_capability[]!
>>
>> Removing both writes to x86_capability[] would be an appropriate fix for 4.0
>> branch as well.
>
> I applied a fix to xen-unstable and xen-4.0-testing. Eyeballing plus a quick
> test convinces me it is absolutely fine. I credited you in the changeset
> comment, I hope that's okay.

Great!

Regards,
Weidong