Bastian Blank
2007-Aug-25 15:46 UTC
[Xen-users] Linux 2.3.23-rc3 on Xen 3.1 fails in xen_write_cr4
Hi folks

Linux 2.6.23-rc3 with Xen enabled fails on Xen 3.1:

| (XEN) traps.c:1642:d4 Attempt to change CR4 flags.
| general protection fault: 0000 [#1]
[...]
| EIP is at xen_write_cr4+0x3/0x7
[...]
| Call Trace:
| [<c0318f2c>] stop_mce+0x21/0x22

Any suggestions? It seems that some part of the kernel wants to mess
with CR4. Is it the MCE support?

The complete GPF:

| general protection fault: 0000 [#1]
| SMP
| Modules linked in:
| CPU: 0
| EIP: e019:[<c010191f>] Not tainted VLI
| EFLAGS: 00010202 (2.6.23-rc3-xen-686 #1)
| EIP is at xen_write_cr4+0x3/0x7
| eax: 00000620 ebx: c110a000 ecx: 078bc136 edx: 078bc136
| esi: c033ee2c edi: c1107e2c ebp: 00000020 esp: c030df9c
| ds: e021 es: e021 fs: 00d8 gs: 0000 ss: e021
| Process swapper (pid: 0, ti=c030c000 task=c02e4260 task.ti=c030c000)
| Stack: c0318f2c c0318846 c0318e03 0000e019 00010202 c110a000 c0311905 0000002c
| c03110e9 c030dff0 00009000 c0329ba0 c030dff0 00388000 c030dfe4 00000000
| c0316c27 00000000 078bc1f1 00000001 00000800 00000623 00000000 c0a24000
| Call Trace:
| [<c0318f2c>] stop_mce+0x21/0x22
| [<c0318846>] alternative_instructions+0xe/0x10c
| [<c0318e03>] check_bugs+0x90/0x130
| [<c0311905>] start_kernel+0x30a/0x317
| [<c03110e9>] unknown_bootoption+0x0/0x195
| [<c0316c27>] xen_start_kernel+0x162/0x169
| ======================
| Code: 5a 89 f0 5b 5e 5f 5d c3 c3 31 c0 c3 64 8b 15 08 80 33 c0 89 42 08 c3 64 a1 08 80 33 c0 8b 40 08 c3 64 a1 28 80 33 c0 c3 83 e0 fb <0f> 22 e0 c3 64 a1 0c 80 33 c0 c3 56 89 c6 53 bb 02 00 00 00 ff
| EIP: [<c010191f>] xen_write_cr4+0x3/0x7 SS:ESP e021:c030df9c
| Kernel panic - not syncing: Attempted to kill the idle task!

Bastian
--
Fascinating, a totally parochial attitude.
  -- Spock, "Metamorphosis", stardate 3219.8
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Bastian Blank
2007-Aug-25 16:24 UTC
Re: [Xen-users] Linux 2.3.23-rc3 on Xen 3.1 fails in xen_write_cr4
On Sat, Aug 25, 2007 at 05:46:18PM +0200, Bastian Blank wrote:
> Any suggestions? It seems that some part of the kernel wants to mess
> with CR4. Is it the MCE support?

After disabling the MCE support, the kernel crashes:

| (XEN) domain_crash_sync called from entry.S (ff16f829)
| (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
| (XEN) ----[ Xen-3.1.0 x86_32p debug=n Not tainted ]----
| (XEN) CPU: 0
| (XEN) EIP: 0061:[<c0107a20>]
| (XEN) EFLAGS: 00010286 CONTEXT: guest
| (XEN) eax: 00010061 ebx: 00010061 ecx: c027ac8c edx: 0000000d
| (XEN) esi: ffffffff edi: 00000000 ebp: 00000020 esp: c004000c
| (XEN) cr0: 80050033 cr4: 000006f0 cr3: 0bd89000 cr2: 00010061
| (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: e021 cs: 0061
| (XEN) Guest stack trace from esp=c004000c:
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
| (XEN) 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
| (XEN) 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061

There seem to be two code addresses in the trace:

| c027abd0 T page_fault
| c0107a20 T invalid_op

which are called in a loop.

Bastian
--
I'm a soldier, not a diplomat. I can only tell the truth.
  -- Kirk, "Errand of Mercy", stardate 3198.9
Bastian Blank
2007-Aug-29 10:02 UTC
[Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
Hi folks

After disabling the MCE support, the kernel crashes:

| (XEN) domain_crash_sync called from entry.S (ff16f829)
| (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
| (XEN) ----[ Xen-3.1.0 x86_32p debug=n Not tainted ]----
| (XEN) CPU: 0
| (XEN) EIP: 0061:[<c0107a20>]
| (XEN) EFLAGS: 00010286 CONTEXT: guest
| (XEN) eax: 00010061 ebx: 00010061 ecx: c027ac8c edx: 0000000d
| (XEN) esi: ffffffff edi: 00000000 ebp: 00000020 esp: c004000c
| (XEN) cr0: 80050033 cr4: 000006f0 cr3: 0bd89000 cr2: 00010061
| (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: e021 cs: 0061
| (XEN) Guest stack trace from esp=c004000c:
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
| (XEN) 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
| (XEN) 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061

There seem to be two code addresses in the trace:

| c027abd0 T page_fault
| c0107a20 T invalid_op

The address in ecx (c027ac8c) points to

| mov %eax,%fs

Bastian
--
I'm a soldier, not a diplomat. I can only tell the truth.
  -- Kirk, "Errand of Mercy", stardate 3198.9
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2007-Aug-29 10:14 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
You mean 2.6.23-rc3? And what do you mean by 'disabling the MCE support'?

 -- Keir

On 29/8/07 11:02, "Bastian Blank" <bastian@waldi.eu.org> wrote:
> Hi folks
>
> After disabling the MCE support, the kernel crashs:
> [...]
Bastian Blank
2007-Aug-29 10:27 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
On Wed, Aug 29, 2007 at 11:14:19AM +0100, Keir Fraser wrote:
> You mean 2.6.23-rc3?

Yes.

> And what do you mean by 'disabling the MCE support'?

CONFIG_X86_MCE. If enabled, the kernel tries to disable MCE on boot, which
triggers a GPF on the CR4 write.

Bastian
--
Change is the essential process of all existence.
  -- Spock, "Let That Be Your Last Battlefield", stardate 5730.2
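As a reading aid for the register dumps in this thread, the following C sketch decodes a CR4 value into flag names. The bit positions are the architectural x86 ones (MCE is bit 6). The helper name `cr4_flag_names` is invented for illustration and exists in neither Linux nor Xen.

```c
#include <stddef.h>
#include <string.h>

/* Architectural names of CR4 bits 0..10 on x86. */
static const char *const cr4_names[] = {
    "VME", "PVI", "TSD", "DE", "PSE", "PAE",
    "MCE", "PGE", "PCE", "OSFXSR", "OSXMMEXCPT",
};

/*
 * Illustrative helper (not from Linux or Xen): write the space-separated
 * names of all known bits set in cr4 into buf, which holds len bytes.
 * Output is truncated if buf is too small.
 */
static void cr4_flag_names(unsigned long cr4, char *buf, size_t len)
{
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (i = 0; i < sizeof(cr4_names) / sizeof(cr4_names[0]); i++) {
        if (cr4 & (1UL << i)) {
            if (buf[0] != '\0')
                strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, cr4_names[i], len - strlen(buf) - 1);
        }
    }
}
```

Decoding 0x620 (the value in eax when the CR4 write faulted) gives PAE, OSFXSR and OSXMMEXCPT. Decoding 0x6f0 (the cr4 value in the later Xen crash dump) gives PSE, PAE, MCE, PGE, OSFXSR and OSXMMEXCPT. The two values come from different boots, so the comparison is only indicative, but it is consistent with the guest trying to write a value with MCE clear.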
Jan Beulich
2007-Aug-29 10:39 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
>There seem to be two code addresses in the trace:
>| c027abd0 T page_fault
>| c0107a20 T invalid_op

These aren't really meaningful, as the VM was obviously in a loop getting
repeated exceptions (and the stack pointer clearly went bad meanwhile).
You'd need to catch the state much earlier, when the first (or just very few)
of these exceptions happened, so that looking at the stack can actually
provide some insight.

Jan
Bastian Blank
2007-Aug-29 11:02 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
On Wed, Aug 29, 2007 at 11:39:25AM +0100, Jan Beulich wrote:
> These aren't really meaningful, as the VM was obviously in a loop getting
> repeated exceptions (and the stack pointer clearly went bad meanwhile).
> You'd need to catch the state much earlier, when the first (or just very few)
> of these exceptions happened, so that looking at the stack can actually
> provide some insight.

How? There seems to be something in tools/debugger/gdb. It seems to
build a special gdbserver. Does this gdbserver stop the domain on an
exception or do I need to break explicitly?

Bastian
--
A princess should not be afraid -- not with a brave knight to protect her.
  -- McCoy, "Shore Leave", stardate 3025.3
Jan Beulich
2007-Aug-29 11:32 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
>>> Bastian Blank <bastian@waldi.eu.org> 29.08.07 13:02 >>>
>On Wed, Aug 29, 2007 at 11:39:25AM +0100, Jan Beulich wrote:
>> These aren't really meaningful, as the VM was obviously in a loop getting
>> repeated exceptions (and the stack pointer clearly went bad meanwhile).
>> You'd need to catch the state much earlier, when the first (or just very few)
>> of these exceptions happened, so that looking at the stack can actually
>> provide some insight.
>
>How? There seems to be something in tools/debugger/gdb. It seems to
>build a special gdbserver. Does this gdbserver stop the domain on an
>exception or do I need to break explicitly?

I never used it, so I don't know. But you must have been building the kernel
yourself, and given that you are testing -rc kernels I also assume you're
familiar with modifying the kernel sources, so it shouldn't be too difficult
to e.g. remove registration of the illegal opcode handler so that Xen dumps
the VCPU state (and kills the VM) the first time such an exception occurs in
the guest (of course assuming there are no other instances of 'valid' uses of
the exception - you'd see this pretty quickly).

Jan
Keir Fraser
2007-Aug-29 13:09 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
Sounds like Bastian already worked out the problem is we do not allow
X86_CR4_MCE to be cleared. Xen should probably just ignore attempts to
change that bit. Even better would be to remember guest value of that flag
and return appropriate value on reads of CR4. But that's more than required
here, I think.

Actually, instead of GPF'ing on 'bad' CR4 writes, we could just log a
XENLOG_WARNING and return. That would avoid any problems for any other CR4
bits too.

 -- Keir

On 29/8/07 12:32, "Jan Beulich" <jbeulich@novell.com> wrote:
> [...]
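The "warn and return" idea from Keir's message can be sketched in plain C as below. This is a hypothetical illustration only, not Xen's code: the function name `pv_cr4_write_policy` is made up, `fprintf` stands in for a XENLOG_WARNING log call, and the values passed in would come from the hypervisor's own bookkeeping.

```c
#include <stdio.h>

#define X86_CR4_MCE (1UL << 6)  /* architectural CR4 bit 6 */

/*
 * Hypothetical policy sketch, NOT Xen's actual implementation.
 *
 * current_cr4: the CR4 value currently presented to the guest
 * wanted_cr4:  the value the guest tried to write
 *
 * Instead of injecting a general protection fault, the write is dropped.
 * *ignored receives the bits whose requested change was ignored (0 if the
 * write would not have changed anything). The value that stays in effect
 * is returned.
 */
static unsigned long pv_cr4_write_policy(unsigned long current_cr4,
                                         unsigned long wanted_cr4,
                                         unsigned long *ignored)
{
    *ignored = current_cr4 ^ wanted_cr4;
    if (*ignored != 0)
        fprintf(stderr,
                "warning: ignoring guest attempt to change CR4 bits %#lx\n",
                *ignored);
    return current_cr4;
}
```

With an illustrative guest-visible value of 0x660 and an attempted write of 0x620, the function keeps 0x660 and reports X86_CR4_MCE as the ignored bit, so a guest that clears MCE during boot would carry on instead of faulting.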
Bastian Blank
2007-Aug-29 13:23 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
On Wed, Aug 29, 2007 at 12:02:59PM +0200, Bastian Blank wrote:
> | (XEN) domain_crash_sync called from entry.S (ff16f829)
> | (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
> | (XEN) ----[ Xen-3.1.0 x86_32p debug=n Not tainted ]----

I can no longer reproduce it.

Bastian
--
We do not colonize. We conquer. We rule. There is no other way for us.
  -- Rojan, "By Any Other Name", stardate 4657.5
Bastian Blank
2007-Aug-29 14:13 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
On Wed, Aug 29, 2007 at 02:09:23PM +0100, Keir Fraser wrote:
> Sounds like Bastian already worked out the problem is we do not allow
> X86_CR4_MCE to be cleared. Xen should probably just ignore attempts to
> change that bit. Even better would be to remember guest value of that flag
> and return appropriate value on reads of CR4. But that's more than required
> here, I think.

What should happen with the upcoming MCE support in Xen?

> Actually, instead of GPF'ing on 'bad' CR4 writes, we could just log a
> XENLOG_WARNING and return. That would avoid any problems for any other CR4
> bits too.

What is the documented behaviour if a bit is set while the machine lacks
support for it?

Bastian
--
The face of war has never changed. Surely it is more logical to heal
than to kill.
  -- Surak of Vulcan, "The Savage Curtain", stardate 5906.5
Keir Fraser
2007-Aug-29 14:22 UTC
Re: [Xen-devel] [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
On 29/8/07 15:13, "Bastian Blank" <bastian@waldi.eu.org> wrote:
> On Wed, Aug 29, 2007 at 02:09:23PM +0100, Keir Fraser wrote:
>> Sounds like Bastian already worked out the problem is we do not allow
>> X86_CR4_MCE to be cleared. Xen should probably just ignore attempts to
>> change that bit. Even better would be to remember guest value of that flag
>> and return appropriate value on reads of CR4. But that's more than required
>> here, I think.
>
> What should happen with the upcoming MCE support in Xen?

We'll cross that bridge when we come to it. ;-) The default will be that it
all continues to be handled by Xen, and explicit paravirtualisations will be
introduced to allow dom0, and perhaps domUs also, to get involved.

>> Actually, instead of GPF'ing on 'bad' CR4 writes, we could just log a
>> XENLOG_WARNING and return. That would avoid any problems for any other CR4
>> bits too.
>
> What is the documented behaviour if a bit is set while the machine lacks
> support for it?

That should GPF. But no OS actually probes for features that way, as there
are perfectly good CPUID feature flags for that. Linux in particular will not
be happy if any of its writes to CR4 results in a GPF.

 -- Keir
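To illustrate the point about CPUID feature flags, here is a minimal user-space C sketch that asks CPUID leaf 1 whether the processor reports machine-check support, rather than probing CR4. It assumes GCC or Clang on x86, whose `<cpuid.h>` provides `__get_cpuid`; the helper names `edx_has_mce` and `cpu_has_mce` are made up for this example, and on other architectures the sketch simply reports no support.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* CPUID leaf 1, EDX feature bits (architectural). */
#define CPUID_1_EDX_MCE (1u << 7)   /* Machine Check Exception */
#define CPUID_1_EDX_MCA (1u << 14)  /* Machine Check Architecture */

/* Pure helper: does an EDX value from CPUID leaf 1 advertise MCE? */
static bool edx_has_mce(unsigned int edx)
{
    return (edx & CPUID_1_EDX_MCE) != 0;
}

/*
 * Returns true if CPUID leaf 1 reports MCE support.
 * Returns false if the leaf is unavailable or this is not an x86 build.
 */
static bool cpu_has_mce(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_mce(edx);
#else
    return false;
#endif
}
```

A feature test of this kind never touches a control register, so it cannot fault in the way the CR4 write in this thread did.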