I'm trying to install Xen-unstable on a new machine. At first boot, I'm
getting a dom0 crash. The crash output is shown below. The complete
console log is attached.

(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xe)
(XEN) Freed 148kB init memory.
mapping kernel into physical memory
Xen: setup ISA identity maps
(XEN) traps.c:466:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.5-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff819067de>]
(XEN) RFLAGS: 0000000000000212 EM: 1 CONTEXT: pv guest
(XEN) rax: ffffffff865ce000 rbx: ffffffff81001000 rcx: 0000000000000006
(XEN) rdx: 0000000000800000 rsi: 00000000deadbeef rdi: 00000000deadbeef
(XEN) rbp: ffffffff81807fa8 rsp: ffffffff81807f68 r8: 000000000000001d
(XEN) r9: ffffffff81807fd8 r10: 0000000006606000 r11: 00000000818ec000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000000919001000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81807f68:
(XEN) 0000000000000006 00000000818ec000 ffffffff819067de 000000010000e030
(XEN) 0000000000010012 ffffffff81807fa8 000000000000e02b ffffffff81906d64
(XEN) ffffffff81807ff8 ffffffff81906166 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000001 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

I pulled from Xen unstable yesterday (bd376919f03a tip), and built it
simply using "make world". The machine has four Intel E5540 processors
and 36GB of RAM. In the BIOS, I have VT-d disabled, and VT-x enabled.
Any suggestions on what steps I should take for debugging this problem
and getting dom0 to boot?

Thanks,
bryan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

On 09/30/09 11:05, Bryan D. Payne wrote:
> I'm trying to install Xen-unstable on a new machine. At first boot,
> I'm getting a dom0 crash. The crash output is shown below. The
> complete console log is attached.
>
> (XEN) Std. Loglevel: All
> (XEN) Guest Loglevel: All
> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xe)
> (XEN) Freed 148kB init memory.
> mapping kernel into physical memory
> Xen: setup ISA identity maps
> (XEN) traps.c:466:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=000]
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-3.5-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e033:[<ffffffff819067de>]
> (XEN) RFLAGS: 0000000000000212 EM: 1 CONTEXT: pv guest
> (XEN) rax: ffffffff865ce000 rbx: ffffffff81001000 rcx: 0000000000000006
> (XEN) rdx: 0000000000800000 rsi: 00000000deadbeef rdi: 00000000deadbeef
> (XEN) rbp: ffffffff81807fa8 rsp: ffffffff81807f68 r8: 000000000000001d
> (XEN) r9: ffffffff81807fd8 r10: 0000000006606000 r11: 00000000818ec000
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
> (XEN) cr3: 0000000919001000 cr2: 0000000000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
> (XEN) Guest stack trace from rsp=ffffffff81807f68:
> (XEN) 0000000000000006 00000000818ec000 ffffffff819067de 000000010000e030
> (XEN) 0000000000010012 ffffffff81807fa8 000000000000e02b ffffffff81906d64
> (XEN) ffffffff81807ff8 ffffffff81906166 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000001 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000
> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>
> I pulled from Xen unstable yesterday (bd376919f03a tip), and built it
> simply using "make world". The machine has four Intel E5540
> processors and 36GB of RAM. In the BIOS, I have VT-d disabled, and
> VT-x enabled. Any suggestions on what steps I should take for
> debugging this problem and getting dom0 to boot?
>

It looks like something has hit a BUG_ON. The first step is to try to
identify which one:

$ gdb vmlinux
(gdb) x/i 0xffffffff819067de
(gdb) x/i 0xffffffff81906d64
(gdb) x/i 0xffffffff81906166

should give a first clue.

Thanks,
J

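The three addresses handed to gdb above are the RIP value and the kernel-looking words in the guest stack trace. A rough sketch of collecting such candidates from a saved console log follows. It assumes kernel addresses have the form ffffffff8xxxxxxx, and the function names are hypothetical, not part of any Xen or gdb tooling. The filter is crude: it also picks up stack pointers, which gdb will simply not resolve to an instruction inside a function.

```python
import re

# Assumption: kernel addresses in the dump look like ffffffff8xxxxxxx.
# This is a pattern match only, not a symbol lookup.
KERNEL_ADDR = re.compile(r"\bffffffff8[0-9a-f]{7}\b")


def candidate_addresses(dump):
    """Return unique kernel-looking addresses in order of first appearance."""
    seen = []
    for addr in KERNEL_ADDR.findall(dump.lower()):
        if addr not in seen:
            seen.append(addr)
    return seen


def gdb_commands(dump):
    """Build one 'x/i' command per candidate address, ready to paste into gdb."""
    return ["x/i 0x" + addr for addr in candidate_addresses(dump)]


# Excerpt of the crash output posted at the start of the thread.
SAMPLE = """\
(XEN) RIP: e033:[<ffffffff819067de>]
(XEN) Guest stack trace from rsp=ffffffff81807f68:
(XEN) 0000000000000006 00000000818ec000 ffffffff819067de 000000010000e030
(XEN) 0000000000010012 ffffffff81807fa8 000000000000e02b ffffffff81906d64
(XEN) ffffffff81807ff8 ffffffff81906166 0000000000000000 0000000000000000
"""

if __name__ == "__main__":
    for cmd in gdb_commands(SAMPLE):
        print(cmd)
```

The printed lines can be pasted into a `gdb vmlinux` session; only the entries that resolve to `<function+offset>` inside kernel text are interesting.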
> $ gdb vmlinux
> (gdb) x/i 0xffffffff819067de
> (gdb) x/i 0xffffffff81906d64
> (gdb) x/i 0xffffffff81906166

(gdb) x/i 0xffffffff819067de
0xffffffff819067de <xen_fix_mfn_list+38>: ud2a
(gdb) x/i 0xffffffff81906d64
0xffffffff81906d64 <xen_ident_map_ISA+128>: leaveq
(gdb) x/i 0xffffffff81906166
0xffffffff81906166 <xen_start_kernel+1343>:
    mov 0xa028b(%rip),%rax # 0xffffffff819a63f8 <xen_start_info>

Thanks,
-bryan

On 10/01/09 06:35, Bryan D. Payne wrote:
> (gdb) x/i 0xffffffff819067de
> 0xffffffff819067de <xen_fix_mfn_list+38>: ud2a
> (gdb) x/i 0xffffffff81906d64
> 0xffffffff81906d64 <xen_ident_map_ISA+128>: leaveq
> (gdb) x/i 0xffffffff81906166
> 0xffffffff81906166 <xen_start_kernel+1343>:
>     mov 0xa028b(%rip),%rax # 0xffffffff819a63f8 <xen_start_info>
>

It looks like it's encountering a pfn (page frame number) that's greater
than the total number of pages given to the domain.

Aaah, 36GB memory. What happens if you configure
CONFIG_XEN_MAX_DOMAIN_MEMORY larger to match, or set dom0_mem to less
than 32GB?

(It shouldn't crash regardless, but it will confirm the diagnosis.)

J

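A quick check of the arithmetic behind this diagnosis, as a sketch only. It assumes 4 KiB pages and that the configured limit corresponds to 32 GB; the 32 GB figure is inferred from the "less than 32GB" suggestion above and is not confirmed anywhere in the thread.

```python
PAGE_SIZE = 4096   # x86 base page size in bytes
GIB = 1 << 30


def pages(gib):
    """Number of 4 KiB pages in the given number of GiB."""
    return gib * GIB // PAGE_SIZE


machine_ram_gib = 36     # from the original report
assumed_limit_gib = 32   # assumption: limit implied by "less than 32GB"

machine_pages = pages(machine_ram_gib)   # 9437184
limit_pages = pages(assumed_limit_gib)   # 8388608

# Any pfn at or above limit_pages would fall outside what the kernel
# was built to track, which matches the "pfn greater than the total
# number of pages" description.
excess_pages = machine_pages - limit_pages   # 1048576 pages, i.e. 4 GiB

if __name__ == "__main__":
    print(machine_pages, limit_pages, excess_pages)
```

Under these assumptions, a dom0 given all 36GB would hold about a million page frames beyond the limit, which is why capping dom0_mem below 32GB is a reasonable first test.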
> It looks like its encountering a pfn (page number) that's greater than
> the total number of pages given to the domain.
>
> Aaah, 36GB memory. What happens if you configure
> CONFIG_XEN_MAX_DOMAIN_MEMORY larger to match, or set dom0_mem to less
> than 32GB?
>
> (It shouldn't crash regardless, but it will confirm the diagnosis.)

Ok, so I tried setting dom0_mem to a variety of values less than 32GB.
I'm still getting a crash, but now it is more random. Basically, I'm
watching the boot process via a serial line, and instead of seeing the
dom0 crash output that I posted before, I'm simply seeing the output
stop and the boot process hang, in a different spot with each boot.
Removing the dom0_mem value brings me back to the behavior I had
before, where the dom0 crash output shows up reliably each time.

-bryan

On 10/06/09 08:13, Bryan D. Payne wrote:
> Ok, so I tried setting dom0_mem to a variety of values less than 32GB.
> I'm still getting a crash, but now it is more random. Basically, I'm
> watching the boot process via a serial line, and instead of seeing the
> dom0 crash output that I posted before, I'm simply seeing the output
> stop, and the boot process hanging, in a different spot with each
> boot. Removing the dom0_mem value brings me back to the behavior I
> had before, where the dom0 crash output shows up reliably each time.
>

That's mysterious. My first thought is that this is a separate problem.

Is it ever stable? What happens if you set dom0_mem to 4G or less?

Is the console responsive when the kernel hangs? That is, can you type
"Ctrl-A Ctrl-A Ctrl-A" to get Xen, then enter debug keys? '0' (zero)
should dump the context for dom0 and give some clue about where it is
dying.

Thanks,
J

> That's mysterious. My first thought is that this is a separate problem.

I agree... and thanks for your help in diagnosing this issue.

> Is it ever stable? What happens if you set dom0_mem to 4G or less?

It still crashes, even with 2G of dom0 memory.

> Is the console responsive when the kernel hangs? That is, can you type
> "Ctrl-A Ctrl-A Ctrl-A" to get Xen, then enter debug keys? '0' (zero)
> should dump the context for dom0 and give some clue about where it is dying.

Yes, this seems to work. Typing 0 dumped info for each of the cores
(i.e., 15 vcpu states). I've attached the output.

-bryan

On 10/06/09 10:23, Bryan D. Payne wrote:
>> Is it ever stable? What happens if you set dom0_mem to 4G or less?
>>
> It still crashes, even with 2G of dom0 memory.
>

What console output do you get?

>> Is the console responsive when the kernel hangs? That is, can you type
>> "Ctrl-A Ctrl-A Ctrl-A" to get Xen, then enter debug keys? '0' (zero)
>> should dump the context for dom0 and give some clue about where it is dying.
>>
> Yes, this seems to work. Typing 0 dumped info for each of the cores
> (i.e., 15 vcpu states). I've attached the output.
>

Could you map the RIP values to a symbol with gdb? From a quick look,
it seems that most or all of the cpus are at the same place. No, vcpu 8
is doing something else at least.

What happens if you boot dom0 with fewer cpus?

Do you have CONFIG_PARAVIRT_SPINLOCKS enabled?

J

> What console output do you get?

It's the same as starting dom0 with 4G - 32G of memory: it just freezes
at a different place in the boot each time.

> Could you map the RIP values to a symbol with gdb? From a quick look,
> it seems that most or all of the cpus are at the same place. No, vcpu 8
> is doing something else at least.

(gdb) x/i 0xffffffff8100930a
0xffffffff8100930a <hypercall_page+778>: add %al,(%rax)
(gdb) x/i 0xffffffff811fea48
0xffffffff811fea48 <delay_tsc+62>:
    cmpq $0x0,0x632538(%rip) # 0xffffffff81830f88 <pv_cpu_ops+264>

> What happens if you boot dom0 with fewer cpus?

I tried adding "maxcpus=1" to the linux kernel line in grub. I used
this in conjunction with the "dom0_mem=2G" option for xen. Dom0 still
crashes the same as without the maxcpus option.

Just for kicks, I tried a few other options...

* setting mem=4G with dom0_mem=2G still resulted in random dom0 crashing
* setting noapic resulted in a consistent crash within Xen:

(XEN) ----[ Xen-3.5-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48014fe29>] add_pin_to_irq+0x24/0xcc
(XEN) RFLAGS: 0000000000010296 CONTEXT: hypervisor

> Do you have CONFIG_PARAVIRT_SPINLOCKS enabled?

No.

-bryan

On 10/06/09 11:25, Bryan D. Payne wrote:
> (gdb) x/i 0xffffffff8100930a
> 0xffffffff8100930a <hypercall_page+778>: add %al,(%rax)
>

778/32 = hypercall 24 = vcpuop. Probably idling.

> (gdb) x/i 0xffffffff811fea48
> 0xffffffff811fea48 <delay_tsc+62>:
>     cmpq $0x0,0x632538(%rip) # 0xffffffff81830f88 <pv_cpu_ops+264>
>

That's almost certainly a kernel panic of some kind. Working out where
it came from will be rather tedious: you need to look through the stack
dump to find code-ish looking addresses, then x/i them (they'll be the
same basic format as 0xffffffff8xxxxxxx).

(I really need to work out why they tend not to get printed.)

>> What happens if you boot dom0 with fewer cpus?
>>
> I tried adding "maxcpus=1" to the linux kernel line in grub. I used
> this in conjunction with the "dom0_mem=2G" option for xen. Dom0 still
> crashes the same as without the maxcpus option.
>
> Just for kicks, I tried a few other options...
>
> * setting mem=4G with dom0_mem=2G still resulted in random dom0 crashing
> * setting noapic resulted in a consistent crash within Xen:
>
> (XEN) ----[ Xen-3.5-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c48014fe29>] add_pin_to_irq+0x24/0xcc
> (XEN) RFLAGS: 0000000000010296 CONTEXT: hypervisor
>

That shouldn't happen. Sounds like it might be fall-out from the recent
interrupt changes in Xen. Did you supply "noapic" to Xen, dom0 or both?

J

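The "778/32 = hypercall 24" step works because each entry in the hypercall page occupies 32 bytes, so integer division of the offset gives the hypercall number. A minimal sketch of that calculation follows; the helper name and the one-entry lookup table are illustrative only, and the table lists just the number identified in this thread.

```python
HYPERCALL_STUB_SIZE = 32  # bytes per entry in the hypercall page

# Only the entry discussed in the thread; not a complete table.
KNOWN_HYPERCALLS = {24: "vcpu_op"}


def hypercall_number(offset):
    """Map an offset into hypercall_page to the hypercall index."""
    return offset // HYPERCALL_STUB_SIZE


def describe(offset):
    """Return (index, name, offset within the stub) for a hypercall_page offset."""
    index = hypercall_number(offset)
    name = KNOWN_HYPERCALLS.get(index, "unknown")
    return index, name, offset % HYPERCALL_STUB_SIZE


if __name__ == "__main__":
    # hypercall_page+778 from the gdb output above
    print(describe(778))
```

The remainder (10 bytes into the stub) is consistent with a saved RIP that points just past the hypercall instruction rather than at the start of the entry, which is also why gdb disassembles it as a meaningless `add %al,(%rax)`.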
> That's almost certainly a kernel panic of some kind. Working out where
> it came from will be rather tedious: you need to look through the stack
> dump to find code-ish looking addresses then x/i them (they'll be the
> same basic format as 0xffffffff8xxxxxxx).

Nice ;-) I'll let you know what I find out.

>> (XEN) ----[ Xen-3.5-unstable x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[<ffff82c48014fe29>] add_pin_to_irq+0x24/0xcc
>> (XEN) RFLAGS: 0000000000010296 CONTEXT: hypervisor
>
> That shouldn't happen. Sounds like it might be fall-out from the recent
> interrupt changes in Xen. Did you supply "noapic" to Xen, dom0 or both?

Just Xen. Should it go to both?

-bryan

On 10/06/09 12:02, Bryan D. Payne wrote:
> Just Xen. Should it go to both?
>

Yeah, they need to agree what interrupt model they're using. Still
shouldn't crash, regardless.

J

>>> Jeremy Fitzhardinge <jeremy@goop.org> 06.10.09 21:37 >>>
>On 10/06/09 12:02, Bryan D. Payne wrote:
>> Just Xen. Should it go to both?
>>
>
>Yeah, they need to agree what interrupt model they're using. Still
>shouldn't crash, regardless.

Not really - Xen automatically passes noapic to the kernel when it had
been passed that option (see the dom0_cmdline handling in
__start_xen()).

Jan

On 10/07/09 04:47, Jan Beulich wrote:
> Not really - Xen automatically passes noapic to the kernel when it had
> been passed that option (see the dom0_cmdline handling in
> __start_xen()).
>

Ah, OK. I hadn't realized that.

J