Sylvain Munaut
2013-Feb-18 10:47 UTC
Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
Hi,

I've just installed a self-built Xen 4.2.1 package on a debian wheezy and when trying to run a HVM VM (that I was previously running with the official xen 4.0 package on squeeze), it starts fine and I can even use the VM for a few minutes, then suddenly I lose all communication with the VM and the Dom0 and it just reboots ...

I enabled the xen serial console and this is what I got when the crash happens:

(XEN) mm locking order violation: 260 > 222
(XEN) Xen BUG at mm-locks.h:118
(XEN) ----[ Xen-4.2.1  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c4801e15fd>] p2m_pod_demand_populate+0x87d/0x8a0
(XEN) RFLAGS: 0000000000010296   CONTEXT: hypervisor
(XEN) rax: ffff82c4802e8e20   rbx: ffff8302278ee820   rcx: 0000000000000000
(XEN) rdx: ffff82c48029ff18   rsi: 000000000000000a   rdi: ffff82c480258640
(XEN) rbp: 0000000000000000   rsp: ffff82c48029f978   r8:  0000000000000004
(XEN) r9:  0000000000000003   r10: 0000000000000002   r11: ffff82c4802c8c80
(XEN) r12: 0000000000000000   r13: ffff83022795f000   r14: 000000000005f70a
(XEN) r15: 000000000005fb0a   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 00000002277ac000   cr2: 00000000d8b86058
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029f978:
(XEN)    000000000017e26b 000000000005fb0b ffff8302278eed08 000000010a040000
(XEN)    ffff82c48029ff18 600000017e26b067 6000000203243267 60000002279be467
(XEN)    0000000000000100 0000000000000000 ffff8302278ee820 000000000000a040
(XEN)    ffff82c48029faf4 ffff82c48016a4bd 0000000000000000 ffff82c4801d6666
(XEN)    0000000000000000 ffff82c48029ff18 0000002000000020 ffff82c48029faf4
(XEN)    ffff8302278ee820 ffff82c48029fa70 000000000000a040 000000000005fb0b
(XEN)    ffff82c48029fbec 0000000000000000 ffff8000002fd858 ffff8302278ee820
(XEN)    0000000000000006 ffff82c4801dbec3 ffff83022795fad0 ffff82c400000001
(XEN)    0000000000000001 000000005fb0b000 000000000000a040 806000000010b000
(XEN)    6000000172210267 60000002015bd467 ffff83017e26b000 0000000000000001
(XEN)    ffff8302278ee820 000000000005fb0b ffff82c48029fbec ffff82c48029fbf4
(XEN)    0000000000000000 ffff82c4801d6666 0000000000000000 0000000000000000
(XEN)    0000000000001e00 ffff83022795f000 ffff8300d7d10000 000000000005fb0b
(XEN)    ffff82c48029ff18 0000000080000b0e ffff8300d7d10000 ffff82c4801fa23f
(XEN)    ffff830000000001 ffff83022795f000 0000000000000008 0000000000001e00
(XEN)    00007d0a00000006 00000000b9fb2000 000000000003fae9 ffff83022795fb40
(XEN)    000000000017ecb9 00000000000b9fb2 ffff82c4802e9c60 ffff83022795fad0
(XEN)    ffff8300d7d10920 0000000000000060 ffff82c48029ff18 0000000000000002
(XEN)    0000000000000e78 0000000000000000 00000000d7d10000 0000000000000d90
(XEN)    0000000000000000 ffff82c4801cdda8 00000004fffe0080 0000000700000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4801e15fd>] p2m_pod_demand_populate+0x87d/0x8a0
(XEN)    [<ffff82c48016a4bd>] get_page+0x2d/0x100
(XEN)    [<ffff82c4801d6666>] __get_gfn_type_access+0x86/0x260
(XEN)    [<ffff82c4801dbec3>] p2m_gfn_to_mfn+0x693/0x810
(XEN)    [<ffff82c4801d6666>] __get_gfn_type_access+0x86/0x260
(XEN)    [<ffff82c4801fa23f>] sh_page_fault__guest_3+0x24f/0x1e40
(XEN)    [<ffff82c4801cdda8>] vmx_update_guest_cr+0x78/0x5d0
(XEN)    [<ffff82c4801ae2da>] hvm_set_cr0+0x2ea/0x480
(XEN)    [<ffff82c4801b2bb4>] hvm_mov_to_cr+0xe4/0x1a0
(XEN)    [<ffff82c4801cfa63>] vmx_vmexit_handler+0xd33/0x1790
(XEN)    [<ffff82c4801cafb5>] vmx_do_resume+0xb5/0x170
(XEN)    [<ffff82c48015968c>] context_switch+0x15c/0xdf0
(XEN)    [<ffff82c480125d7b>] add_entry+0x4b/0xb0
(XEN)    [<ffff82c480125d7b>] add_entry+0x4b/0xb0
(XEN)    [<ffff82c4801bf3c7>] pt_update_irq+0x27/0x200
(XEN)    [<ffff82c480119830>] csched_tick+0x0/0x2e0
(XEN)    [<ffff82c4801bd5a1>] vlapic_has_pending_irq+0x21/0x60
(XEN)    [<ffff82c4801b5fca>] hvm_vcpu_has_pending_irq+0x4a/0x90
(XEN)    [<ffff82c4801c85c4>] vmx_intr_assist+0x54/0x290
(XEN)    [<ffff82c4801d2911>] nvmx_switch_guest+0x51/0x6c0
(XEN)    [<ffff82c4801d4256>] vmx_asm_do_vmentry+0x0/0xea
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at mm-locks.h:118
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

Any suggestions?

It is very reproducible and it's on a test machine I can reboot any time, so if you need more debug info, I can collect it. I don't have any different hw to test on unfortunately.

Cheers,

Sylvain
Jan Beulich
2013-Feb-18 11:05 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
>>> On 18.02.13 at 11:47, Sylvain Munaut <s.munaut@whatever-company.com> wrote:
> It is very reproducible and it's on a test machine I can reboot any
> time, so if you need more debug info, I can collect it.
> I don't have any different hw to test on unfortunately.

Minimally you will want to let us know at what changeset you cloned your tree.

Jan
Andrew Cooper
2013-Feb-18 11:09 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
On 18/02/13 10:47, Sylvain Munaut wrote:
> Hi,
>
> I've just installed a self-built Xen 4.2.1 package on a debian wheezy
> and when trying to run a HVM VM (that I was previously running with
> the official xen 4.0 package on squeeze), it starts fine and I can
> even use the VM for a few minutes, then suddenly I lose all
> communication with the VM and the Dom0 and it just reboots ...
>
> [full crash log quoted; snipped]

From the stack trace (note the sh_page_fault__guest_3 frame), I assume that the guest is running in shadow mode? Can you confirm this?

~Andrew
Ian Campbell
2013-Feb-18 11:13 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
On Mon, 2013-02-18 at 10:47 +0000, Sylvain Munaut wrote:
> Hi,
>
> I've just installed a self-built Xen 4.2.1 package on a debian wheezy

Is this exactly 4.2.1, some later revision from 4.2-testing or otherwise patched? Can you let us know the commit id.

> and when trying to run a HVM VM (that I was previously running with
> the official xen 4.0 package on squeeze), it starts fine and I can
> even use the VM for a few minutes, then suddenly I lose all
> communication with the VM and the Dom0 and it just reboots ...

Please can you share the domain configuration. Are you running PV drivers (esp. ballooning) within it?

> I enabled the xen serial console and this is what I got when the crash happens:
>
> (XEN) mm locking order violation: 260 > 222

260 == pod lock, 222 is the p2m lock. I've CCd George and Tim.

> [rest of crash log quoted; snipped]
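For anyone unfamiliar with the mechanism: mm-locks.h assigns every mm lock an ordering level, tracks the deepest level currently held on each CPU, and BUGs when a lock is acquired while a higher-level one is already held. A paraphrased sketch of that check, written from memory rather than copied from the 4.2.1 source, so names and details are approximate:

    static DEFINE_PER_CPU(int, mm_lock_level);

    /* Called before taking an mm lock whose ordering level is 'l'. */
    static inline void check_lock_level(int l)
    {
        /* Holding a higher-level lock (here pod, 260) while taking a
         * lower-level one (p2m, 222) violates the ordering. */
        if ( unlikely(this_cpu(mm_lock_level) > l) )
        {
            printk("mm locking order violation: %i > %i\n",
                   this_cpu(mm_lock_level), l);
            BUG();    /* the "Xen BUG at mm-locks.h:118" above */
        }
    }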
Tim Deegan
2013-Feb-18 11:35 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
Hi,

Thanks for the report.

At 11:47 +0100 on 18 Feb (1361188042), Sylvain Munaut wrote:
> I've just installed a self-built Xen 4.2.1 package on a debian wheezy
> and when trying to run a HVM VM (that I was previously running with
> the official xen 4.0 package on squeeze), it starts fine and I can
> even use the VM for a few minutes, then suddenly I lose all
> communication with the VM and the Dom0 and it just reboots ...

Did you make any changes to Xen before you built it, or were you just building your own to get 4.2?

> (XEN) mm locking order violation: 260 > 222
> (XEN) Xen BUG at mm-locks.h:118

Hmm, taking the p2m lock with the pod lock held. :( My guess would be the p2m_lock() in p2m_pod_emergency_sweep().

Do you by any chance have the xen-syms file from when you built Xen? That would let us see exactly what's happened.

In the meantime, perhaps you could try the attached (untested) patch. If my guess is right, it ought to stop the crashes but you might find the VM's performance suffers.

Cheers,

Tim.

> [call trace quoted; snipped]
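To make the guess concrete, this is the suspected call chain, pieced together from the trace above and the lock levels Ian identified. It is a reconstruction for illustration, not something verified against the binary:

    p2m_pod_demand_populate()        /* takes the pod lock (level 260)        */
      -> p2m_pod_emergency_sweep()   /* PoD cache empty, sweep for zero pages */
           -> p2m_lock()             /* level 222 while 260 is held -> BUG    */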
Sylvain Munaut
2013-Feb-18 13:17 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
Hi all,

Thanks for the feedback, let me try to answer the various questions.

> Did you make any changes to Xen before you built it, or were you just
> building your own to get 4.2?

It's based on the official .tar.bz2 from the website, and built using the debian/ from the 4.2.0 debian package. There are some patches applied in the debian build but I don't see any that patch the actual code, just small adaptations to the build and install system to follow the debian conventions. So it should be functionally equivalent to an official 4.2.1. I can put the compiled binary online if needed.

> Please can you share the domain configuration. Are you running PV
> drivers (esp. ballooning) within it?

There is no xen driver running in there.

Here's the config, which is based on the example hvm config:

---------
builder = "hvm"
name = "wxp-00"
vcpus = 2
memory = 1536
maxmem = 2048
viridian = 1
vif = [ 'type=ioemu,bridge=br0,mac=00:16:3e:35:ad:12' ]
disk = [ '/dev/xen-disks/wxp-00-test,raw,xvda,w', ]
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'
vnc=1
vncunused=0
vnclisten = '0.0.0.0'
vncdisplay=0
vncconsole=1
vncpasswd='xxx'
---------

> Do you by any chance have the xen-syms file from when you built Xen?
> That would let us see exactly what's happened.

You can get it at http://ge.tt/7DBEjmY/v/0

> In the meantime, perhaps you could try the attached (untested) patch.
> If my guess is right, it ought to stop the crashes but you might find
> the VM's performance suffers.

I'll try it and report here.

> From the stack trace, I assume that the guest is running in shadow mode?
> Can you confirm this?

Sorry, no idea what this means. How can I check / test? I didn't configure anything relative to "shadow mode" at least.

Cheers,

Sylvain
Andres Lagar-Cavilla
2013-Feb-18 14:27 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
>> Hi,
>>
>> I've just installed a self-built Xen 4.2.1 package on a debian wheezy
>
> Is this exactly 4.2.1, some later revision from 4.2-testing or otherwise
> patched? Can you let us know the commit id.
>
>> I enabled the xen serial console and this is what I got when the crash happens:
>>
>> (XEN) mm locking order violation: 260 > 222
>
> 260 == pod lock, 222 is the p2m lock. I've CCd George and Tim.

It's a bad locking interaction between shadow and PoD, introduced in 4.2. The one-line fix is to turn on locking p2m for shadow, as well. But we need to make sure that doing that doesn't introduce other regressions in shadow.

Andres

> [rest of quoted message, crash log and digest footer snipped]
Sylvain Munaut
2013-Feb-18 14:47 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
Hi,

> In the meantime, perhaps you could try the attached (untested) patch.
> If my guess is right, it ought to stop the crashes but you might find
> the VM's performance suffers.

The patch seems to have fixed this issue.

I did however encounter a:

(XEN) p2m_pod_demand_populate: Dom1 out of PoD memory! (tot=392185 ents=131072 dom0)
(XEN) domain_crash called from p2m-pod.c:1077
(XEN) Domain 1 reported crashed by domain 0 on cpu#1:

The domU then rebooted and it doesn't seem to happen again, but since it's related to PoD, it might be related to that same issue ...

Cheers,

Sylvain
Ian Campbell
2013-Feb-18 15:02 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
On Mon, 2013-02-18 at 13:17 +0000, Sylvain Munaut wrote:
> > Please can you share the domain configuration. Are you running PV
> > drivers (esp. ballooning) within it?
>
> There is no xen driver running in there.
>
> Here's the config, which is based on the example hvm config:
>
> ---------
> builder = "hvm"
> name = "wxp-00"
> vcpus = 2
> memory = 1536
> maxmem = 2048

This is the cause of your second "Dom1 out of PoD memory" bug.

If you aren't running at least a balloon driver inside the guest then this isn't valid, since you have requested a different initial memory allocation to what you are actually giving the guest, and something needs to bridge that gap. Initially this is PoD, but eventually a balloon driver must come along; PoD is not intended for use other than during boot, until a balloon driver can be started.

Ian.
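To put numbers on it (assuming 4 KiB pages): the configured gap is 2048 - 1536 = 512 MiB, i.e. 131072 pages, which matches the "ents=131072" in the out-of-PoD-memory message above. The PoD pool was covering exactly the maxmem - memory difference.

A minimal sketch of the two valid shapes for the config quoted earlier:

    # Either give the guest its full allocation up front (no PoD used):
    memory = 2048
    maxmem = 2048

    # Or keep memory < maxmem, but only if the guest runs a balloon
    # driver that balloons down to the target shortly after boot:
    memory = 1536
    maxmem = 2048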
Sylvain Munaut
2013-Feb-18 16:31 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
Hi Ian,

>> memory = 1536
>> maxmem = 2048
>
> This is the cause of your second "Dom1 out of PoD memory" bug.
>
> If you aren't running at least a balloon driver inside the guest then
> this isn't valid, since you have requested a different initial memory
> allocation to what you are actually giving the guest, and something needs
> to bridge that gap.

Indeed, this fixed it.

Interestingly, it seems to avoid the first issue as well ... I guess this is why nobody hit that before. Although hard rebooting the dom0 might be a severe punishment for a config mistake :p

Cheers & thanks to all.

Sylvain
Ian Campbell
2013-Feb-18 16:42 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
On Mon, 2013-02-18 at 16:31 +0000, Sylvain Munaut wrote:
> Hi Ian,
>
> > This is the cause of your second "Dom1 out of PoD memory" bug.
> >
> > [explanation quoted; snipped]
>
> Indeed, this fixed it.

Good.

> Interestingly, it seems to avoid the first issue as well ...

If memory == maxmem then PoD (the crashing subsystem) is never activated, so that is to be expected.

Ian.
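Roughly how that works in the domain builder, sketched from memory of the 4.2 tools rather than quoted from the source, so treat the names as approximate pseudo-code:

    /* PoD is only armed when the initial target is below the maximum. */
    pod_mode = (mem_target < mem_size);          /* memory < maxmem */
    if ( pod_mode )
        set_pod_target(dom, (mem_size - mem_target) >> PAGE_SHIFT);

With memory == maxmem no PoD target is ever set, so p2m_pod_demand_populate() has nothing to do and the buggy path is never entered.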
Tim Deegan
2013-Feb-21 15:25 UTC
Re: Xen BUG at mm-locks.h:118 in 4.2.1 - mm locking order violation - Dom0 reboot
Hi,

At 15:47 +0100 on 18 Feb (1361202472), Sylvain Munaut wrote:
> > In the meantime, perhaps you could try the attached (untested) patch.
> > If my guess is right, it ought to stop the crashes but you might find
> > the VM's performance suffers.
>
> The patch seems to have fixed this issue.

Excellent, thanks. I've just applied it to xen-unstable.

Tim.