Ashish Bijlani
2010-Feb-23 08:57 UTC
[Xen-devel] live migration fails (assert in shadow_hash_delete)
Hi,

I'm working on a project that requires live migration of a 64-bit PV VM
(on a 64-bit platform). "xm save" and "xm restore" work fine. However,
live migration fails with the following err msg:

mapping kernel into physical memory
about to get started...
(XEN) traps.c:2306:d3 Domain attempted WRMSR 000000000000008b from 00000a07:00000000 to 00000000:000000.
(XEN) Assertion 'x' failed at common.c:2139
(XEN) ----[ Xen-4.0.0-rc3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: ffff8300040e2770   rbx: ffff830223ce0000   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffff82f60443c8a0
(XEN) rbp: ffff82c4802efb48   rsp: ffff82c4802efb18   r8:  ffff82f600000000
(XEN) r9:  0000000000000000   r10: ffff830223ce0000   r11: 00000000000041c5
(XEN) r12: 0000000000221e45   r13: 00000000000000ec   r14: ffff82f600000000
(XEN) r15: ffff8300cfaea000   cr0: 0000000080050033   cr4: 00000000000006f0
(XEN) cr3: 0000000210154000   cr2: ffff8801dd5508c8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802efb18:
(XEN)    3333333333333333 00000000000041c5 ffff82c4802efb32 000000000000000d
(XEN)    00000000000041c5 ffff8300cfaea000 ffff82c4802efba8 ffff82c4801e5d2e
(XEN)    0000005600ed79a0 0000000800000000 0000000100000000 0000000000221e45
(XEN)    0000000000000008 0000000000000000 ffff82f60443c8a0 ffff82f60404ab00
(XEN)    ffff82f600000000 ffff8300cfaea000 ffff82c4802efbd8 ffff82c4801c766d
(XEN)    ffff82c4802efc18 0000000000000282 0000000000000281 0000000000221e45
(XEN)    ffff82c4802efc28 ffff82c4801cb18f 000000000f69d1d0 ffff830223ce0000
(XEN)    ffff82c4802efc18 ffff830223ce0000 ffff82c4802eff28 ffff830223ce0e28
(XEN)    0000000000000002 ffff8300040de000 ffff82c4802efc58 ffff82c4801cba8d
(XEN)    0000000000000282 ffff82c4802efe58 0000000000010000 0000000000008000
(XEN)    ffff82c4802efce8 ffff82c4801bb394 0000000100000000 ffff8302236e8000
(XEN)    ffff830223ce0f08 ffff8300040e1000 00000001802efd48 ffff82c48031f640
(XEN)    ffff830223ce0000 0000000100000001 ffff82c4802eff28 ffff8300040e0000
(XEN)    ffff82c4802efce8 ffff830223ce0000 ffff82c4802efe58 00007fff0f69d1d0
(XEN)    ffff82c4802efe48 0000000000000000 ffff82c4802efd08 ffff82c4801bb56a
(XEN)    fffffffffffffff3 0000000000f71000 ffff82c4802efdc8 ffff82c48014796c
(XEN)    ffff82c4802efd28 ffff82c48016b0d4 ffff82c4802efd48 ffff82c48011dce7
(XEN)    0000000000000008 ffff82c480163d8c ffff82c4802efd68 ffff82c480118755
(XEN)    0000000000000008 ffff8300cfafa000 ffff82c4802efdc8 0000000000000286
(XEN)    ffff82c4802efd98 0000000000000286 ffff82c4802eff28 ffff82c4802eff28
(XEN) Xen call trace:
(XEN)    [<ffff82c4801c8a08>] shadow_hash_delete+0x12e/0x18c
(XEN)    [<ffff82c4801e5d2e>] sh_destroy_l4_shadow__guest_4+0xb5/0x371
(XEN)    [<ffff82c4801c766d>] sh_destroy_shadow+0x17d/0x1ad
(XEN)    [<ffff82c4801cb18f>] shadow_blow_tables+0x20b/0x302
(XEN)    [<ffff82c4801cba8d>] shadow_clean_dirty_bitmap+0xba/0x10a
(XEN)    [<ffff82c4801bb394>] paging_log_dirty_op+0x506/0x58c
(XEN)    [<ffff82c4801bb56a>] paging_domctl+0x150/0x181
(XEN)    [<ffff82c48014796c>] arch_do_domctl+0x5c/0x1f64
(XEN)    [<ffff82c4801053b3>] do_domctl+0x1169/0x11e6
(XEN)    [<ffff82c4801f11bf>] syscall_enter+0xef/0x149
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'x' failed at common.c:2139
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

Any ideas what could be wrong here?

Thanks,
Ashish

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Tim Deegan
2010-Feb-23 09:25 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
Hi,

At 08:57 +0000 on 23 Feb (1266915448), Ashish Bijlani wrote:
> I'm working on a project that requires live migration of a 64-bit PV
> VM (on a 64-bit platform). "xm save" and "xm restore" work fine.
> However, live migration fails with the following err msg:

Oh dear. I take it this is on the sending machine. What version of Xen
are you using?

Does it happen every time, or only intermittently?

Does it happen only with one particular guest, or with all 64-bit guests?

Have you made any modifications to Xen?

It looks like the shadow pagetable code has got very confused - a page
is marked as shadowed but isn't in the hash table of shadowed pages.

Cheers,

Tim.

> [...]

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
Devdutt Patnaik
2010-Feb-23 10:19 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
Tim,

Yes, this is happening on the sending machine.

We just used the xen-unstable version from 2 weeks ago, and haven't really
modified it. We tried this with 64-bit versions of the 2.6.31.6 and
2.6.32.8 DomU kernels.

Any suggestions on what might be a better bet in terms of Xen, Dom0 and
DomU kernel versions? We wish to use 64-bit PV VMs for our experiments.

We have only been able to do a successful migration 3 times, out of maybe
30-odd attempts.

Thanks,
Devdutt.

On Tue, Feb 23, 2010 at 1:25 AM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> Hi,
>
> At 08:57 +0000 on 23 Feb (1266915448), Ashish Bijlani wrote:
> > I'm working on a project that requires live migration of a 64-bit PV
> > VM (on a 64-bit platform). "xm save" and "xm restore" work fine.
> > However, live migration fails with the following err msg:
>
> Oh dear. I take it this is on the sending machine. What version of Xen
> are you using?
>
> Does it happen every time or only intermittently?
>
> Does it happen only with one particular guest or all 64bit guests?
>
> Have you made any modifications to Xen?
>
> It looks like the shadow pagetable code has got very confused - a page
> is marked as shadowed but isn't in the hash-table of shadowed pages.
>
> Cheers,
>
> Tim.
>
> [...]
Devdutt Patnaik
2010-Feb-23 10:25 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
Forgot to mention specifically that it's Xen 4.0, the rc3 version.

Thanks,
Devdutt.

On Tue, Feb 23, 2010 at 2:19 AM, Devdutt Patnaik <xendevid@gmail.com> wrote:
> Tim,
>
> Yes, this is happening on the sending machine.
>
> We just used the xen-unstable version from 2 weeks ago, and haven't really
> modified it.
> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
>
> Any suggestions on what might be a better bet in terms of xen, Dom0 and
> DomU kernel versions.
> We wish to use 64-bit PV VMs for our experiments.
>
> We have only been able to do a successful migration 3 times, out of maybe
> 30 odd attempts.
>
> Thanks,
> Devdutt.
>
> [...]
Tim Deegan
2010-Feb-23 10:46 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
> We just used the xen-unstable version from 2 weeks ago, and haven't really
> modified it.
> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.

OK. This really needs to be fixed for the 4.0 release. Keir, have we
had any other testing on 64-bit PV live migrations?

By "haven't really modified it" do you mean you have modified it, or not?

> Any suggestions on what might be a better bet in terms of xen, Dom0 and
> DomU kernel versions.
> We wish to use 64-bit PV VMs for our experiments.

Xen 3.4.x should be more stable if you need to carry on immediately.

Cheers,

Tim.

> We have only been able to do a successful migration 3 times, out of maybe
> 30 odd attempts.
>
> [...]

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
Devdutt Patnaik
2010-Feb-23 10:51 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
Tim,

It's just the stock xen-unstable (Xen 4.0-rc3), unmodified.

Has this feature been evaluated/tested on the latest xen-unstable?

Alright, we will give Xen 3.4.x a shot.

Thanks,
Devdutt.

On Tue, Feb 23, 2010 at 2:46 AM, Tim Deegan <Tim.Deegan@citrix.com> wrote:
> At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
> > We just used the xen-unstable version from 2 weeks ago, and haven't
> > really modified it.
> > We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
>
> OK. This really needs to be fixed for the 4.0 release. Keir, have we
> had any other testing on 64-bit PV live migrations?
>
> By "haven't really modified it" do you mean you have modified it or not?
>
> > Any suggestions on what might be a better bet in terms of xen, Dom0 and
> > DomU kernel versions.
> > We wish to use 64-bit PV VMs for our experiments.
>
> Xen 3.4.x should be more stable if you need to carry on immediately.
>
> Cheers,
>
> Tim.
>
> [...]
Jan Beulich
2010-Feb-23 10:54 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
>>> Devdutt Patnaik <xendevid@gmail.com> 23.02.10 11:19 >>>
> We have only been able to do a successful migration 3 times, out of maybe
> 30-odd attempts.

And is it always a very similar (or identical) register/stack dump you get?

Jan
Keir Fraser
2010-Feb-23 10:59 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
On 23/02/2010 10:46, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:

> At 10:19 +0000 on 23 Feb (1266920353), Devdutt Patnaik wrote:
>> We just used the xen-unstable version from 2 weeks ago, and haven't really
>> modified it.
>> We tried this with 64-bit versions of 2.6.31.6 and 2.6.32.8 DomU kernels.
>
> OK. This really needs to be fixed for the 4.0 release. Keir, have we
> had any other testing on 64-bit PV live migrations?

Localhost migrations were just added to the automated tests. But I think
maybe they are trivially failing due to trying to do them via the 'xl'
interface, which doesn't support it(!). Ian?

In short, there's probably been little or no testing of live migration in
the recent past, as I don't think Intel tests it either.

 -- Keir
Devdutt Patnaik
2010-Feb-23 11:05 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
We are trying remote migrations and use the "xm migrate --live" command.

-Devdutt.

On Tue, Feb 23, 2010 at 2:59 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> Localhost migrations were just added to the automated tests. But I think
> maybe they are trivially failing due to trying to do them via the 'xl'
> interface, which doesn't support it(!). Ian?
>
> In short, there's probably been little or no testing of live migration in
> the recent past, as I don't think Intel tests it either.
>
> -- Keir
Keir Fraser
2010-Feb-23 11:10 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
On 23/02/2010 10:59, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> Localhost migrations were just added to the automated tests. But I think
> maybe they are trivially failing due to trying to do them via the 'xl'
> interface, which doesn't support it(!). Ian?
>
> In short, there's probably been little or no testing of live migration in
> the recent past, as I don't think Intel tests it either.

A quick manual test indicates it's very easy to get Xen to blow up. I got
the following on my first localhost live migration attempt, which is a
different-looking crash in the shadow code. This is with 2.6.18 dom0 and
domU, by the way, so it's not pv_ops tickling the hypervisor in an
unexpected way...

(XEN) sh error: sh_page_fault__guest_4(): Recursive shadow fault: lock was taken by sh_page_fault__guest_4
(XEN) ----[ Xen-4.0.0-rc4 x86_64 debug=y Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82c4801c7984>] shadow_hash_lookup+0x11f/0x268
(XEN) RFLAGS: 0000000000010206 CONTEXT: hypervisor
(XEN) rax: 00000000c0000000 rbx: 0000000000085111 rcx: 0000000000000000
(XEN) rdx: 000000007339c000 rsi: 0000000000000000 rdi: ffff82f600000000
(XEN) rbp: ffff8300bfcdfc88 rsp: ffff8300bfcdfc18 r8: ffffffffffffffff
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000001
(XEN) r12: 0000000000000008 r13: ffff8300bfce0000 r14: 0000000000000000
(XEN) r15: 00000000c0000000 cr0: 000000008005003b cr4: 00000000000026f4
(XEN) cr3: 0000000082b46000 cr2: 00000000c0000010
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff8300bfcdfc18:
(XEN) ffff8300bc42e000 ffff8300bfce0000 ffff8300bfcdfc88 ffff82c4801e56dd
(XEN) ffff8300bfcdfc88 0000000000000000 0000000082b44067 0000000000085111
(XEN) ffff8300bfcdff28 ffff8300bfce0000 ffff8300bfcdff28 ffff8300bc42e000
(XEN) 0000000000082b45 0000000000000001 ffff8300bfcdfed8 ffff82c4801e7d7a
(XEN) ffff82c48016ae3e 0000000000000260 00000000000001f0 0000000000000d20
(XEN) 0000000000083037 ffff8300bfce0218 ffff8300bfcdff28 ffff8300bfcdff28
(XEN) 0000000000083037 ffff8300bfcdff28 ffff8300bfcdff28 0000000000083037
(XEN) 00000000000000d8 ffff82c480265ce0 ffff8300bfcdff28 00000002ae907c4c
(XEN) 00000000bc42e000 ffff81c0e0655d20 ffff8300bfce0e28 0000000000082b44
(XEN) ffff81c0caba41f0 0000000000083037 00002ae907c4c0ff 00000002bfce0000
(XEN) 0000000082b44067 ffff8300bfcdfd78 ffff82c48011e433 ffff8300bc42e000
(XEN) ffff8300bfcdfe18 00000001801ca140 ffff8300bfcdfdb8 ffff8300bfcdfde0
(XEN) ffff8300bfcdff28 ffff8300bfcdff28 ffff8300bfcdff28 ffff8300bfcdff28
(XEN) ffff8300bfcdfe18 00000001801e021e ffff8300bfcdfe18 0000000100000100
(XEN) ffffffff8020d84d ffff8300bc42e000 ffff8300bfce0000 ffff8300bc42e000
(XEN) ffff8300bc42fa38 0000000000082b67 0000000000000001 ffff82f601056ce0
(XEN) ffff8300bfcdfe68 ffff82c4801e9ae5 ffff8300bfce0000 ffff82f600000001
(XEN) ffff8300bfcdff08 ffff8300bc42e000 ffff8300bc42e000 0000000000583440
(XEN) 00002ae907c4c0ff 0000000084a81067 0000000084cfa067 0000000085111067
(XEN) 0000000083037125 000000000008550a 0000000000084a81 0000000000084cfa
(XEN) Xen call trace:
(XEN) [<ffff82c4801c7984>] shadow_hash_lookup+0x11f/0x268
(XEN) [<ffff82c4801e7d7a>] sh_page_fault__guest_4+0xf4f/0x1fee
(XEN) [<ffff82c48017735e>] do_page_fault+0x3b2/0x4f0
(XEN)
(XEN) Pagetable walk from 00000000c0000010:
(XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 00000000c0000010
(XEN) ****************************************
Ian Jackson
2010-Feb-23 17:05 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
Keir Fraser writes ("Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)"):
> Localhost migrations were just added to the automated tests. But I think
> maybe they are trivially failing due to trying to do them via the 'xl'
> interface, which doesn't support it(!). Ian?

Localhost migration does work in most combinations in our tests. It
was only recently added and there are a few teething troubles with it,
so I don't have a full slate of results.

It doesn't work at all with libxl because it's not implemented.

Keir:
> A quick manual test indicates it's very easy to get Xen to blow up. I got
> the following on my first localhost live migration attempt, which is a
> different-looking crash in the shadow code. This is with 2.6.18 dom0 and
> domU, by the way, so it's not pv_ops tickling the hypervisor in an
> unexpected way...

2.6.18 doesn't boot on my test hardware so I'm just building it, not
running it. So I haven't reproduced your test, which explains the
different results.

Ian.
Ashish Bijlani
2010-Feb-24 09:31 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
Xen also barfs while live-migrating a 32-bit PV VM (on a 64-bit platform):

(XEN) Assertion '__mfn_valid(mfn_x(smfn))' failed at multi.c:2561
(XEN) ----[ Xen-4.0.0-rc4 x86_64 debug=y Not tainted ]----
(XEN) CPU: 2
(XEN) RIP: e008:[<ffff82c4801e0639>] sh_map_and_validate_gl4e__guest_4+0x6d/0x1d4
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830213187000 rcx: 00000000000000d3
(XEN) rdx: 0000000049ba6ceb rsi: 00000000000000d3 rdi: ffffffffffffffff
(XEN) rbp: ffff83022ff2fb68 rsp: ffff83022ff2faf8 r8: 0000000000213187
(XEN) r9: 007fffffffffffff r10: ffff82c480207e90 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000213187 r14: 0000000000000008
(XEN) r15: 0000000000000008 cr0: 000000008005003b cr4: 00000000000006f4
(XEN) cr3: 000000020ee46000 cr2: 00000000c1829248
(XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff83022ff2faf8:
(XEN) ffff83022ff2fb68 ffff82c4801bbe32 ffff830004060000 ffffffffffffffff
(XEN) ffff83022ff2ff28 ffff83022ff2ff28 ffff8301f5330000 0000000000000000
(XEN) ffff83022ff2fb78 ffff82f6042630e0 0000000000000000 0000000000213187
(XEN) ffff830004060000 0000000000000008 ffff83022ff2fbb8 ffff82c4801c785b
(XEN) ffff83022ff2fbc8 ffff830213187000 3000000000000000 ffff830004060000
(XEN) 00000001dc092027 ffff83022ff2fc60 0000000000000000 ffff830213187000
(XEN) ffff83022ff2fc08 ffff82c4801c79a8 0000000000213187 00000001dbaa5027
(XEN) ffff830004060000 00000001dc092027 ffff830004060000 00000001dc092027
(XEN) 0000000000000000 00000001dbaa5027 ffff83022ff2fc98 ffff82c480163091
(XEN) ffff83022ff2fc88 ffff82c4801e1180 ffff830100000000 ffff83022ff2fc60
(XEN) 0000000000213187 00000001dbaa5027 00000001dbaa5027 ffff830213187000
(XEN) ffff83022ff2ff28 00000001dc092027 ffff83022ff2ff28 ffff830004060000
(XEN) ffff8301f5330000 00000000001dbaa5 0000000000000005 ffff83022ff2ff28
(XEN) ffff83022ff2fcc8 ffff82c480163242 ffff8301f5330000 0000000000000000
(XEN) ffff83022ff24000 0000000000000005 ffff83022ff2fdc8 ffff82c480163ba7
(XEN) ffff8301f5330018 00007ff0d8c3c148 0000000000000000 ffff82c480265db0
(XEN) ffff82c480265db8 ffff83022ff2ff28 ffff83022ff2ff28 ffff8301f5330218
(XEN) 000000200000007b ffff81800060c148 ffff830004060000 ffff8301f5330000
(XEN) ffff818000000000 00000001001d8462 0000000000000000 00000006cfd24000
(XEN) 80000001d9582021 ffff830000000001 ffff83022ff2fd78 0000000004060060
(XEN) Xen call trace:
(XEN) [<ffff82c4801e0639>] sh_map_and_validate_gl4e__guest_4+0x6d/0x1d4
(XEN) [<ffff82c4801c785b>] sh_validate_guest_entry+0x17e/0x1c6
(XEN) [<ffff82c4801c79a8>] shadow_cmpxchg_guest_entry+0x105/0x189
(XEN) [<ffff82c480163091>] mod_l4_entry+0x2fd/0x3e3
(XEN) [<ffff82c480163242>] new_guest_cr3+0xcb/0x269
(XEN) [<ffff82c480163ba7>] do_mmuext_op+0x7c7/0x14b8
(XEN) [<ffff82c4801f2248>] compat_mmuext_op+0x217/0x3a9
(XEN) [<ffff82c4801309b9>] compat_multicall+0x269/0x404
(XEN) [<ffff82c4801ff580>] compat_hypercall+0xc0/0x119
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Assertion '__mfn_valid(mfn_x(smfn))' failed at multi.c:2561
(XEN) ****************************************
(XEN)

Any ideas how to fix this problem? Is live migration not stable enough with
xen-4.0 (rc4) yet?

Thanks,
Ashish

On Tue, Feb 23, 2010 at 12:05 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Localhost migration does work in most combinations in our tests. It
> was only recently added and there are a few teething troubles with it,
> so I don't have a full slate of results.
>
> It doesn't work at all with libxl because it's not implemented.
>
> 2.6.18 doesn't boot on my test hardware so I'm just building it, not
> running it. So I haven't reproduced your test, which explains the
> different results.
>
> Ian.
Keir Fraser
2010-Feb-24 11:01 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
On 24/02/2010 09:31, "Ashish Bijlani" <ashish.bijlani@gmail.com> wrote:

> Any ideas how to fix this prob?
>
> Is live migration not stable enough with xen-4.0 (rc4) yet?

Tim Deegan's kindly offered to investigate this ahead of rc5.

 -- Keir
Xu, Jiajun
2010-Feb-26 06:12 UTC
RE: [Xen-devel] live migration fails (assert in shadow_hash_delete)
Our normal testing covers local live migration for HVM with pv_ops. These
cases pass in Xen-4.0.0 RCx testing. And I just now tried HVM live migration
between two machines with xen c/s 20964 and pv_ops; it works.

> -----Original Message-----
> From: Keir Fraser
> Sent: Tuesday, February 23, 2010 7:00 PM
> Subject: Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
>
> In short, there's probably been little or no testing of live migration in
> the recent past, as I don't think Intel tests it either.
>
> -- Keir
Pasi Kärkkäinen
2010-Feb-26 08:38 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
On Fri, Feb 26, 2010 at 02:12:22PM +0800, Xu, Jiajun wrote:

> Our normal testing covers local live migration for HVM with pv_ops. These
> cases pass in Xen-4.0.0 RCx testing. And I just now tried HVM live
> migration between two machines with xen c/s 20964 and pv_ops; it works.

I guess the problem here was PV live migration.. do you test that as well?

-- Pasi
Jan Beulich
2010-Feb-26 08:39 UTC
RE: [Xen-devel] live migration fails (assert in shadow_hash_delete)
HVM with pv-ops? It seems irrelevant whether the kernel used in an HVM guest
has pv-ops. The point is that HVM live migration appears to work fine (also
according to our internal testing); just PV seems to be broken (and
unfortunately with no consistent crash pattern).

Jan

>>> "Xu, Jiajun" <jiajun.xu@intel.com> 26.02.10 07:12 >>>
> Our normal testing covers local live migration for HVM with pv_ops. These
> cases pass in Xen-4.0.0 RCx testing. And I just now tried HVM live
> migration between two machines with xen c/s 20964 and pv_ops; it works.
Tim Deegan
2010-Feb-26 09:52 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
At 11:01 +0000 on 24 Feb (1267009267), Keir Fraser wrote:

> On 24/02/2010 09:31, "Ashish Bijlani" <ashish.bijlani@gmail.com> wrote:
>
>> Any ideas how to fix this prob?
>>
>> Is live migration not stable enough with xen-4.0 (rc4) yet?
>
> Tim Deegan's kindly offered to investigate this ahead of rc5.

For the curious:

The bug seems to have come in between 4.0.0 rc1 (20789) and 20822.
Bisecting between those is more fun because PV domain creation and
migration were broken in libxc then. Reverting 20808 (the only cset there
that touches the shadow code) doesn't fix the problem.

Selective backporting yesterday seemed to blame 20812 ("xend: NUMA: fix
division by zero on unpopulated nodes"), which seems unlikely. I'll dig
further.

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
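[Editorial note: the bisection Tim describes is a binary search over the changeset range, with a build-and-migrate run as the oracle. A toy sketch of that search in C; the "culprit" changeset is planted purely for illustration and is not a claim about which real cset is at fault.]

```c
#include <assert.h>

/* Binary search for the first "bad" changeset in (good, bad], as in a
 * manual hg bisect.  test_run() here is simulated: any cset at or after
 * the planted culprit "fails".  In real life this step is a rebuild of
 * Xen followed by a live-migration attempt. */
static int first_bad(int good, int bad, int culprit)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        int test_failed = (mid >= culprit);   /* pretend test run */
        if (test_failed)
            bad = mid;                        /* bug already present */
        else
            good = mid;                       /* still clean here */
    }
    return bad;                               /* first failing cset */
}
```

With the range from Tim's mail (20789 good, 20822 bad) this converges in about five test runs, which is why bisecting is worthwhile even when each "test" is a full rebuild.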
Jan Beulich
2010-Feb-26 10:27 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
>>> Tim Deegan <Tim.Deegan@citrix.com> 26.02.10 10:52 >>>
> The bug seems to have come in between 4.0.0 rc1 (20789) and 20822.
> Bisecting between those is more fun because PV domain creation and
> migration were broken in libxc then. Reverting 20808 (the only cset there
> that touches the shadow code) doesn't fix the problem.
>
> Selective backporting yesterday seemed to blame 20812 ("xend: NUMA: fix
> division by zero on unpopulated nodes"), which seems unlikely. I'll dig
> further.

I'd think 20792 is a good candidate - a copy-and-paste mistake would
cause the page subsequent to the one allocated to be overwritten.
Will send a patch in a minute.

Jan
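[Editorial note: the bug class Jan describes is a classic off-by-one: a pasted line that zeroes the page after the freshly allocated one. The sketch below is hypothetical (invented names, plain arrays standing in for machine pages), not the actual c/s 20792 code.]

```c
#include <string.h>

#define PAGE_SIZE 4096

/* Initialise page[i]: fill it with the owner's pattern, then run the
 * clearing step.  In the buggy variant the pasted memset targets
 * pages[i + 1], silently corrupting whoever owns the next page. */
static void init_page(unsigned char pages[][PAGE_SIZE], int i, int buggy)
{
    memset(pages[i], 0xAA, PAGE_SIZE);            /* owner's data */
    memset(pages[buggy ? i + 1 : i], 0, PAGE_SIZE); /* the clear */
}

/* Returns 1 if the neighbouring page survived, 0 if it was trashed. */
int neighbour_intact(int buggy)
{
    unsigned char pages[2][PAGE_SIZE];
    memset(pages[1], 0x55, PAGE_SIZE);            /* neighbour's data */
    init_page(pages, 0, buggy);
    return pages[1][0] == 0x55;
}
```

Corruption like this is consistent with the inconsistent crash signatures seen in the thread: the victim is whatever happens to be allocated next, so the failure moves around from run to run.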
Tim Deegan
2010-Feb-26 14:22 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
At 10:27 +0000 on 26 Feb (1267180053), Jan Beulich wrote:

> I'd think 20792 is a good candidate - a copy-and-paste mistake would
> cause the page subsequent to the one allocated to be overwritten.
> Will send a patch in a minute.

Thanks for that.

Keir, I'm still seeing (different) crashes on unstable tip even with
Jan's fix; the proximate cause is c/s 20954, which changes the paths
taken when log-dirty mode is turned off after the live migration.

Reverting c/s 20954 fixes migration for me and is probably the best
thing to get the 4.0 release schedule going again. I'll try to find
the actual bug at some later date.

Cheers,

Tim.
Keir Fraser
2010-Feb-26 14:48 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
On 26/02/2010 14:22, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:

>> I'd think 20792 is a good candidate - a copy-and-paste mistake would
>> cause the page subsequent to the one allocated to be overwritten.
>> Will send a patch in a minute.
>
> Thanks for that.
>
> Keir, I'm still seeing (different) crashes on unstable tip even with
> Jan's fix; the proximate cause is c/s 20954, which changes the paths
> taken when log-dirty mode is turned off after the live migration.
>
> Reverting c/s 20954 fixes migration for me and is probably the best
> thing to get the 4.0 release schedule going again. I'll try to find
> the actual bug at some later date.

Hm, yes, it looks like properly performing XEN_DOMCTL_SHADOW_OP_OFF on a PV
domain doesn't work. Removing the break stmts stops {shadow,hap}_domctl()
ever being called for the OFF operation -- so logdirty gets disabled but
nothing else -- and then I guess that we get teardown right for domain
destruction.

 -- Keir
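[Editorial note: the behaviour change Keir describes comes down to ordinary C switch fall-through: whether a case ends in `break` decides if control continues into the next handler. A toy model follows, with invented op names and a flag standing in for the real paging_domctl dispatch; it is not the actual Xen code.]

```c
enum { OP_OFF = 0, OP_CLEAN = 1 };

/* Bit 1: "log-dirty tracking disabled"; bit 2: "mode-specific teardown
 * ran".  With the break in place, the OFF case does the first step only;
 * without it, control falls through and the teardown handler also runs. */
static int dispatch(int op, int with_break)
{
    int done = 0;
    switch (op) {
    case OP_OFF:
        done |= 1;            /* disable log-dirty tracking */
        if (with_break)
            break;            /* break present: stop here */
        /* fall through */
    case OP_CLEAN:
        done |= 2;            /* mode-specific teardown */
        break;
    }
    return done;
}
```

This is why an innocuous-looking "add the missing breaks" cleanup can change behaviour: if the fall-through was load-bearing, the later handler silently stops running for that op.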
Jan Beulich
2010-Feb-26 14:49 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
>>> Tim Deegan <Tim.Deegan@citrix.com> 26.02.10 15:22 >>>
> Keir, I'm still seeing (different) crashes on unstable tip even with
> Jan's fix; the proximate cause is c/s 20954, which changes the paths
> taken when log-dirty mode is turned off after the live migration.
>
> Reverting c/s 20954 fixes migration for me and is probably the best
> thing to get the 4.0 release schedule going again. I'll try to find
> the actual bug at some later date.

So perhaps the fall-through there was really intended? I had pointed
out that these missing break statements looked suspicious, so maybe
it's simply that those two places should be annotated accordingly?

Jan
Keir Fraser
2010-Feb-26 15:29 UTC
Re: [Xen-devel] live migration fails (assert in shadow_hash_delete)
On 26/02/2010 14:49, "Jan Beulich" <JBeulich@novell.com> wrote:

> So perhaps the fall-through there was really intended? I had pointed
> out that these missing break statements looked suspicious, so maybe
> it's simply that those two places should be annotated accordingly?

Mmmm.. No. :-)

 -- Keir