Ian Jackson
2010-Nov-29 11:59 UTC
[Xen-devel] dom0 pvops crash apparently due to guest migration
One of my test boxes encountered the crash whose oops you see below.
It doesn't do it every time, or even every time on this machine (since
the credit2 test in the same run worked). The crash seems to have
occurred just at the end of the migration of a PV guest.

The setup is 32-bit dom0 and domU on 64-bit Xen.
The pvops kernel version was 56eabf9f2a6632d3b2ef.

The complete logs are here:
http://www.chiark.greenend.org.uk/~xensrcts/logs/2847/test-amd64-i386-xl-multivcpu/
(The machine has since been reused so those logs are what there is.)

Ian.

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:210!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/lo/operstate
Modules linked in: e1000e [last unloaded: scsi_wait_scan]

Pid: 22, comm: xenwatch Not tainted (2.6.32.26 #1)
EIP: 0061:[<c104c058>] EFLAGS: 00010082 CPU: 0
EIP is at vmalloc_sync_one+0x118/0x128
EAX: 003f8360 EBX: 1fc1b067 ECX: ffffffe0 EDX: ab273fff
ESI: 00000000 EDI: c182adf0 EBP: dfcdbe88 ESP: dfcdbe64
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process xenwatch (pid: 22, ti=dfcda000 task=dfccc510 task.ti=dfcda000)
Stack:
dbd7b384 00cdbe88 00000000 c568f200 dbd7b384 ab273fff f7c00000 c568f200
<0> dbd7b384 dfcdbea8 c104ca9a c182adf0 c1780204 dbd75f40 dfd45a20 dbd75f40
<0> dfcdbf5c dfcdbeb4 c10df14a dfcdbf1c dfcdbef8 c12313b1 0000001b 00000008
Call Trace:
[<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
[<c10df14a>] ? alloc_vm_area+0x44/0x4b
[<c12313b1>] ? blkif_map+0x2d/0x204
[<c1230cbb>] ? frontend_changed+0x194/0x209
[<c1229b39>] ? xenbus_otherend_changed+0x5c/0x61
[<c1229c97>] ? frontend_changed+0xa/0xd
[<c1228783>] ? xenwatch_thread+0xf6/0x11e
[<c10795df>] ? autoremove_wake_function+0x0/0x33
[<c122868d>] ? xenwatch_thread+0x0/0x11e
[<c1079397>] ? kthread+0x61/0x66
[<c1079336>] ? kthread+0x0/0x66
[<c1030dd7>] ? kernel_thread_helper+0x7/0x10
Code: eb fe 89 d8 89 f2 ff 15 08 7d 68 c1 89 d6 8b 55 f0 89 c3 89 c8 0f ac d0 0c 89 c1 89 d8 0f ac f0 0c c1 e1 05 c1 e0 05 39 c1 74 06 <0f> 0b eb fe 31 ff 83 c4 18 89 f8 5b 5e 5f 5d c3 55 89 e5 56 53
EIP: [<c104c058>] vmalloc_sync_one+0x118/0x128 SS:ESP 0069:dfcdbe64
---[ end trace 7b608ed9c5e5ed4e ]---

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
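A note on reading the oops above. The faulting instruction is `0f 0b` (`ud2`, the byte marked `<0f>` in the `Code:` line), which is how `BUG()` traps and why the oops says `invalid opcode`. Just before it, two `shrd $0xc` instructions each extract a frame number from a 64-bit (PAE) entry held in a register pair, two `shl $5` instructions scale each by 32 (plausibly `sizeof(struct page)` on this 32-bit build), and `cmp`/`je` skip the `ud2` when the results match. As an assumption not confirmed in the thread, that fits the 2.6.32 `vmalloc_sync_one()` check `BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k))`, which fires when a task's page-directory entry for a vmalloc address is already present but names a different page-table page than the `init_mm` reference entry. The sketch below restates that logic as a user-space toy model; all `model_` names are hypothetical and the entry layout is simplified.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, not kernel code: one 64-bit PAE-style
 * page-directory entry per address space. Bit 0 means "present";
 * bits 12..51 hold the frame number that pmd_page() would turn into a
 * struct page pointer.
 */
typedef uint64_t model_pmd_t;

#define MODEL_PRESENT  0x1ULL
#define MODEL_PFN_MASK 0x000FFFFFFFFFF000ULL

enum model_sync_result {
    MODEL_REF_ABSENT, /* reference (init_mm) entry absent: nothing to copy */
    MODEL_COPIED,     /* per-task entry was empty: copied from the reference */
    MODEL_CONSISTENT, /* both present and naming the same frame */
    MODEL_MISMATCH    /* both present, different frames: the kernel BUG()s here */
};

/* Frame number of an entry, ignoring the flag bits. */
uint64_t model_pfn(model_pmd_t entry)
{
    return (entry & MODEL_PFN_MASK) >> 12;
}

/* Sync one per-task entry against the init_mm reference entry. */
enum model_sync_result model_sync_one(model_pmd_t *pmd, model_pmd_t pmd_k)
{
    if (!(pmd_k & MODEL_PRESENT))
        return MODEL_REF_ABSENT;
    if (!(*pmd & MODEL_PRESENT)) {
        *pmd = pmd_k; /* fill in the missing kernel mapping */
        return MODEL_COPIED;
    }
    /* Already present: it must point at the same page-table page. */
    return model_pfn(*pmd) == model_pfn(pmd_k) ? MODEL_CONSISTENT
                                               : MODEL_MISMATCH;
}
```

In this model the reported crash corresponds to the `MODEL_MISMATCH` outcome: both entries are present, so nothing is copied, yet they disagree about which frame they reference.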
Jeremy Fitzhardinge
2010-Nov-29 18:53 UTC
[Xen-devel] Re: dom0 pvops crash apparently due to guest migration
On 11/29/2010 03:59 AM, Ian Jackson wrote:
> One of my test boxes encountered the crash whose oops you see below.
> It doesn't do it every time, or even every time on this machine (since
> the credit2 test in the same run worked). The crash seems to have
> occurred just at the end of the migration of a PV guest.

Do you have a feel for what the likelihood of failure is? Has this
started happening recently?

> The setup is 32-bit dom0 and domU on 64-bit Xen.
> The pvops kernel version was 56eabf9f2a6632d3b2ef.
>
> The complete logs are here:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/2847/test-amd64-i386-xl-multivcpu/
> (The machine has since been reused so those logs are what there is.)
>
> Ian.
>
> ------------[ cut here ]------------
> kernel BUG at arch/x86/mm/fault.c:210!
> invalid opcode: 0000 [#1] SMP
> last sysfs file: /sys/devices/virtual/net/lo/operstate
> Modules linked in: e1000e [last unloaded: scsi_wait_scan]
>
> Pid: 22, comm: xenwatch Not tainted (2.6.32.26 #1)
> EIP: 0061:[<c104c058>] EFLAGS: 00010082 CPU: 0
> EIP is at vmalloc_sync_one+0x118/0x128
> EAX: 003f8360 EBX: 1fc1b067 ECX: ffffffe0 EDX: ab273fff
> ESI: 00000000 EDI: c182adf0 EBP: dfcdbe88 ESP: dfcdbe64
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process xenwatch (pid: 22, ti=dfcda000 task=dfccc510 task.ti=dfcda000)
> Stack:
> dbd7b384 00cdbe88 00000000 c568f200 dbd7b384 ab273fff f7c00000 c568f200
> <0> dbd7b384 dfcdbea8 c104ca9a c182adf0 c1780204 dbd75f40 dfd45a20 dbd75f40
> <0> dfcdbf5c dfcdbeb4 c10df14a dfcdbf1c dfcdbef8 c12313b1 0000001b 00000008
> Call Trace:
> [<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
> [<c10df14a>] ? alloc_vm_area+0x44/0x4b

Hm, I'm still not really sure why alloc_vm_area() does a
vmalloc_sync_all in the first place... But that BUG shouldn't happen
regardless.

J

> [<c12313b1>] ? blkif_map+0x2d/0x204
> [<c1230cbb>] ? frontend_changed+0x194/0x209
> [<c1229b39>] ? xenbus_otherend_changed+0x5c/0x61
> [<c1229c97>] ? frontend_changed+0xa/0xd
> [<c1228783>] ? xenwatch_thread+0xf6/0x11e
> [<c10795df>] ? autoremove_wake_function+0x0/0x33
> [<c122868d>] ? xenwatch_thread+0x0/0x11e
> [<c1079397>] ? kthread+0x61/0x66
> [<c1079336>] ? kthread+0x0/0x66
> [<c1030dd7>] ? kernel_thread_helper+0x7/0x10
> Code: eb fe 89 d8 89 f2 ff 15 08 7d 68 c1 89 d6 8b 55 f0 89 c3 89 c8 0f ac d0 0c 89 c1 89 d8 0f ac f0 0c c1 e1 05 c1 e0 05 39 c1 74 06 <0f> 0b eb fe 31 ff 83 c4 18 89 f8 5b 5e 5f 5d c3 55 89 e5 56 53
> EIP: [<c104c058>] vmalloc_sync_one+0x118/0x128 SS:ESP 0069:dfcdbe64
> ---[ end trace 7b608ed9c5e5ed4e ]---
Ian Jackson
2010-Nov-30 11:45 UTC
[Xen-devel] Re: dom0 pvops crash apparently due to guest migration
Jeremy Fitzhardinge writes ("Re: dom0 pvops crash apparently due to guest migration"):
> On 11/29/2010 03:59 AM, Ian Jackson wrote:
> > One of my test boxes encountered the crash whose oops you see below.
> > It doesn't do it every time, or even every time on this machine (since
> > the credit2 test in the same run worked). The crash seems to have
> > occurred just at the end of the migration of a PV guest.
>
> Do you have a feel for what the likelihood of failure is? Has this
> started happening recently?

The probability of failure seems reasonably high. This is a different
test machine, so it is possible that there is something wrong with the
hardware, but all of the tests with the XCP kernel work fine.

> Hm, I'm still not really sure why alloc_vm_area() does a
> vmalloc_sync_all in the first place... But that BUG shouldn't happen
> regardless.

It's not just blkback; here's one that shows a call trace with netback
instead:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:210!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/xenbr0/bridge/topology_change_detected
Modules linked in: e1000e [last unloaded: scsi_wait_scan]

Pid: 22, comm: xenwatch Not tainted (2.6.32.26 #1)
EIP: 0061:[<c104c058>] EFLAGS: 00010086 CPU: 0
EIP is at vmalloc_sync_one+0x118/0x128
EAX: 00088480 EBX: 04424067 ECX: ffffffe0 EDX: 7c83ffff
ESI: 00000000 EDI: c182ae00 EBP: dfcdbeb0 ESP: dfcdbe8c
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process xenwatch (pid: 22, ti=dfcda000 task=dfccc510 task.ti=dfcda000)
Stack:
dff99c44 00cdbeb0 00000000 c568f800 dff99c44 7c83ffff f8000000 c568f800
<0> dff99c44 dfcdbed0 c104ca9a c182ae00 c1780204 c445d600 cd034dc0 c445d600
<0> c445d620 dfcdbedc c10df14a fffffff4 dfcdbf34 c1236b30 00000301 00000300
Call Trace:
[<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
[<c10df14a>] ? alloc_vm_area+0x44/0x4b
[<c1236b30>] ? netif_map+0x2d/0x2e3
[<c10e95c8>] ? kfree+0x111/0x119
[<c1229291>] ? xenbus_scanf+0x38/0x4b
[<c1229291>] ? xenbus_scanf+0x38/0x4b
[<c12361fa>] ? frontend_changed+0x2c3/0x526
[<c1229b39>] ? xenbus_otherend_changed+0x5c/0x61
[<c1229c97>] ? frontend_changed+0xa/0xd
[<c1228783>] ? xenwatch_thread+0xf6/0x11e
[<c10795df>] ? autoremove_wake_function+0x0/0x33
[<c122868d>] ? xenwatch_thread+0x0/0x11e
[<c1079397>] ? kthread+0x61/0x66
[<c1079336>] ? kthread+0x0/0x66
[<c1030dd7>] ? kernel_thread_helper+0x7/0x10
Code: eb fe 89 d8 89 f2 ff 15 08 7d 68 c1 89 d6 8b 55 f0 89 c3 89 c8 0f ac d0 0c 89 c1 89 d8 0f ac f0 0c c1 e1 05 c1 e0 05 39 c1 74 06 <0f> 0b eb fe 31 ff 83 c4 18 89 f8 5b 5e 5f 5d c3 55 89 e5 56 53
EIP: [<c104c058>] vmalloc_sync_one+0x118/0x128 SS:ESP 0069:dfcdbe8c
---[ end trace 008e317122f8c510 ]---

Ian.
Keir Fraser
2010-Nov-30 12:17 UTC
Re: [Xen-devel] Re: dom0 pvops crash apparently due to guest migration
On 29/11/2010 18:53, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

> Hm, I'm still not really sure why alloc_vm_area() does a
> vmalloc_sync_all in the first place... But that BUG shouldn't happen
> regardless.

I think vmalloc_sync_all() is required only if alloc_vm_area()'d regions are
used as hypercall buffers. I'm not sure if they ever are, these days. The
sync wouldn't be needed for allocated regions used as shared rings, for
example.

You might be able to do a quick audit of users and then remove the
vmalloc_sync_all(). Presumably, if v_s_a (vmalloc_sync_all()) is planned to
go away, it'd be nice to get rid of this usage for that reason also.

 -- Keir
Jeremy Fitzhardinge
2010-Nov-30 20:37 UTC
Re: [Xen-devel] Re: dom0 pvops crash apparently due to guest migration
On 11/30/2010 04:17 AM, Keir Fraser wrote:
> On 29/11/2010 18:53, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>
>> Hm, I'm still not really sure why alloc_vm_area() does a
>> vmalloc_sync_all in the first place... But that BUG shouldn't happen
>> regardless.
> I think vmalloc_sync_all() is required only if alloc_vm_area()'d regions are
> used as hypercall buffers. I'm not sure if they ever are, these days.

Doesn't look like it. And it would need the buffer to be filled out by
one task and then the hypercall issued by another one, which seems
unlikely.

And even if we did issue hypercalls from a vmalloc area, it shouldn't be
alloc_vm_area()'s job to make sure that works, since it's a generic core
function now.

J
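Keir's and Jeremy's reasoning about when the sync matters can be illustrated with a toy model. It assumes, as a reading of the thread rather than anything stated in it, that on 32-bit x86 a task's copy of a kernel vmalloc mapping is normally filled in lazily by the page-fault handler, whereas the hypervisor, when it dereferences a hypercall buffer through the calling task's page tables, never runs that handler. Everything named `model_` is hypothetical; this is a user-space sketch, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_TASKS 2

/*
 * Hypothetical model: whether one vmalloc area is mapped in the
 * reference tables (init_mm) and in each task's own tables.
 */
struct model_mm {
    bool ref_present;
    bool task_present[MODEL_TASKS];
};

/* Create the mapping; with sync_all, push it into every task at once
 * (the role vmalloc_sync_all() plays in alloc_vm_area()). */
void model_alloc_vm_area(struct model_mm *m, bool sync_all)
{
    m->ref_present = true;
    if (sync_all)
        for (int t = 0; t < MODEL_TASKS; t++)
            m->task_present[t] = true;
}

/* Ordinary kernel access from a task: a missing entry faults and the
 * fault handler copies it from the reference tables. */
int model_kernel_access(struct model_mm *m, int task)
{
    if (!m->task_present[task]) {
        if (!m->ref_present)
            return -EFAULT;           /* genuinely unmapped */
        m->task_present[task] = true; /* lazy sync on fault */
    }
    return 0;
}

/* Hypervisor access during a hypercall from a task: no guest fault
 * handler runs, so a missing entry is simply an error. */
int model_hypercall_access(const struct model_mm *m, int task)
{
    return m->task_present[task] ? 0 : -EFAULT;
}
```

In the model, lazy syncing is enough as long as the task that touched the buffer is the one issuing the hypercall; only the cross-task case Jeremy describes, where one task fills the buffer and another makes the hypercall, needs the eager `sync_all` path.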