Ian Jackson

2010-Nov-29 11:59 UTC

[Xen-devel] dom0 pvops crash apparently due to guest migration

One of my test boxes encountered the crash whose oops you see below.
It doesn't do it every time, or even every time on this machine (since
the credit2 test in the same run worked).  The crash seems to have
occurred just at the end of the migration of a PV guest.

The setup is 32-bit dom0 and domU on 64-bit Xen.
The pvops kernel version was 56eabf9f2a6632d3b2ef.

The complete logs are here:
 
http://www.chiark.greenend.org.uk/~xensrcts/logs/2847/test-amd64-i386-xl-multivcpu/
(The machine has since been reused so those logs are what there is.)

Ian.

------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:210!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/net/lo/operstate
Modules linked in: e1000e [last unloaded: scsi_wait_scan]

Pid: 22, comm: xenwatch Not tainted (2.6.32.26 #1)         
EIP: 0061:[<c104c058>] EFLAGS: 00010082 CPU: 0
EIP is at vmalloc_sync_one+0x118/0x128
EAX: 003f8360 EBX: 1fc1b067 ECX: ffffffe0 EDX: ab273fff
ESI: 00000000 EDI: c182adf0 EBP: dfcdbe88 ESP: dfcdbe64
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process xenwatch (pid: 22, ti=dfcda000 task=dfccc510 task.ti=dfcda000)
Stack:
 dbd7b384 00cdbe88 00000000 c568f200 dbd7b384 ab273fff f7c00000 c568f200
<0> dbd7b384 dfcdbea8 c104ca9a c182adf0 c1780204 dbd75f40 dfd45a20 dbd75f40
<0> dfcdbf5c dfcdbeb4 c10df14a dfcdbf1c dfcdbef8 c12313b1 0000001b 00000008
Call Trace:
 [<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
 [<c10df14a>] ? alloc_vm_area+0x44/0x4b
 [<c12313b1>] ? blkif_map+0x2d/0x204
 [<c1230cbb>] ? frontend_changed+0x194/0x209
 [<c1229b39>] ? xenbus_otherend_changed+0x5c/0x61
 [<c1229c97>] ? frontend_changed+0xa/0xd
 [<c1228783>] ? xenwatch_thread+0xf6/0x11e
 [<c10795df>] ? autoremove_wake_function+0x0/0x33
 [<c122868d>] ? xenwatch_thread+0x0/0x11e
 [<c1079397>] ? kthread+0x61/0x66
 [<c1079336>] ? kthread+0x0/0x66
 [<c1030dd7>] ? kernel_thread_helper+0x7/0x10
Code: eb fe 89 d8 89 f2 ff 15 08 7d 68 c1 89 d6 8b 55 f0 89 c3 89 c8 0f ac d0 0c 89 c1 89 d8 0f ac f0 0c c1 e1 05 c1 e0 05 39 c1 74 06 <0f> 0b eb fe 31 ff 83 c4 18 89 f8 5b 5e 5f 5d c3 55 89 e5 56 53
EIP: [<c104c058>] vmalloc_sync_one+0x118/0x128 SS:ESP 0069:dfcdbe64
---[ end trace 7b608ed9c5e5ed4e ]---
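
[Editorial note] When comparing several oopses like the ones in this thread, it can help to pull the call trace into a structured form. The following is a minimal sketch, not a tool used by anyone in the thread; the function name `parse_call_trace` and the dictionary keys are made up for illustration, and it assumes the 2.6.32-style `[<addr>] ? symbol+0xoff/0xsize` frame format shown above.

```python
import re

# Matches trace entries such as:
#   [<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
# The "?" marks a frame the kernel's unwinder considers unreliable.
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_call_trace(oops_text):
    """Return the frames listed after the 'Call Trace:' line (hypothetical helper)."""
    frames = []
    in_trace = False
    for raw in oops_text.splitlines():
        line = raw.lstrip("> ").rstrip()  # tolerate email quote prefixes
        if line.startswith("Call Trace:"):
            in_trace = True
            continue
        if not in_trace:
            continue
        m = TRACE_RE.search(line)
        if m is None:
            break  # first non-frame line (e.g. "Code: ...") ends the trace
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "unreliable": m.group("unreliable") is not None,
        })
    return frames


SAMPLE = """\
Call Trace:
 [<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
 [<c10df14a>] ? alloc_vm_area+0x44/0x4b
 [<c12313b1>] ? blkif_map+0x2d/0x204
Code: eb fe 89 d8
"""

frames = parse_call_trace(SAMPLE)
for f in frames:
    print(f"{f['addr']:08x} {f['symbol']}+{f['offset']:#x}/{f['size']:#x}")
```

Because parsing stops at the first line that is not a frame, a trace interrupted by an inline reply will be cut short at that point.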

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jeremy Fitzhardinge

2010-Nov-29 18:53 UTC

[Xen-devel] Re: dom0 pvops crash apparently due to guest migration

On 11/29/2010 03:59 AM, Ian Jackson wrote:
> One of my test boxes encountered the crash whose oops you see below.
> It doesn't do it every time, or even every time on this machine (since
> the credit2 test in the same run worked).  The crash seems to have
> occurred just at the end of the migration of a PV guest.

Do you have a feel for what the likelihood of failure is?  Has this
started happening recently?

> The setup is 32-bit dom0 and domU on 64-bit Xen.
> The pvops kernel version was 56eabf9f2a6632d3b2ef.
>
> The complete logs are here:
>  
> http://www.chiark.greenend.org.uk/~xensrcts/logs/2847/test-amd64-i386-xl-multivcpu/
> (The machine has since been reused so those logs are what there is.)
>
> Ian.
>
> ------------[ cut here ]------------
> kernel BUG at arch/x86/mm/fault.c:210!
> invalid opcode: 0000 [#1] SMP 
> last sysfs file: /sys/devices/virtual/net/lo/operstate
> Modules linked in: e1000e [last unloaded: scsi_wait_scan]
>
> Pid: 22, comm: xenwatch Not tainted (2.6.32.26 #1)         
> EIP: 0061:[<c104c058>] EFLAGS: 00010082 CPU: 0
> EIP is at vmalloc_sync_one+0x118/0x128
> EAX: 003f8360 EBX: 1fc1b067 ECX: ffffffe0 EDX: ab273fff
> ESI: 00000000 EDI: c182adf0 EBP: dfcdbe88 ESP: dfcdbe64
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process xenwatch (pid: 22, ti=dfcda000 task=dfccc510 task.ti=dfcda000)
> Stack:
>  dbd7b384 00cdbe88 00000000 c568f200 dbd7b384 ab273fff f7c00000 c568f200
> <0> dbd7b384 dfcdbea8 c104ca9a c182adf0 c1780204 dbd75f40 dfd45a20 dbd75f40
> <0> dfcdbf5c dfcdbeb4 c10df14a dfcdbf1c dfcdbef8 c12313b1 0000001b 00000008
> Call Trace:
>  [<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
>  [<c10df14a>] ? alloc_vm_area+0x44/0x4b

Hm, I'm still not really sure why alloc_vm_area() does a
vmalloc_sync_all in the first place...  But that BUG shouldn't happen
regardless.

    J

>  [<c12313b1>] ? blkif_map+0x2d/0x204
>  [<c1230cbb>] ? frontend_changed+0x194/0x209
>  [<c1229b39>] ? xenbus_otherend_changed+0x5c/0x61
>  [<c1229c97>] ? frontend_changed+0xa/0xd
>  [<c1228783>] ? xenwatch_thread+0xf6/0x11e
>  [<c10795df>] ? autoremove_wake_function+0x0/0x33
>  [<c122868d>] ? xenwatch_thread+0x0/0x11e
>  [<c1079397>] ? kthread+0x61/0x66
>  [<c1079336>] ? kthread+0x0/0x66
>  [<c1030dd7>] ? kernel_thread_helper+0x7/0x10
> Code: eb fe 89 d8 89 f2 ff 15 08 7d 68 c1 89 d6 8b 55 f0 89 c3 89 c8 0f ac d0 0c 89 c1 89 d8 0f ac f0 0c c1 e1 05 c1 e0 05 39 c1 74 06 <0f> 0b eb fe 31 ff 83 c4 18 89 f8 5b 5e 5f 5d c3 55 89 e5 56 53
> EIP: [<c104c058>] vmalloc_sync_one+0x118/0x128 SS:ESP 0069:dfcdbe64
> ---[ end trace 7b608ed9c5e5ed4e ]---
>

Ian Jackson

2010-Nov-30 11:45 UTC

[Xen-devel] Re: dom0 pvops crash apparently due to guest migration

Jeremy Fitzhardinge writes ("Re: dom0 pvops crash apparently due to guest migration"):
> On 11/29/2010 03:59 AM, Ian Jackson wrote:
> > One of my test boxes encountered the crash whose oops you see below.
> > It doesn't do it every time, or even every time on this machine (since
> > the credit2 test in the same run worked).  The crash seems to have
> > occurred just at the end of the migration of a PV guest.
> 
> Do you have a feel for what the likelihood of failure is?  Has this
> started happening recently?

The probability of failure seems reasonably high.  This is a different
test machine so it is possible that there is something wrong with the
hardware, but all of the tests with the XCP kernel work fine.

> Hm, I'm still not really sure why alloc_vm_area() does a
> vmalloc_sync_all in the first place...  But that BUG shouldn't happen
> regardless.

It's not just blkback; here's one that shows a call trace with netback
instead:

 ------------[ cut here ]------------
 kernel BUG at arch/x86/mm/fault.c:210!
 invalid opcode: 0000 [#1] SMP 
 last sysfs file: /sys/devices/virtual/net/xenbr0/bridge/topology_change_detected
 Modules linked in: e1000e [last unloaded: scsi_wait_scan]
 
 Pid: 22, comm: xenwatch Not tainted (2.6.32.26 #1)         
 EIP: 0061:[<c104c058>] EFLAGS: 00010086 CPU: 0
 EIP is at vmalloc_sync_one+0x118/0x128
 EAX: 00088480 EBX: 04424067 ECX: ffffffe0 EDX: 7c83ffff
 ESI: 00000000 EDI: c182ae00 EBP: dfcdbeb0 ESP: dfcdbe8c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
 Process xenwatch (pid: 22, ti=dfcda000 task=dfccc510 task.ti=dfcda000)
 Stack:
  dff99c44 00cdbeb0 00000000 c568f800 dff99c44 7c83ffff f8000000 c568f800
 <0> dff99c44 dfcdbed0 c104ca9a c182ae00 c1780204 c445d600 cd034dc0 c445d600
 <0> c445d620 dfcdbedc c10df14a fffffff4 dfcdbf34 c1236b30 00000301 00000300
 Call Trace:
  [<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
  [<c10df14a>] ? alloc_vm_area+0x44/0x4b
  [<c1236b30>] ? netif_map+0x2d/0x2e3
  [<c10e95c8>] ? kfree+0x111/0x119
  [<c1229291>] ? xenbus_scanf+0x38/0x4b
  [<c1229291>] ? xenbus_scanf+0x38/0x4b
  [<c12361fa>] ? frontend_changed+0x2c3/0x526
  [<c1229b39>] ? xenbus_otherend_changed+0x5c/0x61
  [<c1229c97>] ? frontend_changed+0xa/0xd
  [<c1228783>] ? xenwatch_thread+0xf6/0x11e
  [<c10795df>] ? autoremove_wake_function+0x0/0x33
  [<c122868d>] ? xenwatch_thread+0x0/0x11e
  [<c1079397>] ? kthread+0x61/0x66
  [<c1079336>] ? kthread+0x0/0x66
  [<c1030dd7>] ? kernel_thread_helper+0x7/0x10
 Code: eb fe 89 d8 89 f2 ff 15 08 7d 68 c1 89 d6 8b 55 f0 89 c3 89 c8 0f ac d0 0c 89 c1 89 d8 0f ac f0 0c c1 e1 05 c1 e0 05 39 c1 74 06 <0f> 0b eb fe 31 ff 83 c4 18 89 f8 5b 5e 5f 5d c3 55 89 e5 56 53
 EIP: [<c104c058>] vmalloc_sync_one+0x118/0x128 SS:ESP 0069:dfcdbe8c
 ---[ end trace 008e317122f8c510 ]---

Ian.

Keir Fraser

2010-Nov-30 12:17 UTC

Re: [Xen-devel] Re: dom0 pvops crash apparently due to guest migration

On 29/11/2010 18:53, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

> Hm, I'm still not really sure why alloc_vm_area() does a
> vmalloc_sync_all in the first place...  But that BUG shouldn't happen
> regardless.

I think vmalloc_sync_all() is required only if alloc_vm_area()'d regions are
used as hypercall buffers. I'm not sure if they ever are, these days. The
sync wouldn't be needed for allocated regions used as shared rings, for
example. You might be able to do a quick audit of users and then remove the
vmalloc_sync_all(). Presumably if v_s_a is planned to go away it'd be nice
to get rid of this usage for that reason also.

 -- Keir
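
[Editorial note] For readers unfamiliar with the sync being discussed: on 32-bit x86, kernel vmalloc mappings are set up in a reference page table and then copied into each process's page directory, either on demand or eagerly by vmalloc_sync_all(). The toy model below sketches that copy-or-check step in user space. It is an assumption-laden illustration based on a reading of the 2.6.32-era vmalloc_sync_one(), not kernel code; `sync_one`, `sync_all`, `SyncError` and the dictionary representation are all hypothetical. It shows only what kind of mismatch the check rejects, and the thread does not establish how such a mismatch arose here.

```python
class SyncError(AssertionError):
    """Stands in for the kernel BUG() in this toy model."""


def sync_one(process_dir, reference_dir, index):
    """Copy one top-level entry from the reference table, or verify it matches."""
    ref = reference_dir.get(index)
    if ref is None:
        return None  # nothing mapped in the reference table for this slot
    cur = process_dir.get(index)
    if cur is None:
        process_dir[index] = ref  # entry missing: copy it in
    elif cur != ref:
        # Entry present but pointing at a different page table:
        # the model's equivalent of the BUG reported in the oops.
        raise SyncError(f"slot {index:#x}: {cur!r} != {ref!r}")
    return ref


def sync_all(process_dirs, reference_dir, indices):
    """Eagerly sync every listed slot into every process directory."""
    for process_dir in process_dirs:
        for index in indices:
            sync_one(process_dir, reference_dir, index)


# Reference table has one vmalloc-range slot populated.
reference = {0x3E0: "pt_A"}
procs = [{}, {0x3E0: "pt_A"}]
sync_all(procs, reference, [0x3E0])
print(procs)  # both directories now hold the same entry

# A directory holding a different entry for the same slot trips the check.
stale = {0x3E0: "pt_OLD"}
try:
    sync_all([stale], reference, [0x3E0])
    tripped = False
except SyncError as exc:
    tripped = True
    print("check failed:", exc)
```

In this model an eager sync only matters to a consumer that cannot take a fault on the missing entry, which is the hypercall-buffer case raised above; a shared ring touched only by the kernel would get the entry filled in on demand.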




Jeremy Fitzhardinge

2010-Nov-30 20:37 UTC

Re: [Xen-devel] Re: dom0 pvops crash apparently due to guest migration

On 11/30/2010 04:17 AM, Keir Fraser wrote:
> On 29/11/2010 18:53, "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>
>> Hm, I'm still not really sure why alloc_vm_area() does a
>> vmalloc_sync_all in the first place...  But that BUG shouldn't happen
>> regardless.
> I think vmalloc_sync_all() is required only if alloc_vm_area()'d regions are
> used as hypercall buffers. I'm not sure if they ever are, these days.

Doesn't look like it.  And it would need the buffer to be filled out by
one task and then the hypercall issued by another one, which seems unlikely.

And even if we did issue hypercalls from a vmalloc area, it shouldn't be
alloc_vm_area()'s job to make sure that works, since it's a generic core
function now.

    J

