I am having problems with save/restore under 3.3.1 in the GPLPV drivers.
I call hvm_shutdown(xpdd, SHUTDOWN_suspend), but as soon as I lower IRQL
(enabling interrupts), qemu goes to 100% CPU and the DomU load goes
right up too.

Xentrace is showing a whole lot of this going on:

CPU0  200130258143212 (+     770)  hypercall  [ rip = 0x000000008020632a, eax = 0xffffffff ]
CPU0  200130258151107 (+    7895)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258156293 (+    5186)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258161233 (+    4940)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258165467 (+    4234)  hypercall  [ rip = 0x000000008020640a, eax = 0xffffffff ]
CPU0  200130258167202 (+    1735)  domain_wake  [ domid = 0x00000062, edomid = 0x00000000 ]
CPU0  200130258168511 (+    1309)  switch_infprev  [ old_domid = 0x00000000, runtime = 31143 ]
CPU0  200130258168716 (+     205)  switch_infnext  [ new_domid = 0x00000062, time = 786, r_time = 30000000 ]
CPU0  200130258169338 (+     622)  __enter_scheduler  [ prev<domid:edomid> = 0x00000000 : 0x00000000, next<domid:edomid> = 0x00000062 : 0x00000000 ]
CPU0  200130258175532 (+    6194)  VMENTRY  [ dom:vcpu = 0x00000062 ]
CPU0  200130258179633 (+    4101)  VMEXIT   [ dom:vcpu = 0x00000062, exitcode = 0x0000004e, rIP = 0x0000000080a562b9 ]
CPU0                0 (+       0)  MMIO_AST_WR  [ address = 0xfee000b0, data = 0x00000000 ]
CPU0                0 (+       0)  PF_XEN   [ dom:vcpu = 0x00000062, errorcode = 0x0b, virt = 0xfffe00b0 ]
CPU0                0 (+       0)  INJ_VIRQ [ dom:vcpu = 0x00000062, vector = 0x00, fake = 1 ]
CPU0  200130258185932 (+    6299)  VMENTRY  [ dom:vcpu = 0x00000062 ]
CPU0  200130258189737 (+    3805)  VMEXIT   [ dom:vcpu = 0x00000062, exitcode = 0x00000064, rIP = 0x0000000080a560ad ]
CPU0                0 (+       0)  INJ_VIRQ [ dom:vcpu = 0x00000062, vector = 0x83, fake = 0 ]
CPU0  200130258190990 (+    1253)  VMENTRY  [ dom:vcpu = 0x00000062 ]
CPU0  200130258194791 (+    3801)  VMEXIT   [ dom:vcpu = 0x00000062, exitcode = 0x0000007b, rIP = 0x0000000080a5a29e ]
CPU0                0 (+       0)  IO_ASSIST [ dom:vcpu = 0x0000c202, data = 0x0000 ]
CPU0  200130258198944 (+    4153)  switch_infprev  [ old_domid = 0x00000062, runtime = 17087 ]
CPU0  200130258199132 (+     188)  switch_infnext  [ new_domid = 0x00000000, time = 17087, r_time = 30000000 ]
CPU0  200130258199702 (+     570)  __enter_scheduler  [ prev<domid:edomid> = 0x00000062 : 0x00000000, next<domid:edomid> = 0x00000000 : 0x00000000 ]
CPU0  200130258206470 (+    6768)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258210964 (+    4494)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258214767 (+    3803)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258218019 (+    3252)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258227419 (+    9400)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]

It kind of looks like vector 0x83 is being fired over and over, which
would explain why things hang once I enable interrupts again. I will
look into what vector 0x83 is attached to, but does anyone have any
ideas?

Thanks

James

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Now I'm seeing the same thing but on vector 0x93 instead. There is
nothing on that vector. It appears that when xen is restoring my domain,
an interrupt line is getting 'stuck' somehow, as the hang occurs as soon
as I enable interrupts after doing the restore... any suggestions?

Can anyone confirm that "INJ_VIRQ [ dom:vcpu = 0x00000062, vector =
0x83, fake = 0 ]" does actually imply that an interrupt is being set in
my DomU, and that the vector is the actual offset into the vector table?

Thanks

James

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of James Harper
> Sent: Tuesday, 10 February 2009 12:46
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] hang on restore in 3.3.1
>
> I am having problems with save/restore under 3.3.1 in the GPLPV drivers.
> I call hvm_shutdown(xpdd, SHUTDOWN_suspend), but as soon as I lower IRQL
> (enabling interrupts), qemu goes to 100% CPU and the DomU load goes
> right up too.
>
> Xentrace is showing a whole lot of this going on:
>
> [xentrace log snipped - identical to the trace in the original message
> above]
>
> It kind of looks like vector 0x83 is being fired over and over, which
> would explain why things hang once I enable interrupts again. I will
> look into what vector 0x83 is attached to, but does anyone have any
> ideas?
>
> Thanks
>
> James
On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au> wrote:

> Now I'm seeing the same thing but on vector 0x93 instead. There is
> nothing on that vector. It appears that when xen is restoring my domain,
> an interrupt line is getting 'stuck' somehow, as the hang occurs as soon
> as I enable interrupts after doing the restore... any suggestions?

Not for a line that isn't connected up. Usually this is due to bad
restore of the evtchn callback irq, or bad restore of irqs from qemu.
With 3.3 you could of course try reverting to the in-tree qemu
(CONFIG_QEMU=ioemu) and see if that makes the problem go away.

> Can anyone confirm that "INJ_VIRQ [ dom:vcpu = 0x00000062, vector =
> 0x83, fake = 0 ]" does actually imply that an interrupt is being set in
> my DomU, and that the vector is the actual offset into the vector table?

Yes, that's right.

 -- Keir
> On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
>
> > Now I'm seeing the same thing but on vector 0x93 instead. There is
> > nothing on that vector. It appears that when xen is restoring my
> > domain, an interrupt line is getting 'stuck' somehow, as the hang
> > occurs as soon as I enable interrupts after doing the restore... any
> > suggestions?
>
> Not for a line that isn't connected up. Usually this is due to bad
> restore of the evtchn callback irq, or bad restore of irqs from qemu.
> With 3.3 you could of course try reverting to the in-tree qemu
> (CONFIG_QEMU=ioemu) and see if that makes the problem go away.

I actually turn off the evtchn callback IRQ by setting it to 0. Even
when I don't do this, though, the problem still occurs.

When I analyse the IRR in windows, before enabling interrupts again, I
can definitely see that the bit for vector 0x93 is set.

Time to go digging... any suggestions for places to look?

Thanks

James
> On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
>
> > Now I'm seeing the same thing but on vector 0x93 instead. There is
> > nothing on that vector. It appears that when xen is restoring my
> > domain, an interrupt line is getting 'stuck' somehow, as the hang
> > occurs as soon as I enable interrupts after doing the restore... any
> > suggestions?
>
> Not for a line that isn't connected up. Usually this is due to bad
> restore of the evtchn callback irq, or bad restore of irqs from qemu.
> With 3.3 you could of course try reverting to the in-tree qemu
> (CONFIG_QEMU=ioemu) and see if that makes the problem go away.

What do you think the chances are of it being a qemu problem? The
xentrace output would indicate that it is the hypervisor asserting the
interrupt, but that wouldn't preclude qemu from being the originator of
the interrupt, would it?

Thanks

James
On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au> wrote:

> What do you think the chances are of it being a qemu problem? The
> xentrace output would indicate that it is the hypervisor asserting the
> interrupt, but that wouldn't preclude qemu from being the originator of
> the interrupt, would it?

Most interrupts come from qemu, since it emulates most devices.
Switching your qemu is a pretty easy test (build the internal old
version rather than the out-of-tree new version).

 -- Keir
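Keir's suggested test could be sketched as below, assuming a Xen 3.3
source tree where Config.mk carries the CONFIG_QEMU setting (the
variable name comes from his message; the exact default and path in a
given tree may differ):

```shell
# Hypothetical sketch: point the tools build at the in-tree qemu
# ("ioemu") instead of the out-of-tree qemu, then rebuild the tools.
cd xen-3.3.1                            # assumed unpacked source tree
sed -i 's/^CONFIG_QEMU[ ?:]*=.*/CONFIG_QEMU ?= ioemu/' Config.mk
grep '^CONFIG_QEMU' Config.mk           # confirm the change took
make tools                              # rebuilds qemu-dm against ioemu
```

If the stuck-vector hang disappears with the in-tree qemu, that points
at the out-of-tree qemu's save/restore of emulated-device IRQs.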
> On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
>
> > What do you think the chances are of it being a qemu problem? The
> > xentrace output would indicate that it is the hypervisor asserting
> > the interrupt, but that wouldn't preclude qemu from being the
> > originator of the interrupt, would it?
>
> Most interrupts come from qemu, since it emulates most devices.
> Switching your qemu is a pretty easy test (build the internal old
> version rather than the out-of-tree new version).

I just rebooted with my GPLPV drivers inactive (i.e. not hiding qemu
devices, leaving the PV network device 'disconnected', and not
enumerating block devices), and I found that an NDIS device is using
vector 0x93, which will be the qemu realtek device. I hide it on boot,
but I forgot to hide it again after the restore, which will probably be
the cause of my problems... hopefully hiding it on restore again will
stop it generating interrupts!

James
> > On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au>
> > wrote:
> >
> > > What do you think the chances are of it being a qemu problem? The
> > > xentrace output would indicate that it is the hypervisor asserting
> > > the interrupt, but that wouldn't preclude qemu from being the
> > > originator of the interrupt, would it?
> >
> > Most interrupts come from qemu, since it emulates most devices.
> > Switching your qemu is a pretty easy test (build the internal old
> > version rather than the out-of-tree new version).
>
> I just rebooted with my GPLPV drivers inactive (i.e. not hiding qemu
> devices, leaving the PV network device 'disconnected', and not
> enumerating block devices), and I found that an NDIS device is using
> vector 0x93, which will be the qemu realtek device. I hide it on boot,
> but I forgot to hide it again after the restore, which will probably
> be the cause of my problems... hopefully hiding it on restore again
> will stop it generating interrupts!

Well, it's not the qemu realtek device like I thought it was - I tried
it on a domain with no network interfaces at all. The other thing that
uses that vector is the USB interface. I have noticed that after a
restore, the restored computer has reverted back to the 'mouse' rather
than the 'tablet' driver... maybe there is something in that? I'll keep
looking.

James