I am having problems with save/restore under 3.3.1 in the GPLPV drivers.
I call hvm_shutdown(xpdd, SHUTDOWN_suspend), but as soon as I lower IRQL
(enabling interrupts), qemu goes to 100% CPU and the DomU load goes
right up too.

Xentrace is showing a whole lot of this going on:

CPU0  200130258143212 (+     770)  hypercall  [ rip = 0x000000008020632a, eax = 0xffffffff ]
CPU0  200130258151107 (+    7895)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258156293 (+    5186)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258161233 (+    4940)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258165467 (+    4234)  hypercall  [ rip = 0x000000008020640a, eax = 0xffffffff ]
CPU0  200130258167202 (+    1735)  domain_wake  [ domid = 0x00000062, edomid = 0x00000000 ]
CPU0  200130258168511 (+    1309)  switch_infprev  [ old_domid = 0x00000000, runtime = 31143 ]
CPU0  200130258168716 (+     205)  switch_infnext  [ new_domid = 0x00000062, time = 786, r_time = 30000000 ]
CPU0  200130258169338 (+     622)  __enter_scheduler  [ prev<domid:edomid> = 0x00000000 : 0x00000000, next<domid:edomid> = 0x00000062 : 0x00000000 ]
CPU0  200130258175532 (+    6194)  VMENTRY  [ dom:vcpu = 0x00000062 ]
CPU0  200130258179633 (+    4101)  VMEXIT   [ dom:vcpu = 0x00000062, exitcode = 0x0000004e, rIP = 0x0000000080a562b9 ]
CPU0                0 (+       0)  MMIO_AST_WR  [ address = 0xfee000b0, data = 0x00000000 ]
CPU0                0 (+       0)  PF_XEN   [ dom:vcpu = 0x00000062, errorcode = 0x0b, virt = 0xfffe00b0 ]
CPU0                0 (+       0)  INJ_VIRQ [ dom:vcpu = 0x00000062, vector = 0x00, fake = 1 ]
CPU0  200130258185932 (+    6299)  VMENTRY  [ dom:vcpu = 0x00000062 ]
CPU0  200130258189737 (+    3805)  VMEXIT   [ dom:vcpu = 0x00000062, exitcode = 0x00000064, rIP = 0x0000000080a560ad ]
CPU0                0 (+       0)  INJ_VIRQ [ dom:vcpu = 0x00000062, vector = 0x83, fake = 0 ]
CPU0  200130258190990 (+    1253)  VMENTRY  [ dom:vcpu = 0x00000062 ]
CPU0  200130258194791 (+    3801)  VMEXIT   [ dom:vcpu = 0x00000062, exitcode = 0x0000007b, rIP = 0x0000000080a5a29e ]
CPU0                0 (+       0)  IO_ASSIST [ dom:vcpu = 0x0000c202, data = 0x0000 ]
CPU0  200130258198944 (+    4153)  switch_infprev  [ old_domid = 0x00000062, runtime = 17087 ]
CPU0  200130258199132 (+     188)  switch_infnext  [ new_domid = 0x00000000, time = 17087, r_time = 30000000 ]
CPU0  200130258199702 (+     570)  __enter_scheduler  [ prev<domid:edomid> = 0x00000062 : 0x00000000, next<domid:edomid> = 0x00000000 : 0x00000000 ]
CPU0  200130258206470 (+    6768)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258210964 (+    4494)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258214767 (+    3803)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258218019 (+    3252)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]
CPU0  200130258227419 (+    9400)  hypercall  [ rip = 0x00000000802062eb, eax = 0xffffffff ]

It kind of looks like vector 0x83 is being fired over and over, which
would explain why things hang once I enable interrupts again. I will
look into what vector 0x83 is attached to, but does anyone have any
ideas?

Thanks

James

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Now I'm seeing the same thing but on vector 0x93 instead. There is
nothing on that vector. It appears that when xen is restoring my domain,
an interrupt line is getting 'stuck' somehow, as the hang occurs as soon
as I enable interrupts after doing the restore... any suggestions?

Can anyone confirm that "INJ_VIRQ [ dom:vcpu = 0x00000062, vector =
0x83, fake = 0 ]" does actually imply that an interrupt is being set in
my DomU, and that the vector is the actual offset into the vector table?

Thanks

James

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of James Harper
> Sent: Tuesday, 10 February 2009 12:46
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] hang on restore in 3.3.1
>
> I am having problems with save/restore under 3.3.1 in the GPLPV drivers.
> I call hvm_shutdown(xpdd, SHUTDOWN_suspend), but as soon as I lower IRQL
> (enabling interrupts), qemu goes to 100% CPU and the DomU load goes
> right up too.
>
> Xentrace is showing a whole lot of this going on:
>
> [xentrace log snipped - identical to the trace in the original message
> above]
>
> It kind of looks like vector 0x83 is being fired over and over, which
> would explain why things hang once I enable interrupts again. I will
> look into what vector 0x83 is attached to, but does anyone have any
> ideas?
>
> Thanks
>
> James
On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au> wrote:

> Now I'm seeing the same thing but on vector 0x93 instead. There is
> nothing on that vector. It appears that when xen is restoring my domain,
> an interrupt line is getting 'stuck' somehow, as the hang occurs as soon
> as I enable interrupts after doing the restore... any suggestions?

Not for a line that isn't connected up. Usually this is due to bad
restore of the evtchn callback irq, or bad restore of irqs from qemu.
With 3.3 you could of course try reverting to the in-tree qemu
(CONFIG_QEMU=ioemu) and see if that makes the problem go away.

> Can anyone confirm that "INJ_VIRQ [ dom:vcpu = 0x00000062, vector =
> 0x83, fake = 0 ]" does actually imply that an interrupt is being set in
> my DomU, and that the vector is the actual offset into the vector table?

Yes, that's right.

 -- Keir
> On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
>
> > Now I'm seeing the same thing but on vector 0x93 instead. There is
> > nothing on that vector. It appears that when xen is restoring my
> > domain, an interrupt line is getting 'stuck' somehow, as the hang
> > occurs as soon as I enable interrupts after doing the restore... any
> > suggestions?
>
> Not for a line that isn't connected up. Usually this is due to bad
> restore of the evtchn callback irq, or bad restore of irqs from qemu.
> With 3.3 you could of course try reverting to the in-tree qemu
> (CONFIG_QEMU=ioemu) and see if that makes the problem go away.

I actually turn off the evtchn callback IRQ by setting it to 0. Even
when I don't do this, though, the problem still occurs.

When I analyse the IRR in windows, before enabling interrupts again, I
can definitely see that the bit for vector 0x93 is set.

Time to go digging... any suggestions for places to look?

Thanks

James
> On 11/02/2009 08:45, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
>
> > Now I'm seeing the same thing but on vector 0x93 instead. There is
> > nothing on that vector. It appears that when xen is restoring my
> > domain, an interrupt line is getting 'stuck' somehow, as the hang
> > occurs as soon as I enable interrupts after doing the restore... any
> > suggestions?
>
> Not for a line that isn't connected up. Usually this is due to bad
> restore of the evtchn callback irq, or bad restore of irqs from qemu.
> With 3.3 you could of course try reverting to the in-tree qemu
> (CONFIG_QEMU=ioemu) and see if that makes the problem go away.

What do you think the chances are of it being a qemu problem? The
xentrace output would indicate that it is the hypervisor asserting the
interrupt, but that wouldn't preclude qemu from being the originator of
the interrupt, would it?

Thanks

James
On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au> wrote:

> What do you think the chances are of it being a qemu problem? The
> xentrace output would indicate that it is the hypervisor asserting the
> interrupt, but that wouldn't preclude qemu from being the originator of
> the interrupt, would it?

Most interrupts come from qemu, since it emulates most devices.
Switching your qemu is a pretty easy test (build the internal old
version rather than the out-of-tree new version).

 -- Keir
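Keir's suggested test could be sketched as below, assuming a Xen 3.3
source tree where Config.mk carries the CONFIG_QEMU setting (the
variable name comes from his message; the exact default and path in a
given tree may differ):

```shell
# Hypothetical sketch: point the tools build at the in-tree qemu
# ("ioemu") instead of the out-of-tree qemu, then rebuild the tools.
cd xen-3.3.1                            # assumed unpacked source tree
sed -i 's/^CONFIG_QEMU[ ?:]*=.*/CONFIG_QEMU ?= ioemu/' Config.mk
grep '^CONFIG_QEMU' Config.mk           # confirm the change took
make tools                              # rebuilds qemu-dm against ioemu
```

If the stuck-vector hang disappears with the in-tree qemu, that points
at the out-of-tree qemu's save/restore of emulated-device IRQs.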
> On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au>
> wrote:
>
> > What do you think the chances are of it being a qemu problem? The
> > xentrace output would indicate that it is the hypervisor asserting
> > the interrupt, but that wouldn't preclude qemu from being the
> > originator of the interrupt, would it?
>
> Most interrupts come from qemu, since it emulates most devices.
> Switching your qemu is a pretty easy test (build the internal old
> version rather than the out-of-tree new version).

I just rebooted with my GPLPV drivers inactive (i.e. not hiding qemu
devices, leaving the PV network device 'disconnected', and not
enumerating block devices), and I found that an NDIS device is using
vector 0x93, which will be the qemu realtek device. I hide it on boot,
but I forgot to hide it again after the restore, which will probably be
the cause of my problems... hopefully hiding it on restore again will
stop it generating interrupts!

James
> > On 11/02/2009 09:51, "James Harper" <james.harper@bendigoit.com.au>
> > wrote:
> >
> > > What do you think the chances are of it being a qemu problem? The
> > > xentrace output would indicate that it is the hypervisor asserting
> > > the interrupt, but that wouldn't preclude qemu from being the
> > > originator of the interrupt, would it?
> >
> > Most interrupts come from qemu, since it emulates most devices.
> > Switching your qemu is a pretty easy test (build the internal old
> > version rather than the out-of-tree new version).
>
> I just rebooted with my GPLPV drivers inactive (i.e. not hiding qemu
> devices, leaving the PV network device 'disconnected', and not
> enumerating block devices), and I found that an NDIS device is using
> vector 0x93, which will be the qemu realtek device. I hide it on boot,
> but I forgot to hide it again after the restore, which will probably
> be the cause of my problems... hopefully hiding it on restore again
> will stop it generating interrupts!

Well, it's not the qemu realtek device like I thought it was - I tried
it on a domain with no network interfaces at all. The other thing that
uses that vector is the USB interface. I have noticed that after a
restore, the restored computer has reverted back to the 'mouse' rather
than the 'tablet' driver... maybe there is something in that? I'll keep
looking.

James