thr3ads.net - Xen devel - [Xen-devel] PV on HVM guest hang... [Aug 2008]

Mukesh Rathor

2008-Aug-27 03:57 UTC

[Xen-devel] PV on HVM guest hang...

I'm debugging a hang of a 64-bit HVM guest with PV drivers. The problem
happens during migration. So far I've discovered that the guest is stuck in
a loop receiving interrupt 0xa9/169. In the hypervisor I see that upon VMX
exit, it sends 0xa9 right away...

(XEN)    [<ffff828c80152680>] vlapic_test_and_set_irr+0x0/0x40   :0xa9
(XEN)    [<ffff828c80151d35>] ioapic_inj_irq+0x95/0x150
(XEN)    [<ffff828c801521d0>] vioapic_deliver+0x3e0/0x440
(XEN)    [<ffff828c801522df>] vioapic_update_EOI+0xaf/0xc0
(XEN)    [<ffff828c8015394b>] vlapic_write+0x2eb/0x7e0
(XEN)    [<ffff828c8014a630>] hvm_mmio_intercept+0xa0/0x360
(XEN)    [<ffff828c8014d03f>] send_mmio_req+0x14f/0x1b0
(XEN)    [<ffff828c8014e568>] mmio_operands+0xa8/0x160
(XEN)    [<ffff828c8014eb96>] handle_mmio+0x576/0x880
(XEN)    [<ffff828c801632b2>] vmx_vmexit_handler+0x1832/0x1900


I'm now trying to figure out the IP that causes the VM exit so I can figure
out where in the guest/guest driver it's writing to the APIC.
On the guest side, I see that evtchn_pending_sel is not set in
evtchn_interrupt().

Any ideas/suggestions would be great as it is a critical bug.

Thanks
Mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Mukesh Rathor

2008-Sep-25 19:10 UTC

[Xen-devel] Re: PV on HVM guest hang...

I was finally able to track this down. Basically, on the source machine, if
there's an event for the guest at the right moment during live migration,
the line is asserted via the pci_intx.i bit in:

__hvm_pci_intx_assert():

     if ( __test_and_set_bit(device*4 + intx, &hvm_irq->pci_intx.i) )  <-----
             return;

When the guest is moved to the target, this bit gets carried over, and the
GSI is asserted again:

irq_load_pci():

             if ( test_bit(dev*4 + intx, &hvm_irq->pci_intx.i) )
             {
                 /* Direct GSI assert */
                 gsi = hvm_pci_intx_gsi(dev, intx);
                 hvm_irq->gsi_assert_count[gsi]++;   <---
                 /* PCI-ISA bridge assert */
                 link = hvm_pci_intx_link(dev, intx);
                 hvm_irq->pci_link_assert_count[link]++;
             }

As soon as it gets a xen_platform_pci event, the assert count causes it
to be delivered in a loop, hence the guest hang.
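
As a rough illustration of the sequence above, here is a toy model in C, not
the Xen code itself. The struct, the toy_* helpers, the table size, and the
(dev, intx) to GSI mapping are all hypothetical stand-ins. It only shows how
a line bit that is restored as asserted rebuilds a nonzero assert count, so
that every EOI finds the pin still high until something deasserts the line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_GSI 48   /* assumed table size for this toy model */

/* Toy stand-in for the two pieces of state discussed above. */
struct toy_irq {
    uint64_t intx_bits;                     /* one bit per (dev, intx) line */
    uint8_t  gsi_assert_count[TOY_NR_GSI];  /* >0 means the GSI is held high */
};

/* Hypothetical (dev, intx) -> GSI mapping; NOT Xen's hvm_pci_intx_gsi(). */
static unsigned toy_gsi(unsigned dev, unsigned intx)
{
    return 16 + ((dev * 4 + intx) % 32);
}

/* Assert a line once; asserting an already-asserted line is a no-op. */
static void toy_intx_assert(struct toy_irq *irq, unsigned dev, unsigned intx)
{
    uint64_t bit;

    assert(dev < 16 && intx < 4);
    bit = 1ULL << (dev * 4 + intx);
    if (irq->intx_bits & bit)
        return;
    irq->intx_bits |= bit;
    irq->gsi_assert_count[toy_gsi(dev, intx)]++;
}

/* Deassert a line; deasserting an idle line is a no-op. */
static void toy_intx_deassert(struct toy_irq *irq, unsigned dev, unsigned intx)
{
    uint64_t bit;

    assert(dev < 16 && intx < 4);
    bit = 1ULL << (dev * 4 + intx);
    if (!(irq->intx_bits & bit))
        return;
    irq->intx_bits &= ~bit;
    irq->gsi_assert_count[toy_gsi(dev, intx)]--;
}

/* Restore on the target: rebuild the assert counts from the saved bits. */
static void toy_load(struct toy_irq *dst, uint64_t saved_bits)
{
    unsigned dev, intx;

    *dst = (struct toy_irq){ .intx_bits = saved_bits };
    for (dev = 0; dev < 16; dev++)
        for (intx = 0; intx < 4; intx++)
            if (saved_bits & (1ULL << (dev * 4 + intx)))
                dst->gsi_assert_count[toy_gsi(dev, intx)]++;
}

/* Count how many of max_eois EOIs find the pin still high (re-injection). */
static unsigned toy_count_redeliveries(const struct toy_irq *irq,
                                       unsigned gsi, unsigned max_eois)
{
    unsigned n = 0;

    assert(gsi < TOY_NR_GSI);
    while (n < max_eois && irq->gsi_assert_count[gsi] > 0)
        n++;    /* nothing lowered the line, so it is delivered again */
    return n;
}
```

In this model, asserting a line on a source toy_irq, passing its intx_bits to
toy_load() on a fresh target, and then calling toy_count_redeliveries() makes
every EOI in the budget redeliver; after toy_intx_deassert() the count drops
to zero.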

My simple fix is to just check for mask:

vioapic_masked():
.....
+    gsi = hvm_pci_intx_gsi(device, intx);
+    if (vioapic_masked(d, gsi))
+       return;
+

vioapic.c:
+int vioapic_masked(struct domain *d, unsigned int irq)
+{
+    struct hvm_hw_vioapic *vioapic = domain_vioapic(d);
+    union vioapic_redir_entry *ent;
+
+    ent = &vioapic->redirtbl[irq];
+    if ( ent->fields.mask )
+        return 1;
+
+    return 0;
+}
+

This seems to work, but I'm not sure if it's the best fix. I'm currently
waiting for feedback from Intel and others here.

Thanks
mukesh


Sheng Liang wrote:
 > Mukesh,
 >
 > Did you ever get a response to this? Were you able to track it down?
 >
 > Sheng


Keir Fraser

2008-Sep-26 07:06 UTC

Re: [Xen-devel] Re: PV on HVM guest hang...

On 25/9/08 20:10, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> This seems to work, but not sure if it's the best fix, and currently waiting
> for feedback from intel, and others here now.

The fix is bogus since assertion of a GSI (basically an IO-APIC input pin)
should not be dependent on whether the pin is masked in the IO-APIC -- the
input pin 'voltage level' is obviously not affected by the masked/not-masked
state.

This is *supposed* to just work (assuming it is the PV-on-HVM IRQ that is
getting stuck asserted). See the explicit logic to deassert and then
reassert the PV-on-HVM INTx line in irq_save_pci().
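
The deassert-save-reassert pattern described here can be sketched with a toy
model in C. This is not the code in irq_save_pci(); the struct, the helper
names, and the single shared assert count are assumptions made for
illustration. The point it shows is that the saved image never records the
PV-on-HVM line as asserted, while the running source guest keeps its state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: up to 64 INTx lines that all feed one hypothetical GSI. */
struct line_state {
    uint64_t intx_bits;     /* asserted lines, one bit each */
    unsigned assert_count;  /* number of lines currently held high */
};

/* Returns true if the line changed from deasserted to asserted. */
static bool line_assert(struct line_state *s, unsigned bit)
{
    assert(bit < 64);
    if (s->intx_bits & (1ULL << bit))
        return false;
    s->intx_bits |= 1ULL << bit;
    s->assert_count++;
    return true;
}

/* Returns true if the line changed from asserted to deasserted. */
static bool line_deassert(struct line_state *s, unsigned bit)
{
    assert(bit < 64);
    if (!(s->intx_bits & (1ULL << bit)))
        return false;
    s->intx_bits &= ~(1ULL << bit);
    s->assert_count--;
    return true;
}

/*
 * Save the line bitmap with the PV line (pv_bit, a hypothetical parameter)
 * temporarily lowered, then restore the live state on the source.
 */
static uint64_t save_lines(struct line_state *s, unsigned pv_bit)
{
    bool was_asserted = line_deassert(s, pv_bit);
    uint64_t image = s->intx_bits;      /* snapshot without the PV line */

    if (was_asserted)
        line_assert(s, pv_bit);         /* source keeps running unchanged */
    return image;
}

/* Rebuild state on the target from the saved bitmap. */
static void load_lines(struct line_state *s, uint64_t image)
{
    unsigned bit;

    s->intx_bits = image;
    s->assert_count = 0;
    for (bit = 0; bit < 64; bit++)
        if (image & (1ULL << bit))
            s->assert_count++;
}
```

With this shape, a line belonging to some other device that is asserted at
save time still migrates as asserted, but the PV line always arrives
deasserted on the target. The masking state of the pin plays no part in
either step.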

My guess would be that you are using 3.1 branch, where that fix was never
applied (not sure why; possibly I missed it by accident). You want changeset
15691 from xen-unstable.hg.

 -- Keir


Mukesh Rathor

2008-Oct-03 02:18 UTC

Re: [Xen-devel] Re: PV on HVM guest hang...

Keir Fraser wrote:
 > On 25/9/08 20:10, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
 >
 >> This seems to work, but not sure if it's the best fix, and currently waiting
 >> for feedback from intel, and others here now.
 >
 > The fix is bogus since assertion of a GSI (basically an IO-APIC input pin)
 > should not be dependent on whether the pin is masked in the IO-APIC -- the
 > input pin 'voltage level' is obviously not affected by the
 > masked/not-masked.

Yeah, it was a shot in the dark... I've forgotten a lot of that since college :)

 > This is *supposed* to just work (assuming it is the PV-on-HVM IRQ that is
 > getting stuck asserted). See the explicit logic to deassert and then
 > reassert the PV-on-HVM INTx line in irq_save_pci().
 >
 > My guess would be that you are using 3.1 branch, where that fix was never
 > applied (not sure why; possibly I missed it by accident). You want changeset
 > 15691 from xen-unstable.hg.

Correct, 3.1.4. Got the changeset, and looks like it's fixed now.
Thanks as always... Mukesh


 >  -- Keir
 >

