I'm debugging a hang of a 64-bit HVM guest with PV drivers. The problem
happens during migration. So far I've discovered that the guest is stuck in
a loop receiving interrupt 0xa9/169. In the hypervisor I see that upon VMX
exit, it sends 0xa9 right away...

(XEN) [<ffff828c80152680>] vlapic_test_and_set_irr+0x0/0x40 :0xa9
(XEN) [<ffff828c80151d35>] ioapic_inj_irq+0x95/0x150
(XEN) [<ffff828c801521d0>] vioapic_deliver+0x3e0/0x440
(XEN) [<ffff828c801522df>] vioapic_update_EOI+0xaf/0xc0
(XEN) [<ffff828c8015394b>] vlapic_write+0x2eb/0x7e0
(XEN) [<ffff828c8014a630>] hvm_mmio_intercept+0xa0/0x360
(XEN) [<ffff828c8014d03f>] send_mmio_req+0x14f/0x1b0
(XEN) [<ffff828c8014e568>] mmio_operands+0xa8/0x160
(XEN) [<ffff828c8014eb96>] handle_mmio+0x576/0x880
(XEN) [<ffff828c801632b2>] vmx_vmexit_handler+0x1832/0x1900

I'm now trying to figure out the IP that causes the VM exit so I can find
where in the guest/guest driver it's writing to the APIC. On the guest
side, I see that evtchn_pending_sel is not set in evtchn_interrupt().

Any ideas/suggestions would be great as it is a critical bug.

Thanks
Mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
I was able to finally track this down. Basically, on the source machine, if
there's an event for the guest at the right moment during live migration,
the line is asserted via the pci_intx.i bit in:

__hvm_pci_intx_assert():
    if ( __test_and_set_bit(device*4 + intx, &hvm_irq->pci_intx.i) )  <-----
        return;

When the guest is moved to the target, this bit gets carried over, and the
GSI is asserted again:

irq_load_pci():
    if ( test_bit(dev*4 + intx, &hvm_irq->pci_intx.i) )
    {
        /* Direct GSI assert */
        gsi = hvm_pci_intx_gsi(dev, intx);
        hvm_irq->gsi_assert_count[gsi]++;     <---
        /* PCI-ISA bridge assert */
        link = hvm_pci_intx_link(dev, intx);
        hvm_irq->pci_link_assert_count[link]++;
    }

As soon as the guest gets a xen_platform_pci event, the assert count causes
it to be delivered in a loop, hence the guest hang. My simple fix is to just
check for mask:

vioapic_masked():
     .....
+    gsi = hvm_pci_intx_gsi(device, intx);
+    if (vioapic_masked(d, gsi))
+        return;
+

vioapic.c:
+int vioapic_masked(struct domain *d, unsigned int irq)
+{
+    struct hvm_hw_vioapic *vioapic = domain_vioapic(d);
+    union vioapic_redir_entry *ent;
+
+    ent = &vioapic->redirtbl[irq];
+    if ( ent->fields.mask )
+        return 1;
+
+    return 0;
+}
+

This seems to work, but I'm not sure if it's the best fix. I'm currently
waiting for feedback from Intel and others here.

Thanks
Mukesh

Sheng Liang wrote:
> Mukesh,
>
> Did you ever get a response to this? Were you able to track it down?
>
> Sheng
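The carry-over described in this message can be modelled in a few lines of C. This is only a toy sketch, not the Xen source: the `model_*` names, the 8-device bitmap, the 24-entry count array and the INTx-to-GSI mapping are all assumptions made for illustration. It assumes, as the irq_load_pci() excerpt suggests, that only the line bitmap is migrated and the assert counts are rebuilt from it on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: sizes and the INTx-to-GSI mapping are assumptions. */
#define MODEL_NR_DEVS 8u
#define MODEL_NR_GSIS 24u

struct model_irq {
    uint32_t pci_intx;                        /* bit dev*4+intx = line level */
    uint8_t  gsi_assert_count[MODEL_NR_GSIS]; /* derived from the bitmap     */
};

/* Hypothetical mapping: spread the INTx lines over GSIs 16..19. */
static unsigned int model_intx_gsi(unsigned int dev, unsigned int intx)
{
    return 16u + ((dev + intx) & 3u);
}

/* Assert a line once; a second assert of the same line is a no-op. */
static void model_intx_assert(struct model_irq *s,
                              unsigned int dev, unsigned int intx)
{
    uint32_t bit;

    assert(dev < MODEL_NR_DEVS && intx < 4u);
    bit = UINT32_C(1) << (dev * 4u + intx);
    if (s->pci_intx & bit)
        return;
    s->pci_intx |= bit;
    s->gsi_assert_count[model_intx_gsi(dev, intx)]++;
}

/* Save: in this model only the line bitmap travels to the target. */
static uint32_t model_save(const struct model_irq *s)
{
    return s->pci_intx;
}

/* Load: rebuild the assert counts from the restored bitmap. */
static void model_load(struct model_irq *s, uint32_t image)
{
    unsigned int dev, intx;

    memset(s, 0, sizeof(*s));
    s->pci_intx = image;
    for (dev = 0; dev < MODEL_NR_DEVS; dev++)
        for (intx = 0; intx < 4u; intx++)
            if (image & (UINT32_C(1) << (dev * 4u + intx)))
                s->gsi_assert_count[model_intx_gsi(dev, intx)]++;
}

/* A level-triggered pin is delivered again after EOI while its count is nonzero. */
static bool model_redelivered_on_eoi(const struct model_irq *s, unsigned int gsi)
{
    assert(gsi < MODEL_NR_GSIS);
    return s->gsi_assert_count[gsi] != 0;
}
```

If a line is asserted on the source at save time, `model_load(&dst, model_save(&src))` leaves the matching count at 1 on the target. If nothing on the target ever deasserts that line, every EOI finds the pin still asserted and delivers the interrupt again, which would match the vioapic_update_EOI -> vioapic_deliver path in the stack trace.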
On 25/9/08 20:10, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:

> This seems to work, but not sure if it's the best fix, and currently waiting
> for feedback from intel, and others here now.

The fix is bogus since assertion of a GSI (basically an IO-APIC input pin)
should not be dependent on whether the pin is masked in the IO-APIC -- the
input pin 'voltage level' is obviously not affected by the
masked/not-masked.

This is *supposed* to just work (assuming it is the PV-on-HVM IRQ that is
getting stuck asserted). See the explicit logic to deassert and then
reassert the PV-on-HVM INTx line in irq_save_pci().

My guess would be that you are using 3.1 branch, where that fix was never
applied (not sure why; possibly I missed it by accident). You want changeset
15691 from xen-unstable.hg.

 -- Keir
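The deassert-then-reassert idea mentioned for irq_save_pci() can be sketched as follows. This is a toy model under stated assumptions, not the code from changeset 15691: the struct, its field names (`callback_asserted`, `callback_line`) and the `model_*` helpers are hypothetical. The point it illustrates is that the PV-on-HVM line is dropped only for the duration of the snapshot, so the saved image never carries it, while the running source guest and all other lines are left as they were.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; field names are hypothetical, not Xen's. */
struct model_irq {
    uint32_t     pci_intx;          /* one bit per INTx line level          */
    bool         callback_asserted; /* PV-on-HVM event interrupt asserted?  */
    unsigned int callback_line;     /* bit index of the PV-on-HVM INTx line */
};

/* Drive one line high or low in the bitmap. */
static void model_line_set(struct model_irq *s, unsigned int line, bool level)
{
    assert(line < 32u);
    if (level)
        s->pci_intx |= UINT32_C(1) << line;
    else
        s->pci_intx &= ~(UINT32_C(1) << line);
}

/* Naive save: whatever is asserted right now ends up in the image. */
static uint32_t model_save_naive(const struct model_irq *s)
{
    return s->pci_intx;
}

/* Save with the PV-on-HVM line deasserted around the snapshot. */
static uint32_t model_save_deassert(struct model_irq *s)
{
    bool asserted = s->callback_asserted;
    uint32_t image;

    if (asserted)
        model_line_set(s, s->callback_line, false);

    image = s->pci_intx;            /* snapshot without the callback line */

    if (asserted)
        model_line_set(s, s->callback_line, true);

    return image;
}
```

With the callback line and one ordinary device line both asserted, `model_save_naive()` returns an image containing both bits, whereas `model_save_deassert()` returns an image containing only the device line and leaves the source state unchanged afterwards. Note that neither function consults any IO-APIC mask: in this model, as in the explanation above, the line level is independent of masking.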
Keir Fraser wrote:
> On 25/9/08 20:10, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
>> This seems to work, but not sure if it's the best fix, and currently waiting
>> for feedback from intel, and others here now.
>
> The fix is bogus since assertion of a GSI (basically an IO-APIC input pin)
> should not be dependent on whether the pin is masked in the IO-APIC -- the
> input pin 'voltage level' is obviously not affected by the
> masked/not-masked.

Yeah, it was a shot in the dark... I've forgotten a lot of that since
college :)

> This is *supposed* to just work (assuming it is the PV-on-HVM IRQ that is
> getting stuck asserted). See the explicit logic to deassert and then
> reassert the PV-on-HVM INTx line in irq_save_pci().
>
> My guess would be that you are using 3.1 branch, where that fix was never
> applied (not sure why; possibly I missed it by accident). You want changeset
> 15691 from xen-unstable.hg.

Correct, 3.1.4. Got the changeset, and it looks like it's fixed now.

Thanks as always...
Mukesh

> -- Keir