Ian Jackson
2008-Jan-08 10:38 UTC
[Xen-devel] PV-on-HVM platform interrupt not properly cleared
If you use PV-on-HVM drivers in a guest with multiple vcpus, and the guest uses the APIC to route interrupts to a vcpu other than 0, the interrupt asserted count may be cleared only tardily after the interrupt is serviced. The problem occurs as follows:

* The PV guest configures the emulated IO-APIC to deliver the Xen platform interrupt to a CPU other than 0. (PV-on-HVM drivers do not use EVTCHNOP_bind_vcpu to bind the event channel to a different vcpu; the event channel vcpu is left at 0.)

* An incoming event sets evtchn_upcall_pending. At the next VM entry hvm_set_callback_irq_level sees this and calls __hvm_pci_intx_assert, which increments the relevant assert_count and calls vioapic_deliver. This makes notes in IRR and TMR so that VM entry will inject the actual interrupt.

* The guest's ISR runs, clears evtchn_upcall_pending, and eventually finishes. It writes EOI, resulting in a VM exit and a call to vlapic_EOI_set. That clears the ISR and TMR flags and then calls vioapic_update_EOI.

* vioapic_update_EOI spots that the interrupt is still asserted in hvm_irq->gsi_assert_count and reasserts IRR.

* We once more prepare for VM entry. If the vcpu is #0 then we rerun hvm_set_callback_irq_level, which sees that upcall_pending has been cleared and deasserts gsi_assert_count. However, this is too late to prevent interrupt injection (IRR is still set), so we spuriously, but essentially harmlessly, enter the ISR in the guest despite upcall_pending being clear.

* However, if the vcpu to be run next is not #0 then the pre-locking test in hvm_set_callback_irq_level triggers, and we do not then recheck upcall_pending. The upshot is that the interrupt will not be cleared until vcpu #0 is scheduled for some reason.

In my tests, flood-pinging an otherwise-idle Linux 2.6.18 guest, after a second or two the guest reassigns the IRQ to a different vcpu, eventually returning to vcpu #0, and then the stuck interrupt gets cleared.
The preconditions are:

* The guest is using PV-on-HVM drivers.

* The guest uses the (emulated) io-apic to route the Xen platform interrupt to non-0 cpu(s) (ie, non-0 vcpus).

The symptoms are likely to be much more severe in setups where the guest has more vcpus than there are physical cpus, or where there is a shortage of cpu time. In that case there can obviously be a longer wait for vcpu 0 to run.

I was reluctant to remove the v->vcpu_id != 0 test, as that would result (in most setups) in every VM entry taking out hvm_domain.irq_lock. If that's acceptable then removing that test would be a much simpler change than my patch below.

Instead, in the attached patch I split hvm_set_callback_irq_level up slightly. This allows me to create an entrypoint which only ever deasserts interrupts rather than asserting them. This is (I think) suitable for calling from vlapic_EOI_set.

There is still a slight possible problem, I think, in some setups: even with the attached change, an interrupt bound in the (virtual) io-apic to a non-zero vcpu will still not be recognised until the next VM entry on vcpu 0, and even then will not run until the vcpu chosen by the vioapic despatch logic runs.

I've done a simple ad-hoc test of this patch and it makes the symptoms go away in my configuration.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel