Ian Jackson
2008-Jan-08 10:38 UTC
[Xen-devel] PV-on-HVM platform interrupt not properly cleared
If you use PV-on-HVM drivers with a guest that has multiple vcpus, and
the guest uses the APIC to route interrupts to a vcpu other than vcpu 0,
the interrupt's assert count may not be cleared until long after the
interrupt has been serviced.
The problem occurs as follows:
* The PV guest configures the emulated IO-APIC to deliver the Xen
platform interrupt to a CPU other than 0. (PV-on-HVM drivers do
not use EVTCHNOP_bind_vcpu to bind the event channel to a
different vcpu; the event channel vcpu is left at 0.)
* An incoming event sets evtchn_upcall_pending. At the next VM entry
hvm_set_callback_irq_level sees this and calls
__hvm_pci_intx_assert, which increments the relevant assert_count
and calls vioapic_deliver. This makes notes in IRR and TMR so
that VM entry will inject the actual interrupt.
* The guest's ISR runs, clears evtchn_upcall_pending, and eventually
finishes. It writes EOI, resulting in VM exit and a call to
vlapic_EOI_set. That clears the ISR and TMR flags, and then calls
vioapic_update_EOI.
* vioapic_update_EOI spots that the interrupt is still asserted in
hvm_irq->gsi_assert_count and reasserts IRR.
* We once more prepare for VM entry. If the vcpu is #0 then we
rerun hvm_set_callback_irq_level, which sees that upcall_pending
has been cleared and deasserts gsi_assert_count. However this is
too late to prevent interrupt injection (IRR is still set) and we
spuriously but essentially harmlessly enter the ISR in the guest
despite upcall_pending being clear. However, if the vcpu to be
run next is not #0 then the pre-locking test in
hvm_set_callback_irq_level triggers, and we don't then recheck
upcall_pending.
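To make the sequence above concrete, here is a small self-contained
model of the flow. It is only a sketch: the variables stand in for
hvm_irq->gsi_assert_count, the vlapic/vioapic IRR and TMR bits and
vcpu 0's evtchn_upcall_pending flag, and the function bodies merely
mimic the behaviour described above rather than reproducing the real
Xen code.

/* Simplified model of the assert/EOI/reassert sequence (illustrative
 * names, not the actual Xen structures). */
#include <stdbool.h>
#include <stdio.h>

static bool evtchn_upcall_pending;   /* set by an incoming event          */
static int  gsi_assert_count;        /* hvm_irq->gsi_assert_count[gsi]    */
static bool irr, tmr;                /* vioapic/vlapic injection state    */

/* Model of hvm_set_callback_irq_level(): only vcpu 0 re-evaluates. */
static void set_callback_irq_level(int vcpu_id)
{
    if (vcpu_id != 0)
        return;                      /* the pre-locking test in question  */
    if (evtchn_upcall_pending && gsi_assert_count == 0) {
        gsi_assert_count++;          /* __hvm_pci_intx_assert()           */
        irr = tmr = true;            /* vioapic_deliver()                 */
    } else if (!evtchn_upcall_pending && gsi_assert_count > 0) {
        gsi_assert_count--;          /* deassert                          */
    }
}

/* Model of vlapic_EOI_set() followed by vioapic_update_EOI(). */
static void guest_eoi(void)
{
    tmr = false;
    if (gsi_assert_count > 0)
        irr = true;                  /* still asserted, so reassert IRR   */
}

int main(void)
{
    evtchn_upcall_pending = true;
    set_callback_irq_level(0);       /* VM entry on vcpu 0: assert + IRR  */
    evtchn_upcall_pending = false;   /* guest ISR clears the flag         */
    guest_eoi();                     /* EOI reasserts IRR                 */
    set_callback_irq_level(1);       /* next VM entry is on vcpu 1        */
    printf("assert_count=%d irr=%d\n", gsi_assert_count, irr);
    return 0;
}

The model ends with assert_count=1 and IRR still set, which is the
stuck state described above; it only clears once vcpu 0 next runs
set_callback_irq_level.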
The upshot is that the interrupt will not be cleared until vcpu #0 is
scheduled for some reason. In my tests flood-pinging an
otherwise-idle Linux 2.6.18 guest, after a second or two the guest
reassigns the IRQ to a different vcpu, eventually returning to vcpu #0
and then the stuck interrupt gets cleared.
The preconditions are:
* Guest is using PV-on-HVM drivers
* Guest uses the (emulated) io-apic to route the Xen platform
interrupt to non-0 cpu(s) (ie, non-0 vcpus).
The symptoms are likely to be much more severe in setups where the
guest has more vcpus than there are physical cpus, or where there is a
shortage of cpu time. In that case there can obviously be a longer
wait for vcpu 0 to run.
I was reluctant to remove the v->vcpu_id != 0 test, as that would
(in most setups) result in every VM entry taking out
hvm_domain.irq_lock. If that's acceptable then removing that test
would be a much simpler change than my patch below.
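For comparison, with the test removed the hot path would look roughly
as follows. This is only a sketch of the trade-off, not real Xen code;
the structure and field names merely stand in for hvm_domain and its
irq_lock.

#include <pthread.h>
#include <stdbool.h>

struct hvm_domain_model {
    pthread_mutex_t irq_lock;        /* stands in for hvm_domain.irq_lock */
    bool upcall_pending;             /* vcpu 0's evtchn_upcall_pending    */
    int  gsi_assert_count;
};

/* With the v->vcpu_id != 0 test removed, every vcpu would run this on
 * every VM entry, taking the per-domain lock each time. */
void set_callback_irq_level_every_entry(struct hvm_domain_model *d)
{
    pthread_mutex_lock(&d->irq_lock);
    if (d->upcall_pending && d->gsi_assert_count == 0)
        d->gsi_assert_count++;       /* assert                            */
    else if (!d->upcall_pending && d->gsi_assert_count > 0)
        d->gsi_assert_count--;       /* deassert                          */
    pthread_mutex_unlock(&d->irq_lock);
}

The deasserted state would then always be noticed promptly, but at the
cost of taking the lock on every VM entry of every vcpu, which is the
overhead I was reluctant to introduce.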
Instead in the attached patch I split hvm_set_callback_irq_level up
slightly. This allows me to create an entrypoint which only ever
deasserts interrupts rather than asserting them. This is (I think)
suitable for calling from vlapic_EOI_set.
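To illustrate the shape of that change, extending the little model
above: the helper name here is made up and this is not the actual
patch, just a sketch of the idea that EOI handling can safely call a
deassert-only path on whichever vcpu wrote the EOI.

#include <stdbool.h>
#include <stdio.h>

static bool evtchn_upcall_pending;
static int  gsi_assert_count;

/* Deassert-only entrypoint: safe to call from vlapic_EOI_set() on any
 * vcpu, because it can only lower gsi_assert_count, never raise it. */
static void callback_irq_maybe_deassert(void)
{
    if (!evtchn_upcall_pending && gsi_assert_count > 0)
        gsi_assert_count--;
}

/* Full assert/deassert path, still run only for vcpu 0 at VM entry. */
static void callback_irq_set_level(void)
{
    if (evtchn_upcall_pending && gsi_assert_count == 0)
        gsi_assert_count++;
    else
        callback_irq_maybe_deassert();
}

int main(void)
{
    evtchn_upcall_pending = true;
    callback_irq_set_level();        /* vcpu 0 asserts as before          */
    evtchn_upcall_pending = false;   /* guest ISR clears the flag         */
    callback_irq_maybe_deassert();   /* called from EOI handling, on      */
                                     /* whatever vcpu wrote the EOI       */
    printf("assert_count=%d\n", gsi_assert_count);   /* prints 0          */
    return 0;
}

Here the assert count drops to zero at EOI time regardless of which
vcpu is running, so the interrupt is no longer left pending until
vcpu 0 happens to be scheduled.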
I think there is still a slight possible problem in some setups: even
with the attached change, an interrupt bound in the (virtual) io-apic
to a non-zero vcpu will still not be recognised until the next VM
entry on vcpu 0, and even after that will not run until the vcpu
chosen by the vioapic despatch logic runs.
I've done a simple ad-hoc test of this patch and it makes the symptoms
go away in my configuration.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian.