Dave Lively
2006-May-18 20:17 UTC
[Xen-devel] [hvm] lost hd interrupts running native SMP guests
[I changed the subject line to reflect the current topic of
conversation. Maybe someone else is seeing this as well?]
I'm using the 64-bit SMP hypervisor, running on dual-CPU VT machines
(Dell 380, Dell SC430). On these I boot a dual-VCPU HVM guest (with VCPUs
bound to CPUs 0 and 1, respectively), running Red Hat Enterprise Linux
4 U2 (64-bit, SMP kernel). Red Hat calls this a 2.6.9 kernel, but it
includes a bunch of cherry-picked patches from later versions (through
roughly 2.6.12, as I remember). I'm enabling both APIC and ACPI in
the HVM domain builder, and using the 2-processor BIOS.
If I tell the Linux kernel "noapic" so that it avoids using the
IOAPIC, the guest boots and runs just fine. Without "noapic", I get into
userspace and can access the (QEMU-emulated) hd. But typically,
while running my rc3.d scripts, I get "hda: dma_timer_expiry
dma_status == 0x64", which stops any further progress. I've tried
disabling DMA for hda in the guest ("ide=nodma"), and it still hangs,
this time with no "dma_timer_expiry" message (and sometimes with a
"hda: lost interrupt" message, though I don't see that right now).
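For reference, under the bus-master IDE status register layout from SFF-8038i (bit 0 active, bit 1 error, bit 2 interrupt, bits 5 and 6 drive 0/1 DMA-capable), 0x64 decodes to "not active, no error, interrupt raised". That is consistent with the emulated controller having finished the transfer and raised its interrupt, with the interrupt then never reaching the guest. The decoder below is only an illustration; the macro and function names are made up and are not taken from Linux or QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bus-master IDE status register bits (SFF-8038i layout). */
#define BM_STATUS_ACTIVE   0x01 /* transfer in progress */
#define BM_STATUS_ERROR    0x02 /* DMA error */
#define BM_STATUS_INTR     0x04 /* IDE interrupt raised */
#define BM_STATUS_DRV0_DMA 0x20 /* drive 0 is DMA-capable */
#define BM_STATUS_DRV1_DMA 0x40 /* drive 1 is DMA-capable */
#define BM_STATUS_SIMPLEX  0x80 /* simplex-only controller */

/* Hypothetical helper: true when the transfer has stopped without error
 * and the controller reports its interrupt as raised, i.e. the interrupt
 * should already have been delivered to the guest. */
bool bm_done_irq_raised(uint8_t status)
{
    return (status & BM_STATUS_INTR) &&
           !(status & (BM_STATUS_ACTIVE | BM_STATUS_ERROR));
}

/* Print the names of the flags that are set in a status byte. */
void bm_print(uint8_t status)
{
    printf("dma_status 0x%02x:%s%s%s%s%s%s\n", (unsigned)status,
           (status & BM_STATUS_ACTIVE)   ? " ACTIVE"   : "",
           (status & BM_STATUS_ERROR)    ? " ERROR"    : "",
           (status & BM_STATUS_INTR)     ? " INTR"     : "",
           (status & BM_STATUS_DRV0_DMA) ? " DRV0_DMA" : "",
           (status & BM_STATUS_DRV1_DMA) ? " DRV1_DMA" : "",
           (status & BM_STATUS_SIMPLEX)  ? " SIMPLEX"  : "");
}
```

For example, bm_print(0x64) prints "dma_status 0x64: INTR DRV0_DMA DRV1_DMA".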
I tried the patch you just sent, but it doesn't seem to help (even
when combined with my vioapic locking).
FWIW, I've attached my vioapic locking patch. I haven't been able to
verify this code yet, nor have I even given it a good look-over since
I first wrote it. (This is *not* intended to be checked in yet.)
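The attached patch itself is not reproduced in this archive. Purely to illustrate the kind of locking under discussion, here is a minimal sketch, assuming a simplified emulated IOAPIC whose redirection table is touched both by a VCPU's MMIO writes (through the IOREGSEL/IOWIN indirect register pair at offsets 0x00 and 0x10) and by the interrupt-delivery path, possibly running on another physical CPU. One spinlock serializes both paths. Everything here (struct vioapic_model, the function names, and the C11 atomic_flag spinlock standing in for a hypervisor spinlock) is hypothetical; it is not Dave's patch or Xen's vioapic code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define IOREGSEL     0x00        /* register-select offset */
#define IOWIN        0x10        /* data-window offset */
#define NR_PINS      24
#define REDIR_MASKED (1ULL << 16) /* redirection-entry mask bit */

/* Hypothetical emulated IOAPIC state, shared by all VCPUs. */
struct vioapic_model {
    atomic_flag lock;
    uint32_t ioregsel;
    uint64_t redir[NR_PINS];
};

static void model_lock(struct vioapic_model *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ; /* spin until the other CPU releases the lock */
}

static void model_unlock(struct vioapic_model *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

void vioapic_model_init(struct vioapic_model *s)
{
    atomic_flag_clear(&s->lock);
    s->ioregsel = 0;
    for (int i = 0; i < NR_PINS; i++)
        s->redir[i] = REDIR_MASKED; /* all pins masked at reset */
}

/* MMIO write from a VCPU; the whole update happens under the lock. */
void vioapic_model_write(struct vioapic_model *s, uint32_t offset, uint32_t val)
{
    model_lock(s);
    if (offset == IOREGSEL) {
        s->ioregsel = val & 0xff;
    } else if (offset == IOWIN && s->ioregsel >= 0x10) {
        uint32_t idx = (s->ioregsel - 0x10) / 2; /* two registers per pin */
        if (idx < NR_PINS) {
            if (s->ioregsel & 1) /* odd register: high dword */
                s->redir[idx] = (s->redir[idx] & 0xffffffffULL) |
                                ((uint64_t)val << 32);
            else                 /* even register: low dword */
                s->redir[idx] = (s->redir[idx] & ~0xffffffffULL) | val;
        }
    }
    model_unlock(s);
}

/* Delivery path: returns the vector for a pin, or -1 if it is masked. */
int vioapic_model_deliver(struct vioapic_model *s, unsigned pin)
{
    int vector = -1;
    assert(pin < NR_PINS);
    model_lock(s);
    uint64_t entry = s->redir[pin];
    if (!(entry & REDIR_MASKED))
        vector = (int)(entry & 0xff);
    model_unlock(s);
    return vector;
}
```

The lock only keeps the emulator's own state consistent. An IOREGSEL write followed by an IOWIN write is still two separate accesses, and the guest OS is expected to serialize those itself, as it must on real hardware.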
Dave
On 5/18/06, Jiang, Yunhong <yunhong.jiang@intel.com> wrote:
> >As I mentioned, I have a very similar patch to make the IOAPIC code
> >SMP safe. But since (even with these changes) I still see a huge
> >number of lost hda interrupts when using the IOAPIC on SMP guests, I
> >haven't been able to test it yet. I assume others see the same
> >problems with the IOAPIC?? (I'll be diving into this soon, probably
> >tonight or tomorrow. At this point I have no clue what's
> >going wrong.)
>
> In which situations does the IOAPIC lose so many hd interrupts?
>
> Which guest kernel version are you using? I remember that some older
> kernel versions have problems with this.
>
> Also, there is a bug in the round-robin code: the current code always
> directs the interrupt to vcpu 0.
> The fix for it follows. But this fix causes problems for the timer
> interrupt; I'm not sure of the cause, but I suspect it is because the
> timer interrupt is injected in a flood.
>
> The fix below is based on another of my APIC patches, so I'm not sure
> you can apply it directly, but I think you can figure out the changes
> easily.
>
> Thanks
> Yunhong Jiang
>
> diff -r 86d8246c6aff xen/arch/x86/hvm/vlapic.c
> --- a/xen/arch/x86/hvm/vlapic.c Wed May 17 23:15:36 2006 +0100
> +++ b/xen/arch/x86/hvm/vlapic.c Thu May 18 22:30:06 2006 +0800
> @@ -308,8 +308,15 @@ struct vlapic* apic_round_robin(struct d
>
> old = next = d->arch.hvm_domain.round_info[vector];
>
> - do {
> - /* the vcpu array is arranged according to vcpu_id */
> + /* the vcpu array is arranged according to vcpu_id */
> + do
> + {
> + next ++;
> + if ( next == MAX_VIRT_CPUS ||
> + !d->vcpu[next] ||
> + !test_bit(_VCPUF_initialised, &d->vcpu[next]->vcpu_flags) )
> + next = 0;
> +
> if ( test_bit(next, &bitmap) )
> {
> target = d->vcpu[next]->arch.hvm_vcpu.vlapic;
> @@ -321,12 +328,6 @@ struct vlapic* apic_round_robin(struct d
> }
> break;
> }
> -
> - next ++;
> - if ( !d->vcpu[next] ||
> - !test_bit(_VCPUF_initialised, &d->vcpu[next]->vcpu_flags) ||
> - next == MAX_VIRT_CPUS )
> - next = 0;
> } while ( next != old );
>
> d->arch.hvm_domain.round_info[vector] = next;
>
>
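Why the old loop always lands on vcpu 0: it tests the slot it starts from before advancing, and then saves the slot it picked as the next starting point. So whenever a vector's destination bitmap includes the previous target, that same VCPU is chosen again (vcpu 0, assuming round_info starts out as 0). The patch advances first and tests second. The following is a stand-alone model of the two loop orders, assuming a hypothetical struct guest whose small online[] array stands in for d->vcpu[] plus the _VCPUF_initialised check; it also checks the array bound before indexing.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VCPUS 4 /* hypothetical; Xen's MAX_VIRT_CPUS is larger */

/* Hypothetical, simplified model of a guest's VCPU slots. */
struct guest {
    bool online[MAX_VCPUS]; /* stands in for d->vcpu[] + _VCPUF_initialised */
    int last_target;        /* stands in for round_info[vector] */
};

/* Advance to the next usable VCPU, wrapping to 0. The bound is checked
 * before online[] is indexed, so next == MAX_VCPUS is never read. */
int next_vcpu(const struct guest *g, int cur)
{
    assert(cur >= 0 && cur < MAX_VCPUS);
    int next = cur + 1;
    if (next >= MAX_VCPUS || !g->online[next])
        next = 0;
    return next;
}

/* Old order: test the current slot first, then advance. */
int pick_test_then_advance(struct guest *g, unsigned long bitmap)
{
    int old = g->last_target, next = old, target = -1;
    do {
        if (bitmap & (1UL << next)) {
            target = next;
            break;
        }
        next = next_vcpu(g, next);
    } while (next != old);
    g->last_target = next; /* the picked slot becomes the next start */
    return target;
}

/* Patched order: advance first, then test. */
int pick_advance_then_test(struct guest *g, unsigned long bitmap)
{
    int old = g->last_target, next = old, target = -1;
    do {
        next = next_vcpu(g, next);
        if (bitmap & (1UL << next)) {
            target = next;
            break;
        }
    } while (next != old);
    g->last_target = next;
    return target;
}
```

With two online VCPUs and a bitmap of 0x3, pick_test_then_advance returns 0 on every call, while pick_advance_then_test alternates 1, 0, 1, ...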
> >
> >Dave
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@lists.xensource.com
> >http://lists.xensource.com/xen-devel
> >
>