Dave Lively
2006-May-18 20:17 UTC
[Xen-devel] [hvm] lost hd interrupts running native SMP guests
[I changed the subject line to reflect the current topic of
conversation. Maybe someone else is seeing this as well?]

I'm using the 64-bit SMP hypervisor, running on dual-CPU VT machines
(Dell 380, Dell SC430). I then boot a dual-VCPU HVM guest (with VCPUs
bound to CPUs 0 and 1, respectively), running Red Hat Enterprise Linux 4
U2 (64-bit, SMP kernel). Red Hat calls this a 2.6.9 kernel, but it
includes a bunch of cherry-picked patches from later versions (through
roughly 2.6.12, as I remember). I'm enabling both APIC and ACPI in the
HVM domain builder, and using the 2-processor BIOS.

If I tell the Linux kernel "noapic" so that it avoids using the IOAPIC,
I boot and run just fine. Without "noapic", I get into userspace and
can access the (QEMU-emulated) hd. But typically, while running my
rc3.d scripts, I get: "hda: dma_timer_expiry dma_status == 0x64", which
stops any further progress. I've tried disabling DMA for hda in the
guest ("ide=nodma"), and it still hangs, this time with no
"dma_timer_expiry" message (and sometimes with a "hda: lost interrupt"
msg, though I don't see that right now).

I tried the patch you just sent, but that doesn't seem to help (even
when combined with my vioapic locking).

FWIW, I've attached my vioapic locking patch. I haven't been able to
verify this code yet, nor have I even given it a good look-over since I
first wrote it ... (This is *not* intended to be checked in yet.)

Dave

On 5/18/06, Jiang, Yunhong <yunhong.jiang@intel.com> wrote:
>
> >As I mentioned, I have a very similar patch to make the IOAPIC code
> >SMP safe. But since (even with these changes) I still see a huge
> >number of lost hda interrupts when using the IOAPIC on SMP guests, I
> >haven't been able to test it yet. I assume others see the same
> >problems with the IOAPIC?? (I'll be diving into this soon --
> >probably tonight or tomorrow. At this point I have no clue what's
> >going wrong.)
>
> On which situation will the IOAPIC has a lot of hd lost interrupt?
>
> What's the guest kernel version are you using? I remember some old
> version kernel has problem.
>
> Also there is a bug on the round robin code. Current code will always
> leads interrupt to vcpu 0.
> Followed is the fix for it. But this fix cause problem for timer
> interrupt, I'm not sure the cause, but I suspect it is because the timer
> is injected in flood.
>
> The below fix is based one of my another APIC patch , so not sure if you
> can apply it directly, but I think you can figure out the changes
> easily.
>
> Thanks
> Yunhong Jiang
>
> diff -r 86d8246c6aff xen/arch/x86/hvm/vlapic.c
> --- a/xen/arch/x86/hvm/vlapic.c Wed May 17 23:15:36 2006 +0100
> +++ b/xen/arch/x86/hvm/vlapic.c Thu May 18 22:30:06 2006 +0800
> @@ -308,8 +308,15 @@ struct vlapic* apic_round_robin(struct d
>
>      old = next = d->arch.hvm_domain.round_info[vector];
>
> -    do {
> -        /* the vcpu array is arranged according to vcpu_id */
> +    /* the vcpu array is arranged according to vcpu_id */
> +    do
> +    {
> +        next ++;
> +        if ( !d->vcpu[next] ||
> +             !test_bit(_VCPUF_initialised, &d->vcpu[next]->vcpu_flags) ||
> +             next == MAX_VIRT_CPUS )
> +            next = 0;
> +
>          if ( test_bit(next, &bitmap) )
>          {
>              target = d->vcpu[next]->arch.hvm_vcpu.vlapic;
> @@ -321,12 +328,6 @@ struct vlapic* apic_round_robin(struct d
>              }
>              break;
>          }
> -
> -        next ++;
> -        if ( !d->vcpu[next] ||
> -             !test_bit(_VCPUF_initialised, &d->vcpu[next]->vcpu_flags) ||
> -             next == MAX_VIRT_CPUS )
> -            next = 0;
>      } while ( next != old );
>
>      d->arch.hvm_domain.round_info[vector] = next;
>
> >Dave
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@lists.xensource.com
> >http://lists.xensource.com/xen-devel
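
For anyone following the round-robin point in the quoted patch, here is
a minimal standalone sketch of the ordering issue. It is not the Xen
code and has not been run against Xen: pick_check_first,
pick_advance_first, the cursor argument and the plain unsigned bitmap
are hypothetical stand-ins for apic_round_robin,
round_info[vector] and the destination bitmap, and the sketch assumes
every VCPU below nr_vcpus is initialised.

```c
/*
 * Simplified model of round-robin target selection.
 * All names here are hypothetical; only the ordering of
 * "test the slot" vs. "advance the cursor" mirrors the patch.
 */

/* Old ordering: test the current slot first, advance afterwards. */
static int pick_check_first(int *cursor, unsigned bitmap, int nr_vcpus)
{
    int old = *cursor, next = *cursor, target = -1;

    do {
        if (bitmap & (1u << next)) {
            target = next;      /* found before the cursor ever moved */
            break;
        }
        next++;
        if (next >= nr_vcpus)
            next = 0;
    } while (next != old);

    *cursor = next;             /* saved cursor equals the chosen slot */
    return target;
}

/* Patched ordering: advance the cursor first, then test the slot. */
static int pick_advance_first(int *cursor, unsigned bitmap, int nr_vcpus)
{
    int old = *cursor, next = *cursor, target = -1;

    do {
        next++;
        if (next >= nr_vcpus)   /* wrap before the slot is used */
            next = 0;
        if (bitmap & (1u << next)) {
            target = next;
            break;
        }
    } while (next != old);

    *cursor = next;
    return target;
}
```

With two VCPUs, both bits set in the bitmap and the cursor starting at
0, pick_check_first returns 0 on every call, because the saved cursor
is always the slot that was just chosen. pick_advance_first alternates
between 1 and 0. Either function returns -1 if no bit in the bitmap is
set.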