Keir,
During regression testing of 32b UP SLES9.3/SUSE10 HVM guests on a 64b
hypervisor, we are seeing a problem with the guest becoming permanently
blocked (b state). The blockage occurs at fairly random times (booting,
fsck, ltp/cerberus) on both AMD-V and VT, and takes anywhere from 5
minutes to many hours to appear. The most recent c/s on which we have
tested and still see the problem is 13947.
We've traced it back to changeset 13320. If we boot the guest with
hpet=disabled, the guest runs without problem (tested 48 hours w/o
failure). Adding back the "vcpu_kick" line removed by c/s 13320 also
alleviates the problem (24 hours w/o failure).

Let me know if you need any more details concerning the guest
configuration or host machine, or if you think alternate test
parameters would be useful, and we can run additional tests.

# HG changeset patch
# User kfraser@localhost.localdomain
# Date 1168438470 0
# Node ID 36fd53b2e3b4a41b4664be1abc059e25622e2ee3
# Parent  0b679a6d8ad083022d2a0463ff3a5fa5a852c7c4
[HVM] Remove unneeded vcpu_kick() from HPET device model.
Signed-off-by: Keir Fraser <keir@xensource.com>
diff -r 0b679a6d8ad0 -r 36fd53b2e3b4 xen/arch/x86/hvm/hpet.c
--- a/xen/arch/x86/hvm/hpet.c	Wed Jan 10 11:09:21 2007 +0000
+++ b/xen/arch/x86/hvm/hpet.c	Wed Jan 10 14:14:30 2007 +0000
@@ -356,8 +356,6 @@ static void hpet_timer_fn(void *opaque)
         }
         set_timer(&h->timers[tn], NOW() + hpet_tick_to_ns(h, h->period[tn]));
     }
-
-    vcpu_kick(h->vcpu);    
 }
 
 void hpet_migrate_timers(struct vcpu *v)
 
  --Tom
thomas.woller@amd.com
AMD Corporation
5204 E. Ben White Blvd. UBC1
Austin, Texas 78741
+1-512-602-0059
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 20/2/07 17:00, "Woller, Thomas" <thomas.woller@amd.com> wrote:

> We've traced it back to changeset 13320. If we boot the guest with
> hpet=disabled, then the guest runs without problem (tested 48 hours w/o
> failure). Adding the "vcpu_kick" line removed with c/s 13320 also
> alleviates the problem (24 hours w/o failure).

Thanks for tracking this one down to the HPET logic. However, reverting
this changeset is not really the correct fix. A vcpu_kick() may rescue
otherwise-lost VCPUs, I suppose, but there's no logical reason that it
should be necessary. Any necessary wakeup should occur via an interrupt
delivery from hpet_route_interrupt(). After all, there's no point in
waking up a VCPU unless it has work to do, which will usually mean that
you are in the process of delivering it an interrupt (hence the
vcpu_kick() invocations in vpic.c, vioapic.c and vlapic.c). The
invocation in vpt.c is actually correct because it is tied up in the
pending_intr_nr logic, which gets checked in the exit-to-guest path of a
woken VCPU.

It's worth trying to grab some more info about a guest when it hangs.
How are the HPET timers configured? In particular, how should interrupts
be delivered? Does it look like an interrupt has been delivered but not
notified? Etc.

 -- Keir
Update: changeset 14039:87f31a0db841 in xen-unstable may well fix this
problem. It fixes a going-to-sleep race when executing HLT in an HVM
guest.

 -- Keir

On 20/2/07 17:00, "Woller, Thomas" <thomas.woller@amd.com> wrote:

> Keir,
> [...] we are seeing a problem with the guest becoming permanently blocked (b
> state). [...]
> Update: changeset 14039:87f31a0db841 in xen-unstable may well
> fix this problem. It fixes a going-to-sleep race when
> executing HLT in an HVM guest.

We have had issues with HVM guests over the last week or two (even with
c/s 14039), but recent testing shows that c/s 14138 passes our
regression tests for the SLES9.3 and SUSE10 32b HVM guests. So it looks
like c/s 14039 did fix the HPET guest blocking issue we were
experiencing.

tom