Hi,

c/s 22529 and 22530 cause a xen guest hang.

While "normal" guests like Linux and NetBSD boot fine, I boot Xen itself as a xen guest for my nested virtualization work.

When I do that, the guest dom0 hangs at boot when it tries to initialize the first vcpu.
The bug is introduced somewhere in c/s 22529 and triggers with c/s 22530.

Christoph

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Christoph Egger wrote on 2010-12-17:
> c/s 22529 and 22530 cause a xen guest hang.
>
> While "normal" guests like Linux and NetBSD boot fine I boot Xen
> itself as a xen guest for my nested virtualization.
>
> When I do that then the guest dom0 hangs at boot when it tries to
> initialize the first vcpu.
> The bug is introduced somewhere in c/s 22529 and triggers with c/s 22530.

Can you enable the apic_timer debug info via hvm_debug and send more of the serial port log from around the point where the guest dom0 hangs? I have tested Xen as a guest before; it works well except that it boots a little slowly.

Jimmy
On Sunday 19 December 2010 15:26:30 Wei, Gang wrote:
> Can you enable apic_timer debug info var hvm_debug and give more serial
> port log around the guest dom0 hangs? I used to test xen guest, it works
> well expect that it boot a little bit slowly.

This is the log output I get with the TSC_DEADLINE feature enabled:

(XEN) [HVM:1.0] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.0] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.0] <vlapic_get_tmcct> timer initial count 1000000000, timer current count 999546729, offset 453271
(XEN) [HVM:1.0] <vlapic_get_tmcct> timer initial count 1000000000, timer current count 989547039, offset 10452961
(XEN) [HVM:1.0] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.0] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.0] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.0] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.0] <vlapic_tdt_msr_set> ignore tsc deadline msr write
(XEN) [HVM:1.1] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.0] <vlapic_tdt_msr_set> ignore tsc deadline msr write
(XEN) [HVM:1.1] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.1] <vlapic_tdt_msr_set> ignore tsc deadline msr write
(XEN) [HVM:1.1] <vlapic_tdt_msr_set> ignore tsc deadline msr write
(XEN) [HVM:1.2] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.2] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.2] <vlapic_tdt_msr_set> ignore tsc deadline msr write
(XEN) [HVM:1.2] <vlapic_tdt_msr_set> ignore tsc deadline msr write
(XEN) [HVM:1.3] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.3] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.3] <vlapic_tdt_msr_set> ignore tsc deadline msr write
(XEN) [HVM:1.3] <vlapic_tdt_msr_set> ignore tsc deadline msr write

The guest dom0 output right before the hang:

ioapic0 at mainbus0 apid 1, virtual wire mode
hypervisor0 at mainbus0: Xen version 4.1
vcpu0 at hypervisor0

The vcpu driver tries to detect the TSC frequency here. The dom0 uses the xen clock timer, the same one a PV guest uses.

This is the log output I get with the TSC_DEADLINE feature disabled:

(XEN) [HVM:1.0] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.0] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.0] <vlapic_get_tmcct> timer initial count 1000000000, timer current count 999716563, offset 283437
(XEN) [HVM:1.0] <vlapic_get_tmcct> timer initial count 1000000000, timer current count 989716153, offset 10283847
(XEN) [HVM:1.0] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.0] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.0] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.0] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.1] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.1] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.2] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.2] <vlapic_reg_write> timer divisor is 0x1
(XEN) [HVM:1.3] <vlapic_set_tdcr> timer_divisor: 1
(XEN) [HVM:1.3] <vlapic_reg_write> timer divisor is 0x1

The guest dom0 output:

ioapic0 at mainbus0 apid 1, virtual wire mode
hypervisor0 at mainbus0: Xen version 4.1
vcpu0 at hypervisor0: AMD 686-class, 1895MHz
xenbus0 at hypervisor0: Xen Virtual Bus Interface
[...]
Wei, Gang
2010-Dec-27 03:05 UTC
[PATCH] vtdt: add a missing change (RE: [Xen-devel] Re: xen guest hang with TSC_DEADLINE)
Christoph Egger wrote on 2010-12-20:
> This is the log output I get with TSC_DEADLINE feature enabled:

I just found that one change was missed when the whole patch was checked in. Apply the patch below; it should be OK now.

diff -r 0133cf2a72f5 xen/arch/x86/hvm/vlapic.c
--- a/xen/arch/x86/hvm/vlapic.c Fri Dec 24 10:56:29 2010 +0000
+++ b/xen/arch/x86/hvm/vlapic.c Tue Dec 28 16:53:06 2010 +0800
@@ -56,7 +56,7 @@ static unsigned int vlapic_lvt_mask[VLAP
 static unsigned int vlapic_lvt_mask[VLAPIC_LVT_NUM] =
 {
      /* LVTT */
-     LVT_MASK | APIC_TIMER_MODE_PERIODIC,
+     LVT_MASK | APIC_TIMER_MODE_MASK,
      /* LVTTHMR */
      LVT_MASK | APIC_MODE_MASK,
      /* LVTPC */

Jimmy
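As a rough illustration of why the LVTT write mask matters, here is a minimal sketch. It assumes the LVT timer register layout from the Intel SDM (vector in bits 0-7, mask bit 16, timer mode in bits 17-18) and assumes that a guest write is simply ANDed with the per-register mask. The constant names, their values, and the helper `lvtt_write` are illustrative stand-ins, not the definitions used in vlapic.c.

```c
#include <stdint.h>

/* Illustrative constants following the LVT timer layout in the Intel SDM.
 * They are assumptions for this sketch, not copied from the Xen tree. */
#define VECTOR_MASK              0xFFu
#define DELIVERY_PENDING         (1u << 12)
#define LVT_MASKED               (1u << 16)
#define TIMER_MODE_PERIODIC      (1u << 17)
#define TIMER_MODE_TSC_DEADLINE  (2u << 17)
#define TIMER_MODE_MASK          (3u << 17)
#define LVT_BASE_MASK            (VECTOR_MASK | DELIVERY_PENDING | LVT_MASKED)

/* Hypothetical helper: keep only the bits of a guest LVTT write that the
 * given write mask allows. */
static uint32_t lvtt_write(uint32_t guest_val, uint32_t write_mask)
{
    return guest_val & write_mask;
}

/* Hypothetical helper: extract the 2-bit timer mode field
 * (0 = one-shot, 1 = periodic, 2 = TSC-deadline). */
static uint32_t lvtt_timer_mode(uint32_t lvtt)
{
    return (lvtt & TIMER_MODE_MASK) >> 17;
}
```

Under these assumptions, a guest write of `0xEF | TIMER_MODE_TSC_DEADLINE` passed through `LVT_BASE_MASK | TIMER_MODE_PERIODIC` loses bit 18 and reads back as one-shot mode, while `LVT_BASE_MASK | TIMER_MODE_MASK` preserves the TSC-deadline mode. That would be consistent with the "ignore tsc deadline msr write" lines in the earlier log, though the thread does not confirm this mechanism explicitly.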
Christoph Egger
2011-Jan-04 11:00 UTC
Re: [PATCH] vtdt: add a missing change (RE: [Xen-devel] Re: xen guest hang with TSC_DEADLINE)
On Monday 27 December 2010 04:05:41 Wei, Gang wrote:
> Just found one change was missed while the whole patch was checked in.
> Apply below patch, it should be ok now.
>
> [...]

The hang is still reproducible with this change. Sorry.

Christoph
Christoph Egger
2011-Jan-05 10:07 UTC
Re: [PATCH] vtdt: add a missing change (RE: [Xen-devel] Re: xen guest hang with TSC_DEADLINE)
On Tuesday 04 January 2011 12:00:21 Christoph Egger wrote:
> The hang is still reproducable with this change. Sorry.

I took a deeper look into this problem.

The guest dom0 depends on the xen clock to continue ticking. When the guest Xen hypervisor uses the TSC Deadline Timer, the xen clock for the guest dom0 does not continue ticking. By "xen clock" I mean the timer a PV guest uses.

When the dom0 initializes the first virtual cpu, it determines the cpu frequency by reading the TSC, waiting 1 second and reading the TSC again. To finish the 1 second wait, it depends on the xen clock to tick.

Christoph
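The read-wait-read calibration described above can be sketched as follows. This is only a sketch of the general pattern, not the dom0 vcpu driver code: the callback types, `calibrate_tsc_hz`, the `max_polls` guard, and the simulated time sources are all hypothetical names introduced for illustration. The 1895 MHz figure is taken from the dom0 output earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical callbacks for this sketch: a TSC reader and a nanosecond
 * clock that only advances when timer ticks are delivered. */
typedef uint64_t (*read_tsc_fn)(void);
typedef uint64_t (*clock_ns_fn)(void);

#define NSEC_PER_SEC 1000000000ull

/* Estimate the TSC frequency in Hz by counting cycles across a one-second
 * wait. Returns 0 if the clock does not reach one second within max_polls
 * polls; that early exit stands in for the hang described above. */
static uint64_t calibrate_tsc_hz(read_tsc_fn read_tsc, clock_ns_fn clock_ns,
                                 uint64_t max_polls)
{
    uint64_t start_ns = clock_ns();
    uint64_t start_tsc = read_tsc();

    for (uint64_t polls = 0; polls < max_polls; polls++) {
        if (clock_ns() - start_ns >= NSEC_PER_SEC)
            return read_tsc() - start_tsc;
    }
    return 0; /* the clock never reached one second */
}

/* Simulated time sources for trying the sketch out. */
static uint64_t sim_ns;
static uint64_t sim_tsc;

static uint64_t sim_read_tsc(void) { return sim_tsc; }

/* Ticking clock: each poll advances time by 10 ms and the TSC by the
 * matching number of cycles at 1895 MHz. */
static uint64_t ticking_clock_ns(void)
{
    uint64_t now = sim_ns;
    sim_ns += 10000000ull;
    sim_tsc += 18950000ull;
    return now;
}

/* Stalled clock: the TSC keeps running but time never advances. */
static uint64_t stalled_clock_ns(void)
{
    sim_tsc += 18950000ull;
    return sim_ns;
}
```

With the ticking clock the function returns the cycle count accumulated over one second. With the stalled clock the wait condition is never met, which mirrors a dom0 that stops right after printing "vcpu0 at hypervisor0".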
Wei, Gang
2011-Jan-05 10:11 UTC
RE: [PATCH] vtdt: add a missing change (RE: [Xen-devel] Re: xen guest hang with TSC_DEADLINE)
Christoph Egger wrote on 2011-01-04:
> The hang is still reproducable with this change. Sorry.

Can you still see the line below in the serial log? Is there anything different?

(XEN) [HVM:1.0] <vlapic_tdt_msr_set> ignore tsc deadline msr write

On my side, I can see the above line, and the nested dom0 hangs while booting without this change. But things become OK after applying this change. So could you give more details after applying this change (again: serial log, nested dom0 output, etc.)?

Jimmy
Wei, Gang
2011-Jan-05 10:26 UTC
RE: [PATCH] vtdt: add a missing change (RE: [Xen-devel] Re: xen guest hang with TSC_DEADLINE)
Christoph Egger wrote on 2011-01-05:
> I took a deeper look into this problem.
> The guest dom0 depends on the xen clock to continue ticking.
> When the guest xen hypervisor uses the TSC Deadline Timer then the xen
> clock for the guest dom0 does not continue ticking.
> By "xen clock" I mean the timer a PV guest uses.
>
> When the dom0 initializes the first virtual cpu it determines cpu
> frequency by reading the TSC, waits 1 second and reads the TSC again.
> To finish the 1 second wait it depends on the xen clock to tick.

Along with the serial log and the guest dom0 output, could you also give the host grub.conf, the guest grub.conf and the HVM guest config file?

Jimmy
Christoph Egger
2011-Jan-05 11:09 UTC
Re: [PATCH] vtdt: add a missing change (RE: [Xen-devel] Re: xen guest hang with TSC_DEADLINE)
On Wednesday 05 January 2011 11:11:46 Wei, Gang wrote:
> Can you still see below line in the serial log? Is there anything
> different?
>
> (XEN) [HVM:1.0] <vlapic_tdt_msr_set> ignore tsc deadline msr write

This one disappeared.

My serial log is flooded with

(XEN) [HVM:1.1] <vlapic_tdt_msr_set> delta[0x00003c7e619fbf47]
(XEN) [HVM:1.3] <vlapic_tdt_msr_set> delta[0x00003c7e619fc6de]
(XEN) [HVM:1.2] <vlapic_tdt_msr_set> delta[0x00003c7e619fc577]
(XEN) [HVM:1.1] <vlapic_tdt_msr_set> tdt_msr[0x00000031ea5955ea], gtsc[0x00000031e936c509], gtime[0x0000001a5ff3f942]
(XEN) [HVM:1.3] <vlapic_tdt_msr_set> tdt_msr[0x00000031eb83efd8], gtsc[0x00000031ea615816], gtime[0x0000001a60917e60]
(XEN) [HVM:1.2] <vlapic_tdt_msr_set> tdt_msr[0x00000031ecae3dd2], gtsc[0x00000031eb8ba626], gtime[0x0000001a612ee444]

But I never see a line starting with [HVM:1.0] or [HVM:1.4]. My guest has four vcpus.

The nested dom0 output did not change.
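To get a feel for the numbers in these log lines, the following sketch computes how far ahead of the guest TSC each programmed deadline lies. It rests on two assumptions that the thread does not confirm: that `tdt_msr` is the deadline value the guest wrote and `gtsc` is the guest TSC at that moment, and that the TSC runs at the 1895 MHz reported in the earlier dom0 output. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed TSC frequency in kHz: 1895 MHz, the value dom0 printed in the
 * TSC_DEADLINE-disabled boot earlier in this thread. */
#define ASSUMED_TSC_KHZ 1895000ull

/* Cycles between the guest TSC and the programmed deadline
 * (0 if the deadline is already in the past). */
static uint64_t deadline_cycles(uint64_t tdt_msr, uint64_t gtsc)
{
    return tdt_msr > gtsc ? tdt_msr - gtsc : 0;
}

/* Convert a cycle count to microseconds at the assumed frequency. */
static uint64_t cycles_to_us(uint64_t cycles)
{
    return cycles * 1000ull / ASSUMED_TSC_KHZ;
}
```

For the first tdt_msr line above, `deadline_cycles(0x00000031ea5955ea, 0x00000031e936c509)` gives 19042529 cycles, or roughly 10 ms at the assumed frequency. That would fit a guest reprogramming a periodic tick on vcpus 1 to 3, but that reading is a guess.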
Wei, Gang
2011-Jan-06 01:35 UTC
RE: [PATCH] vtdt: add a missing change (RE: [Xen-devel] Re: xen guest hang with TSC_DEADLINE)
Christoph Egger wrote on 2011-01-05:
> This one disappeared.
>
> My serial log is flooded with
> ...
>
> But I never see a line starting with [HVM:1.0] or [HVM:1.4]. My guest
> has four virtual vcpus.
>
> nested dom0 output did not change.

I also tried a 4-vcpu guest on my 2-pcpu machine. It does finally boot, although it hangs for quite a long period (tens of minutes) while the guest dom0 is starting udev. This is similar to the nested Xen tdt=off case, so it may be caused by vendor-specific code in dom0.

I would like to propose an easy workaround: expose the tdt feature to guests on Intel platforms only. Do you agree? Or do you prefer to find the root cause of the current issue?

Jimmy
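The proposed workaround could look roughly like the sketch below. It relies on one documented fact, that CPUID.01H:ECX bit 24 advertises the TSC-deadline timer, and otherwise uses hypothetical names: `filter_guest_cpuid1_ecx` and its vendor-string parameter are invented for illustration and do not reflect how Xen's CPUID handling is actually structured.

```c
#include <stdint.h>
#include <string.h>

/* CPUID.01H:ECX bit 24 advertises the TSC-deadline timer (Intel SDM). */
#define CPUID1_ECX_TSC_DEADLINE (1u << 24)

/* Hypothetical filter: given the leaf-1 ECX value that would be shown to
 * the guest and the host's CPUID vendor string, hide the TSC-deadline bit
 * unless the host is an Intel platform. */
static uint32_t filter_guest_cpuid1_ecx(uint32_t ecx, const char *vendor)
{
    if (strcmp(vendor, "GenuineIntel") != 0)
        ecx &= ~CPUID1_ECX_TSC_DEADLINE;
    return ecx;
}
```

A guest that does not see the bit should fall back to one-shot or periodic APIC timer mode, which is the configuration reported as booting in the TSC_DEADLINE-disabled log. The trade-off is that the underlying cause on non-Intel hosts stays unexplained.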