thr3ads.net - Xen devel - Debian stable kernel got timer issue when running as PV guest [Apr 2012]


Sheng Yang

2012-Apr-12 19:22 UTC

Debian stable kernel got timer issue when running as PV guest

(Sorry for the duplicate mail; I had a typo in the mailing list address...)

Hi,

Recently we received some reports of the Debian kernel (2.6.32-41 package)
hanging during migration on certain machines. I've identified one issue in
Xen, but I think there is probably another issue in the kernel.

Here is the case.

[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 3.4.2 (preserve-AD)
[    0.000000] NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 15 pages/cpu @c1608000 s37656 r0 d23784 u65536
[    0.000000] pcpu-alloc: s37656 r0 d23784 u65536 alloc=16*4096
[    0.000000] pcpu-alloc: [0] 0
[508119.807590] trying to map vcpu_info 0 at c1609010, mfn 992cac, offset 16
[508119.807593] cpu 0 using vcpu_info at c1609010
[508119.807594] Xen: using vcpu_info placement
[508119.807598] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32416

Dmesg shows that during boot the printk timestamp jumped immediately from 0
to a large number ([508119.807590] in this case).

And when migrating:

[509508.914333] suspending xenstore...
[516212.055921] trying to map vcpu_info 0 at c1609010, mfn 895fd7, offset 16
[516212.055930] cpu 0 using vcpu_info at c1609010

The timestamp jumped again. We can reproduce the above issues on our Sandy
Bridge machines.

After this, a call trace and a guest hang may be observed on some machines:

[516383.019499] INFO: task xenwatch:12 blocked for more than 120 seconds.
[516383.019566] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[516383.019578] xenwatch      D c1610e20     0    12      2 0x00000000
[516383.019591]  c781eec0 00000246 c1610e58 c1610e20 c781f300 c1441e20 c1441e20 001cf000
[516383.019605]  c781f07c c1610e20 00000000 00000001 c1441e20 c62e01c0 c1610e20 c62e01c0
[516383.019617]  c127e18e c781f07c c7830020 c7830020 c1441e20 c1441e20 c127f2f1 c781f080
[516383.019629] Call Trace:
[516383.019640]  [<c127e18e>] ? schedule+0x78f/0x7dc
[516383.019645]  [<c127f2f1>] ? _spin_unlock_irqrestore+0xd/0xf
[516383.019649]  [<c127e4a1>] ? schedule_timeout+0x20/0xb0
[516383.019656]  [<c100573c>] ? xen_force_evtchn_callback+0xc/0x10
[516383.019660]  [<c127e3aa>] ? wait_for_common+0xa4/0x100
[516383.019665]  [<c1033315>] ? default_wake_function+0x0/0x8
[516383.019671]  [<c104a144>] ? kthread_stop+0x4f/0x8e
[516383.019675]  [<c1047883>] ? cleanup_workqueue_thread+0x3a/0x45
[516383.019679]  [<c1047903>] ? destroy_workqueue+0x56/0x85
[516383.019684]  [<c106a395>] ? stop_machine_destroy+0x23/0x37
[516383.019690]  [<c11962d8>] ? shutdown_handler+0x200/0x22f
[516383.019694]  [<c1197439>] ? xenwatch_thread+0xdc/0x103
[516383.019698]  [<c104a322>] ? autoremove_wake_function+0x0/0x2d
[516383.019701]  [<c119735d>] ? xenwatch_thread+0x0/0x103
[516383.019705]  [<c104a0f0>] ? kthread+0x61/0x66
[516383.019709]  [<c104a08f>] ? kthread+0x0/0x66
[516383.019714]  [<c1008d87>] ? kernel_thread_helper+0x7/0x10

But I cannot reproduce the call trace and hang on our Sandy Bridge machines.

I've spent some time tracking down the timestamp jump issue, and finally
found that it's due to Invariant TSC (CPUID leaf 0x80000007, EDX bit 8, also
called non-stop TSC). The presence of this feature enables a kernel
parameter named sched_clock_stable, and it seems this parameter does not
work with Xen's pvclock. If sched_clock_stable is set, the value returned by
xen_clocksource_read() is returned directly by sched_clock_cpu(). But,
correct me if I'm wrong (CMIIW), the value returned by xen_clocksource_read()
is based on the host (vcpu) uptime rather than this VM's uptime, which
results in the timestamp jump.

I've compiled a kernel that forces sched_clock_stable=0, and it solved the
timestamp jump issue as expected. Luckily, it also seems to have solved the
call trace and guest hang issue.
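The mechanism described above can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: `toy_guest` and `toy_sched_clock_cpu()` are hypothetical names, the raw Xen clock is assumed to count host (vcpu) uptime as described above, and the unstable path is reduced to "subtract the reading taken at guest boot", which is far simpler than the kernel's real per-CPU sched_clock logic.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
struct toy_guest {
    uint64_t boot_clock_ns;  /* raw clock reading when the guest booted */
    int sched_clock_stable;  /* models the kernel flag of the same name */
};

/*
 * Assumption: the raw Xen clock counts host (vcpu) uptime in ns.
 * The result stands in for the value printk timestamps come from.
 */
static uint64_t toy_sched_clock_cpu(const struct toy_guest *g,
                                    uint64_t raw_clock_ns)
{
    assert(g);
    if (g->sched_clock_stable)
        return raw_clock_ns;  /* raw host-based value passed straight through */

    /* Simplified unstable path: time since guest boot, clamped at 0. */
    if (raw_clock_ns < g->boot_clock_ns)
        return 0;
    return raw_clock_ns - g->boot_clock_ns;
}
```

In this model, a guest booted on a host that has been up for about 508119 s reports that host uptime as its first timestamp when sched_clock_stable is set, much like the dmesg output above, whereas with the flag cleared the reported time starts from 0.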

Attached is an (untested) patch to mask CPUID leaf 0x80000007. I think the
issue can be easily reproduced on a Westmere or Sandy Bridge machine (my
former colleagues at Intel said the feature likely exists on processors
after Nehalem) running a newer version of the PV guest: check the guest's
cpuinfo and you will see nonstop_tsc, and you will notice the abnormal
printk timestamps.
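For reference, a minimal sketch of checking and masking the invariant-TSC bit (CPUID leaf 0x80000007, EDX bit 8) from C. It assumes a GCC or Clang toolchain that provides `<cpuid.h>` and `__get_cpuid()` on x86; on other architectures the probe simply reports false. The helper names are hypothetical, and `mask_invariant_tsc()` only illustrates clearing the single bit, not the attached patch (which masks the whole leaf).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define CPUID_LEAF_ADV_PM  0x80000007u /* advanced power management leaf */
#define EDX_INVARIANT_TSC  (1u << 8)   /* invariant ("non-stop") TSC bit */

/* True if an EDX value from leaf 0x80000007 advertises invariant TSC. */
static bool edx_has_invariant_tsc(uint32_t edx)
{
    return (edx & EDX_INVARIANT_TSC) != 0;
}

/* Clear only the invariant-TSC bit, leaving all other bits untouched. */
static uint32_t mask_invariant_tsc(uint32_t edx)
{
    uint32_t out = edx & ~EDX_INVARIANT_TSC;
    assert(!edx_has_invariant_tsc(out));
    return out;
}

/* Probe the CPU this code runs on; false if the leaf is unavailable. */
static bool cpu_has_invariant_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid() returns 0 if the leaf exceeds the max extended leaf */
    if (!__get_cpuid(CPUID_LEAF_ADV_PM, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_invariant_tsc(edx);
#else
    return false;
#endif
}
```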

Sorry, I don't have a Xen unstable environment at hand right now. But I
think this should be the case we saw.

BTW: the original environment is xen-3.4.2, but I found that the feature is
still unmasked in the latest xen-unstable tree.

--
regards
Yang, Sheng



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Jan Beulich

2012-Apr-13 07:56 UTC


Re: Debian stable kernel got timer issue when running as PV guest

>>> On 12.04.12 at 21:22, Sheng Yang <sheng@yasker.org> wrote:
> I've compiled a kernel, force sched_clock_stable=0, then it solved the
> timestamp jump issue as expected. Luckily, seems it also solved the call
> trace and guest hang issue as well.
And this is also how it should be fixed.
> Attachment is a (untested) patch to mask the CPUID leaf 0x80000007. I think
> the issue can be easily reproduced using a Westmere or SandyBridge
> machine(my old colleagues at Intel said the feature likely existed after
> Nehalem) running newer version of PV guest, check the guest cpuinfo you
> would see nonstop_tsc, and you would notice the abnormal timestamp of
> printk.
Masking the entire leaf is certainly out of the question. And even masking
the individual bit is questionable - a PV kernel simply shouldn't look at
it, imo (other than possibly for reporting to user mode).

Jan

David Vrabel

2012-Apr-13 10:37 UTC


Re: Debian stable kernel got timer issue when running as PV guest

On 13/04/12 08:56, Jan Beulich wrote:
>>>> On 12.04.12 at 21:22, Sheng Yang <sheng@yasker.org> wrote:
>> I've compiled a kernel, force sched_clock_stable=0, then it solved the
>> timestamp jump issue as expected. Luckily, seems it also solved the call
>> trace and guest hang issue as well.
> 
> And this is also how it should be fixed.

Something like this?  I've not tested it yet as I need to track down
some of the problem hardware and get it set up.

8<---------------
xen: always set the sched clock as unstable

It's not clear to me if the Xen clock source can be used as a stable
sched clock. Also, even if the guest is started on a system whose
underlying TSC is stable, it may be migrated to one where it's not. So
never mark the sched clock as stable.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/time.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 0296a95..b22cd9c 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -473,6 +473,9 @@ static void __init xen_time_init(void)
 	do_settimeofday(&tp);

 	setup_force_cpu_cap(X86_FEATURE_TSC);
+	setup_clear_cpu_cap(X86_FEATURE_CONSTANT_TSC);
+	setup_clear_cpu_cap(X86_FEATURE_NONSTOP_TSC);
+	sched_clock_stable = 0;

 	xen_setup_runstate_info(cpu);
 	xen_setup_timer(cpu);

Jan Beulich

2012-Apr-13 11:00 UTC


Re: Debian stable kernel got timer issue when running as PV guest

>>> On 13.04.12 at 12:37, David Vrabel <dvrabel@cantab.net> wrote:
> On 13/04/12 08:56, Jan Beulich wrote:
>>>>> On 12.04.12 at 21:22, Sheng Yang <sheng@yasker.org> wrote:
>>> I've compiled a kernel, force sched_clock_stable=0, then it solved the
>>> timestamp jump issue as expected. Luckily, seems it also solved the call
>>> trace and guest hang issue as well.
>> 
>> And this is also how it should be fixed.
> 
> Something like this?  I've not tested it yet as I need to track down
> some of the problem hardware and get it set up.

Yeah, except that I'm not sure you really need to clear the feature
flags. Just making sure sched_clock_stable never gets set should be
enough; playing with the feature flags always implies that users will
see bigger differences in /proc/cpuinfo between native and Xen
kernels...

Jan
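A toy model of the alternative suggested above: the CONSTANT_TSC/NONSTOP_TSC capability flags are left alone, so /proc/cpuinfo matches a native kernel, and only the decision to treat sched_clock as stable is refused on a Xen PV guest. The struct and function names are hypothetical, and the native rule is simplified to "both flags present".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the kernel's data structures. */
struct toy_cpu_caps {
    bool constant_tsc; /* would show up as constant_tsc in /proc/cpuinfo */
    bool nonstop_tsc;  /* would show up as nonstop_tsc in /proc/cpuinfo */
};

/*
 * Decide whether sched_clock may be treated as stable. The capability
 * flags are only read, never cleared, so what userspace sees is unchanged.
 */
static bool toy_sched_clock_stable(const struct toy_cpu_caps *caps, bool xen_pv)
{
    assert(caps);
    if (xen_pv)
        return false; /* never mark sched_clock stable on a Xen PV guest */
    return caps->constant_tsc && caps->nonstop_tsc; /* simplified native rule */
}
```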

David Vrabel

2012-Apr-13 16:10 UTC


Re: Debian stable kernel got timer issue when running as PV guest

On 13/04/12 12:00, Jan Beulich wrote:
>>>> On 13.04.12 at 12:37, David Vrabel <dvrabel@cantab.net> wrote:
>> On 13/04/12 08:56, Jan Beulich wrote:
>>>>>> On 12.04.12 at 21:22, Sheng Yang <sheng@yasker.org> wrote:
>>>> I've compiled a kernel, force sched_clock_stable=0, then it solved the
>>>> timestamp jump issue as expected. Luckily, seems it also solved the call
>>>> trace and guest hang issue as well.
>>>
>>> And this is also how it should be fixed.
>>
>> Something like this?  I've not tested it yet as I need to track down
>> some of the problem hardware and get it set up.
> 
> Yeah, except that I'm not sure you really need to clear the feature
> flags. Just making sure sched_clock_stable never gets set should be
> enough; playing with the feature flags always implies that users will
> see bigger differences in /proc/cpuinfo between native and Xen
> kernels...

I have a system with both NONSTOP_TSC and CONSTANT_TSC, so
sched_clock_stable should be true.  VMs start and migrate fine with no
unexpected jumps in time.  I think more digging is required here to find
out why time is screwy on this particular system.

David

Sheng Yang

2012-Apr-13 16:15 UTC


Re: Debian stable kernel got timer issue when running as PV guest

On Fri, Apr 13, 2012 at 9:10 AM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 13/04/12 12:00, Jan Beulich wrote:
> >>>> On 13.04.12 at 12:37, David Vrabel <dvrabel@cantab.net> wrote:
> >> On 13/04/12 08:56, Jan Beulich wrote:
> >>>>>> On 12.04.12 at 21:22, Sheng Yang <sheng@yasker.org> wrote:
> >>>> I've compiled a kernel, force sched_clock_stable=0, then it solved the
> >>>> timestamp jump issue as expected. Luckily, seems it also solved the call
> >>>> trace and guest hang issue as well.
> >>>
> >>> And this is also how it should be fixed.
> >>
> >> Something like this?  I've not tested it yet as I need to track down
> >> some of the problem hardware and get it set up.
> >
> > Yeah, except that I'm not sure you really need to clear the feature
> > flags. Just making sure sched_clock_stable never gets set should be
> > enough; playing with the feature flags always implies that users will
> > see bigger differences in /proc/cpuinfo between native and Xen
> > kernels...
>
> I have a system with both NONSTOP_TSC and CONSTANT_TSC so
> sched_clock_stable should be true.  VMs start and migrate fine with no
> unexpected jumps in time.  I think more digging is required here to find
> out why time is screwy on this particular system.
>

That's the reason I said there should be another (kernel) bug triggered by
this. In my original mail I already said that on our Sandy Bridge machine
I can only reproduce the printk timestamp jump issue, not the migration
hang.

Did you see the timestamp jump on the PV guest?

--
regards
Yang, Sheng


Sheng Yang

2012-Apr-13 17:27 UTC


Re: Debian stable kernel got timer issue when running as PV guest

On Fri, Apr 13, 2012 at 12:56 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 12.04.12 at 21:22, Sheng Yang <sheng@yasker.org> wrote:
> > I've compiled a kernel, force sched_clock_stable=0, then it solved the
> > timestamp jump issue as expected. Luckily, seems it also solved the call
> > trace and guest hang issue as well.
>
> And this is also how it should be fixed.
>
> > Attachment is a (untested) patch to mask the CPUID leaf 0x80000007. I think
> > the issue can be easily reproduced using a Westmere or SandyBridge
> > machine(my old colleagues at Intel said the feature likely existed after
> > Nehalem) running newer version of PV guest, check the guest cpuinfo you
> > would see nonstop_tsc, and you would notice the abnormal timestamp of
> > printk.
>
> Masking the entire leaf is certainly out of question. And even masking
> the individual bit is questionable - a PV kernel simply shouldn't look at
> it imo (for other than possibly reporting to user mode purposes).
>
> Jan
>

The CPUID detection in the kernel is done by the CPU vendor-specific code,
not by Xen, and the way Xen controls it is through the CPUID values it
presents to the guest.

1. We could mask only that one bit. But currently this leaf contains only
this feature, so I don't think masking the whole leaf would be a big
problem. I think it's already a problem that Xen handles PV guest CPU
features with a blacklist rather than a whitelist, so when some new feature
slips in (like this time), nobody knows what will happen. I am really
thinking of something like:

switch ( input[0] )
{
case ...:
case ...:
    /* existing per-leaf handling */
    break;
default:
    /* leaf not explicitly handled: hide it from the PV guest */
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
    break;
}

Maybe there is some reason we didn't set a default value for the PV CPUID
policy, but I can't see why.

2. If we present the CPU feature to the guest and then disable that feature
inside the guest, what's the point? I don't think it is a good idea. What if
something else interacts with this CPUID feature and we fail to disable it
(e.g. something other than sched_clock_stable)? Simply not showing it would
be a better/cleaner way to do it, as long as we agree that this feature is
useless, or even troublesome, for PV guests.
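A self-contained sketch of the whitelist idea in point 1: listed leaves are passed through (where real code would also apply per-bit masks), leaf 0x80000007 has only its invariant-TSC bit cleared (one of the options discussed in this thread; the attached patch masks the whole leaf), and any leaf not on the list is reported as all zeros. `struct cpuid_regs` and `pv_cpuid_filter()` are hypothetical; this is not Xen's actual CPUID policy code, and the list of passed-through leaves is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register bundle; not Xen's actual type. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

#define CPUID_80000007_EDX_INVARIANT_TSC (1u << 8)

/*
 * Whitelist-style filter applied to host CPUID output before a PV guest
 * sees it. Only listed leaves survive; everything else reads as zero.
 */
static void pv_cpuid_filter(uint32_t leaf, struct cpuid_regs *r)
{
    assert(r);
    switch (leaf) {
    case 0x00000000u: /* illustrative pass-through leaves; real code */
    case 0x00000001u: /* would also mask individual bits in these    */
    case 0x80000000u:
    case 0x80000001u:
        break;
    case 0x80000007u:
        /* Hide only the invariant-TSC bit rather than the whole leaf. */
        r->edx &= ~CPUID_80000007_EDX_INVARIANT_TSC;
        break;
    default:
        /* Leaf not on the whitelist: report all zeros to the guest. */
        r->eax = r->ebx = r->ecx = r->edx = 0;
        break;
    }
}
```

The default branch is the key design choice here: a CPUID feature that the hypervisor does not yet know about is hidden from PV guests by default instead of leaking through, which is the failure mode described in this thread.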

--
regards
Yang, Sheng
