thr3ads.net - Xen devel - [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped [Aug 2011]

Marek Marczykowski

2011-Aug-28 13:13 UTC

[Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

Hey,

I'm experiencing a strange problem: a non-deterministic PV domain hang,
only on some machines (those with a fast SSD drive). I've tried xen-4.1.0
and xen-4.1.1 with many different kernels:
VM:
 - 2.6.38.3 xenlinux based on SUSE package
 - vanilla 3.0.3
 - vanilla 3.1 rc2
dom0:
 - 2.6.38.3 xenlinux based on SUSE package
 - vanilla 3.1 rc2

The result is always the same: sometimes the VM hangs at startup. SysRq-T
shows modprobe waiting in "wait_for_devices" (specifically in
schedule_timeout), and the jiffies counter does not increase between
task-state dumps.

The only thing I have found that is (probably) connected with this problem
is these domU kernel messages:
CE: xen increased min_delta_ns to 150000 nsec
(...)
CE: xen increased min_delta_ns to 4000000 nsec
CE: Reprogramming failure. Giving up

These messages do not appear in a successful boot.
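
The escalating min_delta_ns values in the log are consistent with a back-off that grows the minimum delta by half after each failed reprogramming attempt, stops growing at one tick, and then gives up. The following is a minimal user-space model of that behaviour, not the kernel source. The 100000 ns starting value, the 5000 ns floor, and the 4000000 ns limit (one tick at HZ=250) are assumptions chosen to match the messages above, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed limit: one tick at HZ=250, matching the last value in the log. */
#define MIN_DELTA_LIMIT_NS 4000000ULL

/*
 * Grow *min_delta_ns by 50%, capped at the limit.
 * Returns 0 if the value was raised, -1 if it was already at the limit
 * (the point where the log shows "Reprogramming failure. Giving up").
 */
static int increase_min_delta(uint64_t *min_delta_ns)
{
    if (*min_delta_ns >= MIN_DELTA_LIMIT_NS)
        return -1;

    if (*min_delta_ns < 5000)
        *min_delta_ns = 5000;              /* assumed floor, avoids sticking at 0 */
    else
        *min_delta_ns += *min_delta_ns >> 1;

    if (*min_delta_ns > MIN_DELTA_LIMIT_NS)
        *min_delta_ns = MIN_DELTA_LIMIT_NS;

    printf("CE: xen increased min_delta_ns to %llu nsec\n",
           (unsigned long long)*min_delta_ns);
    return 0;
}

/* Count how many increases succeed from start_ns before the model gives up. */
static int steps_until_give_up(uint64_t start_ns)
{
    uint64_t delta = start_ns;
    int steps = 0;

    while (increase_min_delta(&delta) == 0)
        steps++;
    printf("CE: Reprogramming failure. Giving up\n");
    return steps;
}
```

Under these assumptions, starting from 100000 ns the first increase yields 150000 ns and the last one is clamped to 4000000 ns, which matches the first and last "increased" lines in the failed boot.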

I've also tried some options for Xen and the domU kernel, in all
combinations, but without success:
xen: tsc=unstable, cpufreq=none
domU: nohz=off, clocksource=tsc

Some combinations of the above options lowered the frequency of the
problem (e.g. tsc=unstable + nohz=off), but it still happens quite often:
about 1 in 15 boots fails.

Do you have any idea what the cause is and what could help?

I have attached all relevant logs and configs:
xl-dmesg: xl dmesg after failed domU start
netvm-console-begin: kernel messages from failed domU
netvm-console-sysrq-t-1: first domU SysRq-T
netvm-console-sysrq-t-2: second domU SysRq-T
netvm.conf: domU config
xenstore-ls: result of xenstore-ls -fp
dom0-dmesg: dom0 kernel messages
config-xenlinux: 2.6.38.3 kernel config (same for dom0 and domU)
config-pvops: 3.1rc2 kernel config (same for dom0 and domU)

PS: the "script" prefix in the domU vbd config comes from a custom patch
to libxl that implements xend's behaviour of using a hotplug script for
VBD setup.

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-Aug-29 20:07 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On Sun, Aug 28, 2011 at 03:13:46PM +0200, Marek Marczykowski wrote:
> Hey,
>
> I'm experiencing strange problem: non-deterministic PV domain hang, only
> on some machines (with fast SSD drive). I've tried xen-4.1.0 and
> xen-4.1.1 with many kernels different kernels:
> VM:
>  - 2.6.38.3 xenlinux based on SUSE package
>  - vanilla 3.0.3
>  - vanilla 3.1 rc2
> dom0:
>  - 2.6.38.3 xenlinux based on SUSE package
>  - vanilla 3.1 rc2
>
> Result always the same: sometimes VM hang at startup, SysRq-T shows
> modprobe waiting in "wait_for_devices" (concretely schedule_timeout) and
> jiffies counter not increasing between task-states dumps.
>
> The only found thing (probably) connected with this problem are domU
> kernel messages:
> CE: xen increased min_delta_ns to 150000 nsec
> (...)
> CE: xen increased min_delta_ns to 4000000 nsec
> CE: Reprogramming failure. Giving up
>
> This messages doesn't exists in successful boot.
>
> I've also tried some options to xen and domU kernel, but without success
> (all combinations):

BTW, your 'xencons=..' and 'swiotlb=force' are obsolete. Use
'console=hvc0' and 'iommu=soft'. The 'swiotlb=force' kills performance.

> xen: tsc=unstable, cpufreq=none
> domU: nohz=off, clocksource=tsc
>
> Some combination of above options lowered frequency of problem (ex
> tsc=unstable + nohz=off), but it happens quite often - like 1 of 15
> boots fails.
>
> Have you idea what is the cause and what can help?

The problem looks to be xenwatch stuck. So the problem is in Dom0 right?

Marek Marczykowski

2011-Aug-29 20:21 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On 29.08.2011 22:07, Konrad Rzeszutek Wilk wrote:
> On Sun, Aug 28, 2011 at 03:13:46PM +0200, Marek Marczykowski wrote:
>> [...]
>> Have you idea what is the cause and what can help?
>
> The problem looks to be xenwatch stuck. So the problem is in Dom0 right?

This "R" state of xenwatch looks like a result of the SysRq itself, which
dumps data...

[  118.679707]  [<ffffffff812a8081>] handle_sysrq+0x21/0x30
[  118.679707]  [<ffffffff8128db49>] sysrq_handler+0xb9/0xe0
[  118.679707]  [<ffffffff8128ff50>] xenwatch_thread+0xb0/0x170

And the problem is at DomU boot, Dom0 works without any problems.

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl



Konrad Rzeszutek Wilk

2011-Aug-29 20:59 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On Mon, Aug 29, 2011 at 10:21:23PM +0200, Marek Marczykowski wrote:
> On 29.08.2011 22:07, Konrad Rzeszutek Wilk wrote:
> > [...]
> > The problem looks to be xenwatch stuck. So the problem is in Dom0 right?
>
> This "R" state of xenwatch looks like result of SysRq, which dumps data...
>
> [  118.679707]  [<ffffffff812a8081>] handle_sysrq+0x21/0x30
> [  118.679707]  [<ffffffff8128db49>] sysrq_handler+0xb9/0xe0
> [  118.679707]  [<ffffffff8128ff50>] xenwatch_thread+0xb0/0x170
>
> And the problem is at DomU boot, Dom0 works without any problems.

Ok, but I am still unsure where it is hanging in DomU. Can you run with
'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an
idea of what is stuck in the guest? You might also have better luck using
'xenctx' to get a stack trace of what is hanging in the guest (you will
need the System.map file from the guest's kernel, but that should be
fairly easy to extract).

Pasi Kärkkäinen

2011-Aug-29 21:28 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On Mon, Aug 29, 2011 at 04:59:38PM -0400, Konrad Rzeszutek Wilk wrote:
> [...]
> Ok, but I am still unsure where it is hanging in DomU. Can you run with
> 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
> of what is stuck in the guest? You might also have better luck using
> 'xenctx' to get a stack trace of what is hanging in the guest.
> (you will need the System.map file from the guest's kernel.. but that should
> be fairly easy to extract).

xenctx usage:
http://wiki.xen.org/xenwiki/XenCommonProblems#head-61843b32f0243b5ad0e17850f9493bffd80f8c17

-- Pasi


Marek Marczykowski

2011-Aug-30 17:18 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On 29.08.2011 22:59, Konrad Rzeszutek Wilk wrote:
> Ok, but I am still unsure where it is hanging in DomU. Can you run with
> 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
> of what is stuck in the guest?

With the "initcall_debug" parameter the problem does not appear (at least
over 200 domU starts)... It looks like a race condition that doesn't
happen on a kernel slowed down by printing lots of debug info. This would
also explain why the bug appears only on fast hardware.

> You might also have better luck using
> 'xenctx' to get a stack trace of what is hanging in the guest.
> (you will need the System.map file from the guest's kernel.. but that should
> be fairly easy to extract).

xenctx didn't provide any useful data :/ It always shows the following
trace for the hung domU:
-----------------
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffffffff81801ee0
rax: 0000000000000000	rcx: ffffffff810013aa	rdx: 0000000000000000
rbx: ffffffff81800010	rsi: 00000000deadbeef	rdi: 00000000deadbeef
rbp: ffffffff81801ef8	 r8: 0000000000000000	 r9: 0000000000000000
r10: 0000000000000000	r11: 0000000000000246	r12: 0000000000000000
r13: 0000000000000000	r14: ffffffffffffffff	r15: 0000000000000000
 cs: e033	 ss: e02b	 ds: 0000	 es: 0000
 fs: 0000 @ 0000000000000000
 gs: 0000 @ ffff880018ee7000/0000000000000000
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc


Stack:
 0000000000000000 0000000000000000 ffffffff810072a0 ffffffff81801f18
 ffffffff81012528 ffffffff81800010 ffffffff8185a2a0 ffffffff81801f38
 ffffffff81009faf 0000000000000000 6db6db6db6db6db7 ffffffff81801f48
 ffffffff813fb388 ffffffff81801f88 ffffffff81875c79 ffffffff81801f88

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa  <--
  [<ffffffff810072a0>] xen_safe_halt+0x10
  [<ffffffff81012528>] default_idle+0x58
  [<ffffffff81009faf>] cpu_idle+0x5f
  [<ffffffff813fb388>] rest_init+0x68
  [<ffffffff81875c79>] start_kernel+0x36f
  [<ffffffff81875346>] x86_64_start_reservations+0x131
  [<ffffffff81878245>] xen_start_kernel+0x5f1
------------------

I've collected a few more messages from successful and failed domU
starts. The only differences are the place where "Switched to NOHz mode on
CPU #0" appears and the presence of the "CE: xen increased min_delta_ns
to ..." and "CE: Reprogramming failure. Giving up" messages.

I think it may be related to:
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00649.html
(that was on HVM, not PV, but it looks similar)

I've also tried 'xenpm set-max-cstate 0' and tsc_mode=1 in the domU
config, but neither helps. Pinning the vcpu doesn't help either (these
domUs have only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting
Xen with max_cstate=0?

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl

Marek Marczykowski

2011-Aug-31 16:27 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On 30.08.2011 19:18, Marek Marczykowski wrote:
> [...]
> I've tried also xenpm set-max-cstate 0 and tsc_mode=1 in domU config,
> but it doesn't help. Also pinning vcpu doesn't help (this domUs have
> only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen with
> max_cstate=0?

Looks like tsc_mode=2 solves the problem.

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl



Dan Magenheimer

2011-Aug-31 20:00 UTC

RE: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

> > I've tried also xenpm set-max-cstate 0 and tsc_mode=1 in domU config,
> > but it doesn't help. Also pinning vcpu doesn't help (this domUs have
> > only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen with
> > max_cstate=0?
>
> Looks like tsc_mode=2 solves the problem.

It's unlikely that it SOLVES the problem; more likely it only changes
timings so that it effectively works around whatever the real
problem is.

Marek Marczykowski

2011-Aug-31 20:49 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On 31.08.2011 22:00, Dan Magenheimer wrote:
>>> I've tried also xenpm set-max-cstate 0 and tsc_mode=1 in domU config,
>>> but it doesn't help. Also pinning vcpu doesn't help (this domUs have
>>> only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen with
>>> max_cstate=0?
>>
>> Looks like tsc_mode=2 solves the problem.
>
> It's unlikely that it SOLVES the problem, but only changes
> timings so that it effectively works around whatever the real
> problem is.

Some additional information I found while debugging this problem:
clockevents_program_event returns -ETIME:
------------ kernel/time/clockevents.c:
/**
 * clockevents_program_event - Reprogram the clock event device.
 * @expires:    absolute expiry time (monotonic clock)
 *
 * Returns 0 on success, -ETIME when the event is in the past.
 */
int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
                  ktime_t now)
-------------

xen_vcpuop_set_next_event schedules an event by reading the current time
(xen_clocksource_read()) (*1), adding the delta (expires - now), and
programming the event with the VCPUOP_set_singleshot_timer hypercall. Xen
then reads the current time itself (*2), and in some rare cases that time
is already past the requested timer expiration... Even after the
VCPUOP_set_singleshot_timer hypercall returns, xen_clocksource_read()
reports a time slightly in the past compared to Xen's time (as reported by
the NOW() macro).

I think this is because the "current" time is calculated in different
ways in *1 and *2. The *1 path is controlled by tsc_mode, which is
described here:
http://lxr.xensource.com/lxr/source/docs/misc/tscmode.txt
The default tsc_mode=0 is "smart", and I think that is why it can be
slightly behind NOW(). With tsc_mode=2, *1 looks almost the same as the
way the NOW() macro works.

Is this reasoning correct?
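
The suspected failure can be sketched as a small model in which the guest builds an absolute deadline from its own clock and the hypervisor rejects any deadline that its own clock says has already passed. This is a simplified illustration under that assumption, not the actual Xen or Linux code. The function names and the explicit clock parameters are hypothetical, introduced so the skew between the two views of time can be varied.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypervisor side (model): accept an absolute deadline unless the
 * hypervisor's own clock has already passed it.
 */
static int hv_set_singleshot_timer(int64_t deadline_ns, int64_t hv_now_ns)
{
    return (deadline_ns < hv_now_ns) ? -ETIME : 0;
}

/*
 * Guest side (model): turn a relative delta into an absolute deadline
 * using the guest's view of "now" (*1), then hand it to the hypervisor,
 * which checks it against its own view of "now" (*2).
 */
static int guest_set_next_event(int64_t delta_ns, int64_t guest_now_ns,
                                int64_t hv_now_ns)
{
    int64_t deadline_ns = guest_now_ns + delta_ns;

    return hv_set_singleshot_timer(deadline_ns, hv_now_ns);
}
```

In this model the call fails only when the hypervisor clock is ahead of the guest clock by more than the requested delta, so a small delta combined with a small skew is enough to produce -ETIME.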

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl

Keir Fraser

2011-Aug-31 21:01 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On 31/08/2011 21:49, "Marek Marczykowski" <marmarek@mimuw.edu.pl> wrote:

> xen_vcpuop_set_next_event schedules event by getting current time
> (xen_clocksource_read()) (*1) adding delta (expires-now) and programming
> event with VCPUOP_set_singleshot_timer hypercall. Then xen gets current
> time (*2) and in some rare cases this time is after expected timer
> expiration... Even after VCPUOP_set_singleshot_timer hypercall,
> xen_clocksource_read() reports time slightly in the past comparing to
> xen time (reported by NOW() macro).
>
> I think this is because "current" time is calculated different way in *1
> and *2. The *1 way is controlled by tsc_mode, which is described here:
> http://lxr.xensource.com/lxr/source/docs/misc/tscmode.txt. Default
> tsc_mode=0 is "smart" and I think because of that can be slightly before
> NOW() time. tsc_mode=2 looks almost the same as NOW() macro works.
>
> Is this reasoning correct?

They really ought to work out to the same thing. This will trivially be the
case with tsc_mode=2 because both guest and hypervisor will see the same
(real) values from RDTSC, and use the same offsets and scale factors to turn
that into a current system time. When using emulated TSC in the guest
(tsc_mode=0,1) then the TSC values it sees, and the offsets and scale
factors it applies, are different. It is intended that it should result in
the same values being computed for NOW(), but I suppose something could be
going wrong there. By how much have you seen guest and hypervisor disagree?
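
The "offsets and scale factors" step can be illustrated with a small sketch of how a TSC reading might be converted to system time: subtract a reference TSC value, apply a shift and a 32-bit fixed-point multiplier, and add a base time. The struct and function names here are illustrative assumptions loosely modelled on the per-vCPU time information that Xen exposes to guests. They are not copied from the Xen or Linux sources.

```c
#include <stdint.h>

/* Snapshot of conversion parameters (hypothetical layout for illustration). */
struct time_snapshot {
    uint64_t tsc_timestamp;   /* TSC value when the snapshot was taken */
    uint64_t system_time_ns;  /* system time in ns at that TSC value */
    uint32_t mul_frac;        /* multiplier as a 0.32 fixed-point fraction */
    int8_t   shift;           /* power-of-two pre-scale applied to the delta */
};

/* Scale a TSC delta to nanoseconds: ((delta << shift) * mul_frac) >> 32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int8_t shift)
{
    uint64_t hi, lo;

    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* Split into 32-bit halves so the 64x32 multiply does not overflow. */
    hi = delta >> 32;
    lo = delta & 0xffffffffu;
    return hi * mul_frac + ((lo * mul_frac) >> 32);
}

/* System time corresponding to a given TSC reading. */
static uint64_t system_time_from_tsc(const struct time_snapshot *s, uint64_t tsc)
{
    return s->system_time_ns + scale_delta(tsc - s->tsc_timestamp,
                                           s->mul_frac, s->shift);
}
```

If guest and hypervisor feed the same TSC value and the same parameters into such a conversion, they get the same time. With an emulated TSC, both the input value and the parameters can differ, which leaves room for the two results to drift apart.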

 -- Keir



Marek Marczykowski

2011-Aug-31 21:13 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On 31.08.2011 23:01, Keir Fraser wrote:
> On 31/08/2011 21:49, "Marek Marczykowski" <marmarek@mimuw.edu.pl> wrote:
>
>> [...]
>> Is this reasoning correct?
>
> They really ought to work out to the same thing. This will trivially be the
> case with tsc_mode=2 because both guest and hypervisor will see the same
> (real) values from RDTSC, and use the same offsets and scale factors to turn
> that into a current system time. When using emulated TSC in the guest
> (tsc_mode=0,1) then the TSC values it sees, and the offsets and scale
> factors it applies, are different. It is intended that it should result in
> the same values being computed for NOW(), but I suppose something could be
> going wrong there.

NOW() calls get_s_time(), which doesn't appear to depend on the tsc_mode
setting. Have I missed something?

> By how much have you seen guest and hypervisor disagree?

I am adding printks on the domU and hypervisor sides using the attached
patches.

-- 
Pozdrawiam / Best Regards,
Marek Marczykowski         | RLU #390519
marmarek at mimuw edu pl   | xmpp:marmarek at staszic waw pl




Keir Fraser

2011-Aug-31 22:07 UTC

Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped

On 31/08/2011 22:13, "Marek Marczykowski" <marmarek@mimuw.edu.pl> wrote:

>> They really ought to work out to the same thing. This will trivially be the
>> case with tsc_mode=2 because both guest and hypervisor will see the same
>> (real) values from RDTSC, and use the same offsets and scale factors to turn
>> that into a current system time. When using emulated TSC in the guest
>> (tsc_mode=0,1) then the TSC values it sees, and the offsets and scale
>> factors it applies, are different. It is intended that it should result in
>> the same values being computed for NOW(), but I suppose something could be
>> going wrong there.
>
> NOW() calls get_s_time() which doesn't look to be depended on tsc_mode
> setting. Have I missed something?

I mean the result of xen_clocksource_read() in the guest kernel, which we
expect to match the result of executing NOW() in the hypervisor. The former
does depend on tsc_mode because xen_clocksource_read() uses RDTSC.

 -- Keir


