Marek Marczykowski
2011-Aug-28 13:13 UTC
[Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
Hey,

I'm experiencing a strange problem: a non-deterministic PV domain hang, only on some machines (with a fast SSD drive). I've tried xen-4.1.0 and xen-4.1.1 with many different kernels:

VM:
- 2.6.38.3 xenlinux based on SUSE package
- vanilla 3.0.3
- vanilla 3.1 rc2

dom0:
- 2.6.38.3 xenlinux based on SUSE package
- vanilla 3.1 rc2

The result is always the same: sometimes the VM hangs at startup, SysRq-T shows modprobe waiting in "wait_for_devices" (concretely in schedule_timeout), and the jiffies counter does not increase between task-state dumps.

The only thing I found that is (probably) connected with this problem are these domU kernel messages:

CE: xen increased min_delta_ns to 150000 nsec
(...)
CE: xen increased min_delta_ns to 4000000 nsec
CE: Reprogramming failure. Giving up

These messages do not appear on a successful boot.

I've also tried some options to xen and the domU kernel, in all combinations, but without success:

xen: tsc=unstable, cpufreq=none
domU: nohz=off, clocksource=tsc

Some combinations of the above options lowered the frequency of the problem (e.g. tsc=unstable + nohz=off), but it still happens quite often: about 1 in 15 boots fails.

Do you have any idea what the cause is and what could help?

Attached are all relevant logs and configs:

xl-dmesg: xl dmesg after failed domU start
netvm-console-begin: kernel messages from failed domU
netvm-console-sysrq-t-1: first domU SysRq-T
netvm-console-sysrq-t-2: second domU SysRq-T
netvm.conf: domU config
xenstore-ls: result of xenstore-ls -fp
dom0-dmesg: dom0 kernel messages
config-xenlinux: 2.6.38.3 kernel config (same for dom0 and domU)
config-pvops: 3.1rc2 kernel config (same for dom0 and domU)

PS: The "script" prefix in the domU vbd config is a custom patch to libxl which implements the xend behaviour of using a hotplug script for VBD setup.
-- Pozdrawiam / Best Regards, Marek Marczykowski | RLU #390519 marmarek at mimuw edu pl | xmpp:marmarek at staszic waw pl _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
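The min_delta_ns figures in the report above fit the back-off that the Linux clockevents core appears to apply when reprogramming a clock event device keeps failing. What follows is a small Python model of that back-off, not kernel code. It assumes the 3.0-era rule (grow the minimum delta by half on each step, cap it at one jiffy, give up once the cap has already been reached), HZ=250 and a starting value of 100000 ns. The last two are assumptions inferred only from the logged 4000000 and 150000 figures, and `increase_min_delta` and `backoff_sequence` are names made up for this sketch.

```python
# Model of the clockevents min-delta back-off (assumed 3.0-era behaviour).
NSEC_PER_SEC = 1_000_000_000
HZ = 250                      # assumption: inferred from the 4000000 ns cap in the log
MIN_DELTA_LIMIT = NSEC_PER_SEC // HZ
START_MIN_DELTA_NS = 100_000  # assumption: inferred from the first logged value


def increase_min_delta(min_delta_ns):
    """Return the next min_delta_ns, or None once the limit was already reached."""
    if min_delta_ns >= MIN_DELTA_LIMIT:
        return None  # corresponds to "CE: Reprogramming failure. Giving up"
    if min_delta_ns < 5000:
        grown = 5000
    else:
        grown = min_delta_ns + (min_delta_ns >> 1)  # grow by 50%
    return min(grown, MIN_DELTA_LIMIT)


def backoff_sequence(start_ns=START_MIN_DELTA_NS):
    """List every value that would be logged before the model gives up."""
    values = []
    current = start_ns
    while True:
        nxt = increase_min_delta(current)
        if nxt is None:
            return values
        values.append(nxt)
        current = nxt


if __name__ == "__main__":
    for value in backoff_sequence():
        print(f"CE: xen increased min_delta_ns to {value} nsec")
    print("CE: Reprogramming failure. Giving up")
```

Under these assumptions the first logged value is 150000 and the last is 4000000, matching the two ends of the excerpt in the report. Once the device gives up, no further timer event is programmed, which would be consistent with jiffies standing still.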
Konrad Rzeszutek Wilk
2011-Aug-29 20:07 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On Sun, Aug 28, 2011 at 03:13:46PM +0200, Marek Marczykowski wrote:
> I've also tried some options to xen and the domU kernel, in all
> combinations, but without success:

BTW, your 'xencons=..' and 'swiotlb=force' are obsolete. Use 'console=hvc0' and 'iommu=soft'. The 'swiotlb=force' kills performance.

> Do you have any idea what the cause is and what could help?

The problem looks to be xenwatch stuck. So the problem is in Dom0, right?
Marek Marczykowski
2011-Aug-29 20:21 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On 29.08.2011 22:07, Konrad Rzeszutek Wilk wrote:
> The problem looks to be xenwatch stuck. So the problem is in Dom0, right?

This "R" state of xenwatch looks like a result of the SysRq itself, which dumps the data...

[ 118.679707] [<ffffffff812a8081>] handle_sysrq+0x21/0x30
[ 118.679707] [<ffffffff8128db49>] sysrq_handler+0xb9/0xe0
[ 118.679707] [<ffffffff8128ff50>] xenwatch_thread+0xb0/0x170

And the problem is at DomU boot; Dom0 works without any problems.
Konrad Rzeszutek Wilk
2011-Aug-29 20:59 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On Mon, Aug 29, 2011 at 10:21:23PM +0200, Marek Marczykowski wrote:
> This "R" state of xenwatch looks like a result of the SysRq itself,
> which dumps the data...
>
> [ 118.679707] [<ffffffff812a8081>] handle_sysrq+0x21/0x30
> [ 118.679707] [<ffffffff8128db49>] sysrq_handler+0xb9/0xe0
> [ 118.679707] [<ffffffff8128ff50>] xenwatch_thread+0xb0/0x170
>
> And the problem is at DomU boot; Dom0 works without any problems.

Ok, but I am still unsure where it is hanging in DomU. Can you run with 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea of what is stuck in the guest? You might also have better luck using 'xenctx' to get a stack trace of what is hanging in the guest. (You will need the System.map file from the guest's kernel, but that should be fairly easy to extract.)
Pasi Kärkkäinen
2011-Aug-29 21:28 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On Mon, Aug 29, 2011 at 04:59:38PM -0400, Konrad Rzeszutek Wilk wrote:
> You might also have better luck using 'xenctx' to get a stack trace of
> what is hanging in the guest. (You will need the System.map file from
> the guest's kernel, but that should be fairly easy to extract.)

xenctx usage: http://wiki.xen.org/xenwiki/XenCommonProblems#head-61843b32f0243b5ad0e17850f9493bffd80f8c17

-- Pasi
Marek Marczykowski
2011-Aug-30 17:18 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On 29.08.2011 22:59, Konrad Rzeszutek Wilk wrote:
> Ok, but I am still unsure where it is hanging in DomU. Can you run with
> 'console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen' to get an idea
> of what is stuck in the guest?

With the "initcall_debug" parameter the problem does not appear (at least over 200 domU starts)... It looks like a race condition which doesn't happen on a kernel slowed down by printing lots of debug info. This also explains why this bug appears only on fast hardware.

> You might also have better luck using 'xenctx' to get a stack trace of
> what is hanging in the guest. (You will need the System.map file from
> the guest's kernel, but that should be fairly easy to extract.)

xenctx didn't provide any useful data :/ It always shows the following trace for the hung domU:

-----------------
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffffffff81801ee0
rax: 0000000000000000 rcx: ffffffff810013aa rdx: 0000000000000000
rbx: ffffffff81800010 rsi: 00000000deadbeef rdi: 00000000deadbeef
rbp: ffffffff81801ef8 r8: 0000000000000000 r9: 0000000000000000
r10: 0000000000000000 r11: 0000000000000246 r12: 0000000000000000
r13: 0000000000000000 r14: ffffffffffffffff r15: 0000000000000000
cs: e033 ss: e02b ds: 0000 es: 0000
fs: 0000 @ 0000000000000000
gs: 0000 @ ffff880018ee7000/0000000000000000
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc

Stack:
0000000000000000 0000000000000000 ffffffff810072a0 ffffffff81801f18
ffffffff81012528 ffffffff81800010 ffffffff8185a2a0 ffffffff81801f38
ffffffff81009faf 0000000000000000 6db6db6db6db6db7 ffffffff81801f48
ffffffff813fb388 ffffffff81801f88 ffffffff81875c79 ffffffff81801f88

Call Trace:
[<ffffffff810013aa>] hypercall_page+0x3aa <--
[<ffffffff810072a0>] xen_safe_halt+0x10
[<ffffffff81012528>] default_idle+0x58
[<ffffffff81009faf>] cpu_idle+0x5f
[<ffffffff813fb388>] rest_init+0x68
[<ffffffff81875c79>] start_kernel+0x36f
[<ffffffff81875346>] x86_64_start_reservations+0x131
[<ffffffff81878245>] xen_start_kernel+0x5f1
------------------

I've collected a few more messages from successful and failed domU starts. The only differences are the place where "Switched to NOHz mode on CPU #0" appears and the presence of the "CE: xen increased min_delta_ns to ..." and "CE: Reprogramming failure. Giving up" messages.

I think it may be related to:
http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00649.html
(that was on HVM, not PV, but it looks similar)

I've also tried xenpm set-max-cstate 0 and tsc_mode=1 in the domU config, but it doesn't help. Pinning the vcpu doesn't help either (these domUs have only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen with max_cstate=0?
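The rip value in the xenctx dump above can be decoded by hand, and the arithmetic is short enough to sketch. The Python below assumes that each hypercall occupies a 32-byte stub in the hypercall page and that the numbers listed are the ones in Xen's public headers (only three are included here); `decode_hypercall_offset` and `hypercall_from_stub_bytes` are hypothetical helper names, not part of any Xen tool.

```python
# Map an offset inside the Xen hypercall page to a hypercall number.
HYPERCALL_STUB_SIZE = 32  # assumption: one 32-byte stub per hypercall

# Small subset of hypercall numbers, as I understand Xen's public xen.h.
HYPERCALL_NAMES = {
    24: "vcpu_op",
    29: "sched_op",
    32: "event_channel_op",
}


def decode_hypercall_offset(offset):
    """Return (hypercall number, byte offset inside that stub)."""
    return divmod(offset, HYPERCALL_STUB_SIZE)


def hypercall_from_stub_bytes(code):
    """Read the immediate of the first 'mov $imm32, %eax' (opcode 0xb8)."""
    idx = code.index(0xB8)
    return int.from_bytes(code[idx + 1:idx + 5], "little")


if __name__ == "__main__":
    number, inside = decode_hypercall_offset(0x3AA)
    print(number, HYPERCALL_NAMES.get(number, "unknown"), hex(inside))

    # Bytes around rip from the "Code" line of the dump, without the padding.
    stub = bytes.fromhex("51 41 53 b8 1d 00 00 00 0f 05 41 5b 59 c3")
    print(hypercall_from_stub_bytes(stub))
```

Both routes give hypercall 29, sched_op, which fits the xen_safe_halt frame in the call trace: the vcpu is idle and blocked in the hypervisor, waiting for an event. So the dump says the guest is waiting rather than spinning, and does not by itself show why the wake-up never arrives.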
Marek Marczykowski
2011-Aug-31 16:27 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On 30.08.2011 19:18, Marek Marczykowski wrote:
> I've also tried xenpm set-max-cstate 0 and tsc_mode=1 in the domU config,
> but it doesn't help. Pinning the vcpu doesn't help either (these domUs
> have only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen
> with max_cstate=0?

Looks like tsc_mode=2 solves the problem.
Dan Magenheimer
2011-Aug-31 20:00 UTC
RE: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
> > I've also tried xenpm set-max-cstate 0 and tsc_mode=1 in the domU config,
> > but it doesn't help. Pinning the vcpu doesn't help either (these domUs
> > have only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen
> > with max_cstate=0?
>
> Looks like tsc_mode=2 solves the problem.

It's unlikely that it SOLVES the problem, but only changes timings so that it effectively works around whatever the real problem is.
Marek Marczykowski
2011-Aug-31 20:49 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On 31.08.2011 22:00, Dan Magenheimer wrote:
>> Looks like tsc_mode=2 solves the problem.
>
> It's unlikely that it SOLVES the problem, but only changes
> timings so that it effectively works around whatever the real
> problem is.

Some additional information I found while debugging this problem: clockevents_program_event returns -ETIME:

------------
kernel/time/clockevents.c:
/**
 * clockevents_program_event - Reprogram the clock event device.
 * @expires: absolute expiry time (monotonic clock)
 *
 * Returns 0 on success, -ETIME when the event is in the past.
 */
int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
                              ktime_t now)
-------------

xen_vcpuop_set_next_event schedules the event by getting the current time (xen_clocksource_read()) (*1), adding the delta (expires-now) and programming the event with the VCPUOP_set_singleshot_timer hypercall. Then xen gets the current time (*2), and in some rare cases this time is already after the expected timer expiration... Even after the VCPUOP_set_singleshot_timer hypercall, xen_clocksource_read() reports a time slightly in the past compared to xen time (as reported by the NOW() macro).

I think this is because the "current" time is calculated in different ways in *1 and *2. The *1 way is controlled by tsc_mode, which is described here:
http://lxr.xensource.com/lxr/source/docs/misc/tscmode.txt
The default tsc_mode=0 is "smart", and I think that because of this it can be slightly behind NOW() time. tsc_mode=2 looks like it works almost the same way as the NOW() macro.

Is this reasoning correct?
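The sequence described in the message above (guest reads its clock at *1, adds the delta, Xen compares the deadline with its own clock at *2) can be written down as a toy model. The Python below is only a sketch of that reasoning, not the hypervisor or kernel code: the function names are hypothetical, the errno value 62 is the Linux ETIME, and the lag and hypercall cost are made-up inputs rather than measurements from this thread.

```python
ETIME = 62  # Linux errno value for "Timer expired"


def xen_set_singleshot_timer(timeout_abs_ns, xen_now_ns):
    """Hypervisor side of the model: refuse deadlines already in the past."""
    if timeout_abs_ns < xen_now_ns:
        return -ETIME
    return 0


def guest_set_next_event(delta_ns, xen_now_ns, guest_lag_ns, hypercall_cost_ns=0):
    """Guest side of the model: deadline = guest clock + delta, then ask Xen."""
    guest_now_ns = xen_now_ns - guest_lag_ns          # (*1) guest's view of "now"
    timeout_abs_ns = guest_now_ns + delta_ns
    xen_now_at_call = xen_now_ns + hypercall_cost_ns  # (*2) Xen's clock on entry
    return xen_set_singleshot_timer(timeout_abs_ns, xen_now_at_call)


if __name__ == "__main__":
    now = 1_000_000_000
    # Clocks agree: a 100 us delta is accepted.
    print(guest_set_next_event(100_000, now, guest_lag_ns=0))
    # Guest clock 150 us behind Xen: the same delta is already in the past.
    print(guest_set_next_event(100_000, now, guest_lag_ns=150_000))
    # A delta larger than the lag is accepted again.
    print(guest_set_next_event(4_000_000, now, guest_lag_ns=150_000))
```

In this model a request fails whenever the delta is smaller than the guest's lag plus the time the hypercall takes to reach the check, and retrying with a larger delta succeeds once it exceeds that sum. How large the real disagreement is on the affected machines is not stated in the thread.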
Keir Fraser
2011-Aug-31 21:01 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On 31/08/2011 21:49, "Marek Marczykowski" <marmarek@mimuw.edu.pl> wrote:
> I think this is because the "current" time is calculated in different
> ways in *1 and *2. The *1 way is controlled by tsc_mode, which is
> described here:
> http://lxr.xensource.com/lxr/source/docs/misc/tscmode.txt
> The default tsc_mode=0 is "smart", and I think that because of this it
> can be slightly behind NOW() time. tsc_mode=2 looks like it works almost
> the same way as the NOW() macro.
>
> Is this reasoning correct?

They really ought to work out to the same thing. This will trivially be the case with tsc_mode=2 because both guest and hypervisor will see the same (real) values from RDTSC, and use the same offsets and scale factors to turn that into a current system time. When using emulated TSC in the guest (tsc_mode=0,1) then the TSC values it sees, and the offsets and scale factors it applies, are different. It is intended that it should result in the same values being computed for NOW(), but I suppose something could be going wrong there.

By how much have you seen guest and hypervisor disagree?

-- Keir
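For readers unfamiliar with the "offsets and scale factors" mentioned above, the conversion from a TSC reading to system time in the paravirtual clock interface can be sketched as follows. This is a Python rendering of the arithmetic as I understand it from the vcpu_time_info fields (tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift); the function name `pvclock_ns` and the 2 GHz example values are assumptions made for illustration.

```python
def pvclock_ns(tsc, tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift):
    """Convert a TSC reading to nanoseconds of system time.

    tsc_timestamp / system_time: TSC value and system time at the last update.
    tsc_to_system_mul: 32.32 fixed-point multiplier (ns per shifted tick).
    tsc_shift: power-of-two pre-scaling applied to the tick delta.
    """
    delta = tsc - tsc_timestamp
    if tsc_shift >= 0:
        delta <<= tsc_shift
    else:
        delta >>= -tsc_shift
    return system_time + ((delta * tsc_to_system_mul) >> 32)


if __name__ == "__main__":
    # Hypothetical 2 GHz TSC: 0.5 ns per tick, i.e. mul = 0.5 * 2**32, shift = 0.
    mul = 1 << 31
    print(pvclock_ns(2_000_000, 0, 0, mul, 0))          # 2M ticks since update
    print(pvclock_ns(3_000_000, 1_000_000, 500, mul, 0))  # with non-zero offsets
```

The point of the sketch is only that the result depends on every input: if the guest's TSC values or parameters differ from the ones the hypervisor uses for NOW(), the two sides are meant to arrive at the same time anyway, and any inconsistency between them would show up as a disagreement of the kind discussed in this thread.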
Marek Marczykowski
2011-Aug-31 21:13 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On 31.08.2011 23:01, Keir Fraser wrote:
> They really ought to work out to the same thing. This will trivially be the
> case with tsc_mode=2 because both guest and hypervisor will see the same
> (real) values from RDTSC, and use the same offsets and scale factors to turn
> that into a current system time. When using emulated TSC in the guest
> (tsc_mode=0,1) then the TSC values it sees, and the offsets and scale
> factors it applies, are different. It is intended that it should result in
> the same values being computed for NOW(), but I suppose something could be
> going wrong there.

NOW() calls get_s_time(), which does not appear to depend on the tsc_mode setting. Have I missed something?

> By how much have you seen guest and hypervisor disagree?

Adding printks on the domU and hypervisor sides, using the attached patches.
Keir Fraser
2011-Aug-31 22:07 UTC
Re: [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
On 31/08/2011 22:13, "Marek Marczykowski" <marmarek@mimuw.edu.pl> wrote:
>> It is intended that it should result in the same values being computed
>> for NOW(), but I suppose something could be going wrong there.
>
> NOW() calls get_s_time(), which does not appear to depend on the
> tsc_mode setting. Have I missed something?

I mean the result of xen_clocksource_read() in the guest kernel, which we expect to match the result of executing NOW() in the hypervisor. The former does depend on tsc_mode because xen_clocksource_read() uses RDTSC.

-- Keir