Thomas Gleixner
2017-Feb-09 17:08 UTC
[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:> +#ifdef CONFIG_HYPERV_TSCPAGE > +static notrace u64 vread_hvclock(int *mode) > +{ > + const struct ms_hyperv_tsc_page *tsc_pg > + (const struct ms_hyperv_tsc_page *)&hvclock_page; > + u64 sequence, scale, offset, current_tick, cur_tsc; > + > + while (1) { > + sequence = READ_ONCE(tsc_pg->tsc_sequence); > + if (!sequence) > + break; > + > + scale = READ_ONCE(tsc_pg->tsc_scale); > + offset = READ_ONCE(tsc_pg->tsc_offset); > + rdtscll(cur_tsc); > + > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > + > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > + return current_tick;That sequence stuff lacks still a sensible explanation. It's fundamentally different from the sequence counting we do in the kernel, so documentation for it is really required. Thanks, tglx
Stephen Hemminger
2017-Feb-09 18:27 UTC
[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Why not use existing seqlock's? -----Original Message----- From: Thomas Gleixner [mailto:tglx at linutronix.de] Sent: Thursday, February 9, 2017 9:08 AM To: Vitaly Kuznetsov <vkuznets at redhat.com> Cc: x86 at kernel.org; Andy Lutomirski <luto at amacapital.net>; Ingo Molnar <mingo at redhat.com>; H. Peter Anvin <hpa at zytor.com>; KY Srinivasan <kys at microsoft.com>; Haiyang Zhang <haiyangz at microsoft.com>; Stephen Hemminger <sthemmin at microsoft.com>; Dexuan Cui <decui at microsoft.com>; linux-kernel at vger.kernel.org; devel at linuxdriverproject.org; virtualization at lists.linux-foundation.org Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:> +#ifdef CONFIG_HYPERV_TSCPAGE > +static notrace u64 vread_hvclock(int *mode) { > + const struct ms_hyperv_tsc_page *tsc_pg > + (const struct ms_hyperv_tsc_page *)&hvclock_page; > + u64 sequence, scale, offset, current_tick, cur_tsc; > + > + while (1) { > + sequence = READ_ONCE(tsc_pg->tsc_sequence); > + if (!sequence) > + break; > + > + scale = READ_ONCE(tsc_pg->tsc_scale); > + offset = READ_ONCE(tsc_pg->tsc_offset); > + rdtscll(cur_tsc); > + > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > + > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > + return current_tick;That sequence stuff lacks still a sensible explanation. It's fundamentally different from the sequence counting we do in the kernel, so documentation for it is really required. Thanks, tglx
KY Srinivasan
2017-Feb-09 20:45 UTC
[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
> -----Original Message----- > From: Thomas Gleixner [mailto:tglx at linutronix.de] > Sent: Thursday, February 9, 2017 9:08 AM > To: Vitaly Kuznetsov <vkuznets at redhat.com> > Cc: x86 at kernel.org; Andy Lutomirski <luto at amacapital.net>; Ingo Molnar > <mingo at redhat.com>; H. Peter Anvin <hpa at zytor.com>; KY Srinivasan > <kys at microsoft.com>; Haiyang Zhang <haiyangz at microsoft.com>; Stephen > Hemminger <sthemmin at microsoft.com>; Dexuan Cui > <decui at microsoft.com>; linux-kernel at vger.kernel.org; > devel at linuxdriverproject.org; virtualization at lists.linux-foundation.org > Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read > method > > On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: > > +#ifdef CONFIG_HYPERV_TSCPAGE > > +static notrace u64 vread_hvclock(int *mode) > > +{ > > + const struct ms_hyperv_tsc_page *tsc_pg > > + (const struct ms_hyperv_tsc_page *)&hvclock_page; > > + u64 sequence, scale, offset, current_tick, cur_tsc; > > + > > + while (1) { > > + sequence = READ_ONCE(tsc_pg->tsc_sequence); > > + if (!sequence) > > + break; > > + > > + scale = READ_ONCE(tsc_pg->tsc_scale); > > + offset = READ_ONCE(tsc_pg->tsc_offset); > > + rdtscll(cur_tsc); > > + > > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > > + > > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > > + return current_tick; > > That sequence stuff lacks still a sensible explanation. It's fundamentally > different from the sequence counting we do in the kernel, so documentation > for it is really required.The host is updating multiple fields in this shared TSC page and the sequence number is used to ensure that the guest sees a consistent set values published. If I remember correctly, Xen has a similar mechanism. K. Y> > Thanks, > > tglx
Andy Lutomirski
2017-Feb-09 22:55 UTC
[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan <kys at microsoft.com> wrote:> > >> -----Original Message----- >> From: Thomas Gleixner [mailto:tglx at linutronix.de] >> Sent: Thursday, February 9, 2017 9:08 AM >> To: Vitaly Kuznetsov <vkuznets at redhat.com> >> Cc: x86 at kernel.org; Andy Lutomirski <luto at amacapital.net>; Ingo Molnar >> <mingo at redhat.com>; H. Peter Anvin <hpa at zytor.com>; KY Srinivasan >> <kys at microsoft.com>; Haiyang Zhang <haiyangz at microsoft.com>; Stephen >> Hemminger <sthemmin at microsoft.com>; Dexuan Cui >> <decui at microsoft.com>; linux-kernel at vger.kernel.org; >> devel at linuxdriverproject.org; virtualization at lists.linux-foundation.org >> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read >> method >> >> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: >> > +#ifdef CONFIG_HYPERV_TSCPAGE >> > +static notrace u64 vread_hvclock(int *mode) >> > +{ >> > + const struct ms_hyperv_tsc_page *tsc_pg >> > + (const struct ms_hyperv_tsc_page *)&hvclock_page; >> > + u64 sequence, scale, offset, current_tick, cur_tsc; >> > + >> > + while (1) { >> > + sequence = READ_ONCE(tsc_pg->tsc_sequence); >> > + if (!sequence) >> > + break; >> > + >> > + scale = READ_ONCE(tsc_pg->tsc_scale); >> > + offset = READ_ONCE(tsc_pg->tsc_offset); >> > + rdtscll(cur_tsc); >> > + >> > + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; >> > + >> > + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) >> > + return current_tick; >> >> That sequence stuff lacks still a sensible explanation. It's fundamentally >> different from the sequence counting we do in the kernel, so documentation >> for it is really required. > > The host is updating multiple fields in this shared TSC page and the sequence number is > used to ensure that the guest sees a consistent set values published. If I remember > correctly, Xen has a similar mechanism.So what's the actual protocol? When the hypervisor updates the page, does it freeze all guest cpus? If not, how does it maintain atomicity? --Andy
Vitaly Kuznetsov
2017-Feb-10 11:06 UTC
[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Thomas Gleixner <tglx at linutronix.de> writes:> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: >> +#ifdef CONFIG_HYPERV_TSCPAGE >> +static notrace u64 vread_hvclock(int *mode) >> +{ >> + const struct ms_hyperv_tsc_page *tsc_pg >> + (const struct ms_hyperv_tsc_page *)&hvclock_page; >> + u64 sequence, scale, offset, current_tick, cur_tsc; >> + >> + while (1) { >> + sequence = READ_ONCE(tsc_pg->tsc_sequence); >> + if (!sequence) >> + break; >> + >> + scale = READ_ONCE(tsc_pg->tsc_scale); >> + offset = READ_ONCE(tsc_pg->tsc_offset); >> + rdtscll(cur_tsc); >> + >> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; >> + >> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) >> + return current_tick; > > That sequence stuff lacks still a sensible explanation. It's fundamentally > different from the sequence counting we do in the kernel, so documentation > for it is really required.Sure, do you think the following would do? diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c index 4af10b4..886b600 100644 --- a/arch/x86/entry/vdso/vclock_gettime.c +++ b/arch/x86/entry/vdso/vclock_gettime.c @@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode) (const struct ms_hyperv_tsc_page *)&hvclock_page; u64 sequence, scale, offset, current_tick, cur_tsc; + /* + * The protocol for reading Hyper-V TSC page is specified in Hypervisor + * Top-Level Functional Specification ver. 3.0 and above. To get the + * reference time we must do the following: + * - READ ReferenceTscSequence + * A special '0' value indicates the time source is unreliable and we + * need to use something else. The currently published specification + * versions (up to 4.0b) contain a mistake and wrongly claim '-1' + * instead of '0' as the special value, see commit c35b82ef0294. + * - ReferenceTime + * ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset + * - READ ReferenceTscSequence again. In case its value has changed + * since our first reading we need to discard ReferenceTime and repeat + * the whole sequence as the hypervisor was updating the page in + * between. + */ while (1) { sequence = READ_ONCE(tsc_pg->tsc_sequence); if (!sequence) -- Vitaly
Thomas Gleixner
2017-Feb-10 11:15 UTC
[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:> Thomas Gleixner <tglx at linutronix.de> writes: > > > On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote: > >> +#ifdef CONFIG_HYPERV_TSCPAGE > >> +static notrace u64 vread_hvclock(int *mode) > >> +{ > >> + const struct ms_hyperv_tsc_page *tsc_pg > >> + (const struct ms_hyperv_tsc_page *)&hvclock_page; > >> + u64 sequence, scale, offset, current_tick, cur_tsc; > >> + > >> + while (1) { > >> + sequence = READ_ONCE(tsc_pg->tsc_sequence); > >> + if (!sequence) > >> + break; > >> + > >> + scale = READ_ONCE(tsc_pg->tsc_scale); > >> + offset = READ_ONCE(tsc_pg->tsc_offset); > >> + rdtscll(cur_tsc); > >> + > >> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset; > >> + > >> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence) > >> + return current_tick; > > > > That sequence stuff lacks still a sensible explanation. It's fundamentally > > different from the sequence counting we do in the kernel, so documentation > > for it is really required. > > Sure, do you think the following would do?Yes> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c > index 4af10b4..886b600 100644 > --- a/arch/x86/entry/vdso/vclock_gettime.c > +++ b/arch/x86/entry/vdso/vclock_gettime.c > @@ -154,6 +154,22 @@ static notrace u64 vread_hvclock(int *mode) > (const struct ms_hyperv_tsc_page *)&hvclock_page; > u64 sequence, scale, offset, current_tick, cur_tsc; > > + /* > + * The protocol for reading Hyper-V TSC page is specified in Hypervisor > + * Top-Level Functional Specification ver. 3.0 and above. To get the > + * reference time we must do the following: > + * - READ ReferenceTscSequence > + * A special '0' value indicates the time source is unreliable and we > + * need to use something else. The currently published specification > + * versions (up to 4.0b) contain a mistake and wrongly claim '-1' > + * instead of '0' as the special value, see commit c35b82ef0294. > + * - ReferenceTime > + * ((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset > + * - READ ReferenceTscSequence again. In case its value has changed > + * since our first reading we need to discard ReferenceTime and repeat > + * the whole sequence as the hypervisor was updating the page in > + * between. > + */ > while (1) { > sequence = READ_ONCE(tsc_pg->tsc_sequence); > if (!sequence) > > -- > Vitaly >
Vitaly Kuznetsov
2017-Feb-10 12:25 UTC
[PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
Stephen Hemminger <sthemmin at microsoft.com> writes:> Why not use existing seqlock's? >To be honest I don't quite understand how we could use it -- the sequence locking here is done against the page updated by the hypersior, we're not creating new structures (so I don't understand how we could use struct seqcount which we don't have) but I may be misunderstanding something. BTW, I just occured to me that I should've probably put the TSC reading code to mshyperv.h and use it from both vDSO and read_hv_clock_tsc() -- what do you thing? [snip] -- Vitaly
Apparently Analagous Threads
- [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
- [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
- [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
- [PATCH RFC 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method
- [PATCH RFC 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method