2018 Sep 18
3
[patch 09/11] x86/vdso: Simplify the invalid vclock case
...l.
>
> d8bb6f4c1670 ("x86: tsc prevent time going backwards")
>
> I still have one of the machines which is affected by this.
Are we sure this isn't a load vs rdtsc reorder? Because if I look at the
current code:
notrace static u64 vread_tsc(void)
{
u64 ret = (u64)rdtsc_ordered();
u64 last = gtod->cycle_last;
if (likely(ret >= last))
return ret;
/*
* GCC likes to generate cmov here, but this branch is extremely
* predictable (it's just a function of time and the likely is
* very likely) and there's a data dependence, so force GCC
* to generate...
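The snippet above is cut off, but the pattern under discussion is the clamp against cycle_last. A minimal, self-contained sketch of that pattern in plain C (not the kernel source; __rdtsc() and the snapshot struct stand in for rdtsc_ordered() and gtod):

    #include <stdint.h>
    #include <x86intrin.h>

    struct clock_snapshot {
        uint64_t cycle_last;    /* TSC value recorded at the last timekeeper update */
    };

    /* Return the TSC, clamped so the result never goes behind cycle_last. */
    static uint64_t read_tsc_clamped(const struct clock_snapshot *snap)
    {
        uint64_t now = __rdtsc();           /* stand-in for rdtsc_ordered() */
        uint64_t last = snap->cycle_last;

        if (now >= last)
            return now;

        /* Readout is behind the last update: return last instead of going backwards. */
        return last;
    }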
2017 Mar 03
1
[PATCH v3 2/3] x86/hyperv: move TSC reading method to asm/mshyperv.h
...available, probably should be unlikely.
> + /*
> + * Make sure we read sequence before we read other values from
> + * TSC page.
> + */
> + smp_rmb();
> +
> + scale = READ_ONCE(tsc_pg->tsc_scale);
> + offset = READ_ONCE(tsc_pg->tsc_offset);
> + cur_tsc = rdtsc_ordered();
Since you already have smp_ barriers and rdtsc_ordered is a barrier,
the compiler barriers (READ_ONCE()) shouldn't be necessary.
> +
> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> + /*
> + * Make sure we read sequence after we read all other value...
2018 Sep 14
0
[patch 10/11] x86/vdso: Move cycle_last handling into the caller
...(void)
{
const struct pvclock_vcpu_time_info *pvti = &get_pvti0()->pvti;
- u64 ret;
- u64 last;
u32 version;
+ u64 ret;
/*
* Note: The kernel and hypervisor must guarantee that cpu ID
@@ -111,13 +110,7 @@ static notrace u64 vread_pvclock(void)
ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
} while (pvclock_read_retry(pvti, version));
- /* refer to vread_tsc() comment for rationale */
- last = gtod->cycle_last;
-
- if (likely(ret >= last))
- return ret;
-
- return last;
+ return ret;
}
#endif
#ifdef CONFIG_HYPERV_TSCPAGE
@@ -130,30 +123,10 @@ static notrace u64 vread...
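The diff above drops the per-reader clamp; the idea of the patch is that the readers return a raw readout and the common caller applies the cycle_last handling once. A hedged sketch of that caller-side shape (field names and the simplified scaling are illustrative, not the actual patch):

    #include <stdint.h>

    struct vdso_snapshot {
        uint64_t cycle_last;    /* clocksource readout at the last update */
        uint64_t base_ns;       /* nanoseconds accumulated at the last update */
        uint32_t mult;          /* cycles-to-ns multiplier */
        uint32_t shift;
    };

    /* One central place to clamp, instead of duplicating it in every vread_*(). */
    static uint64_t cycles_to_ns(const struct vdso_snapshot *vd, uint64_t cycles)
    {
        uint64_t delta = 0;

        if (cycles > vd->cycle_last)        /* negative deltas are treated as zero */
            delta = cycles - vd->cycle_last;

        return vd->base_ns + ((delta * vd->mult) >> vd->shift);
    }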
2017 Feb 14
2
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
Thomas Gleixner <tglx at linutronix.de> writes:
> On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote:
>
>> Hi,
>>
>> while we're still waiting for a definitive ACK from Microsoft that the
>> algorithm is good for the SMP case (as we can't prevent the code in vdso from
>> migrating between CPUs) I'd like to send v2 with some modifications to keep
2018 Sep 18
0
[patch 09/11] x86/vdso: Simplify the invalid vclock case
...1:57PM +0200, Thomas Gleixner wrote:
> > I still have one of the machines which is affected by this.
>
> Are we sure this isn't a load vs rdtsc reorder? Because if I look at the
> current code:
The load order of last vs. rdtsc does not matter at all.
CPU0                                            CPU1
....
now0 = rdtsc_ordered();
...
tk->cycle_last = now0;

gtod->seq++;
gtod->cycle_last = tk->cycle_last;
...
gtod->seq++;
                                                seq_begin(gtod->seq);
                                                now1 = rdtsc_ordered();
So if the TSC on CPU1 is slightly behind the TSC on CPU0 then now1 can be
smaller than cycle_last. The TSC sync stuff does not c...
2017 Mar 03
4
[PATCH v3 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
Hi,
merge window is about to close so I hope it's OK to make another try here.
Changes since v2:
- Add explicit READ_ONCE() to not rely on 'volatile' [Andy Lutomirski]
- rdtsc() -> rdtsc_ordered() [Andy Lutomirski]
- virt_rmb() -> smp_rmb() [Thomas Gleixner, Andy Lutomirski]
Thomas, Andy, it seems the only blocker for the series was the ambiguity in the
TSC page read algorithm. I contacted Microsoft (through K. Y.) and asked what
we should do when we see 'seq=0'. The answer is:...
2017 Feb 14
0
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
...> (PATCH 2)? As members of struct ms_hyperv_tsc_page are volatile we don't
> need READ_ONCE(), compilers are not allowed to merge accesses. The
> resulting code looks good to me:
No, on multiple counts, unfortunately.
1. LFENCE is basically useless except for IO and for (Intel only)
rdtsc_ordered(). AFAIK there is literally no scenario under which
LFENCE is useful for access to normal memory.
2. The problem here has little to do with barriers. You're doing:
read seq;
read var1;
read var2;
read tsc;
read seq again;
If the hypervisor updates things between reading var1 and var2 or
be...
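The structure Andy is describing is a sequence-counter retry loop: bracket the data reads with two reads of the sequence and throw the snapshot away if it changed. A generic, self-contained sketch of that loop (plain C, made-up field names; the CPU-level barriers are elided since the point here is the compiler-level READ_ONCE()):

    #include <stdint.h>

    /* Userspace stand-in for the kernel's READ_ONCE() */
    #define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

    struct tsc_page {
        uint32_t seq;       /* bumped by the publisher around updates; 0 = invalid */
        uint64_t scale;
        uint64_t offset;
    };

    /* Returns 1 with a consistent scale/offset snapshot, 0 if the page is invalid. */
    static int read_params(const struct tsc_page *p, uint64_t *scale, uint64_t *offset)
    {
        uint32_t seq1, seq2;

        do {
            seq1 = READ_ONCE(p->seq);
            if (seq1 == 0)
                return 0;               /* caller must fall back to another clocksource */
            *scale  = READ_ONCE(p->scale);
            *offset = READ_ONCE(p->offset);
            seq2 = READ_ONCE(p->seq);
        } while (seq1 != seq2);         /* the publisher raced with us: retry */

        return 1;
    }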
2017 Feb 15
2
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
...struct ms_hyperv_tsc_page are volatile we don't
>> need READ_ONCE(), compilers are not allowed to merge accesses. The
>> resulting code looks good to me:
>
> No, on multiple counts, unfortunately.
>
> 1. LFENCE is basically useless except for IO and for (Intel only)
> rdtsc_ordered(). AFAIK there is literally no scenario under which
> LFENCE is useful for access to normal memory.
>
Interesting,
(For some reason I was under the impression that when I do
READ var1 -> reg1
READ var2 -> reg2
from normal memory, reads can actually happen in any order and LFENCE
in...
2018 Sep 17
11
[patch V2 00/11] x86/vdso: Cleanups, simplifications and CLOCK_TAI support
Matt attempted to add CLOCK_TAI support to the VDSO clock_gettime()
implementation, which extended the clockid switch case and added yet
another slightly different copy of the same code.
Especially the extended switch case is problematic as the compiler tends to
generate a jump table, which then requires the use of retpolines. If jump tables
are disabled, it adds yet another conditional to the existing
2018 Sep 19
0
[patch 09/11] x86/vdso: Simplify the invalid vclock case
...; (cycles-last)* ...". That should just be a "sub ; js ; ". It's an extra
> load of ->cycle_last, but only on the path where we're heading for the
> fallback anyway. The value of 1 can be adjusted so that in the "js"
> path, we could detect and accept an rdtsc_ordered() call that's just a
> few 10s of cycles behind last and treat that as 0 and continue back on
> the normal path. But maybe it's hard to get gcc to generate the expected
> code.
I played around with a lot of variants and GCC generates all kinds of
interesting ASM. And at some point...
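What the quoted suggestion boils down to is doing the comparison on a signed delta, so the compiler can emit a sub followed by js instead of a compare plus cmov. A hedged, self-contained sketch of that shape (not the code that was eventually merged):

    #include <stdint.h>

    /*
     * A readout slightly behind cycle_last shows up as a small negative
     * delta; treat it as zero and stay on the normal path.
     */
    static uint64_t delta_or_zero(uint64_t cycles, uint64_t cycle_last)
    {
        int64_t delta = (int64_t)(cycles - cycle_last);

        if (delta < 0)          /* ideally compiles to sub + js */
            delta = 0;

        return (uint64_t)delta;
    }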
2017 Mar 03
0
[PATCH v3 2/3] x86/hyperv: move TSC reading method to asm/mshyperv.h
...(1) {
+ sequence = READ_ONCE(tsc_pg->tsc_sequence);
+ if (!sequence)
+ break;
+ /*
+ * Make sure we read sequence before we read other values from
+ * TSC page.
+ */
+ smp_rmb();
+
+ scale = READ_ONCE(tsc_pg->tsc_scale);
+ offset = READ_ONCE(tsc_pg->tsc_offset);
+ cur_tsc = rdtsc_ordered();
+
+ current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+
+ /*
+ * Make sure we read sequence after we read all other values
+ * from TSC page.
+ */
+ smp_rmb();
+
+ if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
+ return current_tick;
+ }
+
+ return U64_MAX;
+}
+
#els...
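For reference, mul_u64_u64_shr(cur_tsc, scale, 64) in the loop above is the high half of the full 128-bit product. A standalone equivalent on compilers that provide a 128-bit integer type:

    #include <stdint.h>

    /* High 64 bits of the 128-bit product, i.e. mul_u64_u64_shr(a, b, 64). */
    static uint64_t mul_u64_u64_shr64(uint64_t a, uint64_t b)
    {
        return (uint64_t)(((unsigned __int128)a * b) >> 64);
    }

    /* current_tick = ((cur_tsc * tsc_scale) >> 64) + tsc_offset, as in the snippet above. */
    static uint64_t tsc_page_to_tick(uint64_t cur_tsc, uint64_t scale, uint64_t offset)
    {
        return mul_u64_u64_shr64(cur_tsc, scale) + offset;
    }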
2018 Sep 18
2
[patch 09/11] x86/vdso: Simplify the invalid vclock case
On Tue, 18 Sep 2018, Thomas Gleixner wrote:
> On Tue, 18 Sep 2018, Peter Zijlstra wrote:
> > > Your memory serves you right. That's indeed observable on CPUs which
> > > lack TSC_ADJUST.
> >
> > But, if the gtod code can observe this, then why doesn't the code that
> > checks the sync?
>
> Because it depends where the involved CPUs are in the
2018 Sep 14
0
[patch 09/11] x86/vdso: Simplify the invalid vclock case
..._pvclock(int *mo
do {
version = pvclock_read_begin(pvti);
- if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
- *mode = VCLOCK_NONE;
- return 0;
- }
+ if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
+ return U64_MAX;
ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
} while (pvclock_read_retry(pvti, version));
@@ -148,17 +121,12 @@ static notrace u64 vread_pvclock(int *mo
}
#endif
#ifdef CONFIG_HYPERV_TSCPAGE
-static notrace u64 vread_hvclock(int *mode)
+static notrace u64 vread_hvclock(void)
{
const struct ms_hyperv_tsc_page *tsc_pg =
(const st...
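With the error folded into the return value (U64_MAX instead of setting *mode), the caller needs just one check before using the readout. A hedged sketch of that caller-side shape (the function names are placeholders, and the real series may test for the invalid value differently):

    #include <stdint.h>

    #define U64_MAX UINT64_MAX              /* invalid-readout marker, as in the diff above */

    extern uint64_t read_cycles(void);      /* e.g. the reworked vread_pvclock()/vread_hvclock() */
    extern int fallback_syscall(void);      /* placeholder for the vDSO's syscall fallback */

    static int get_time(uint64_t *cycles_out)
    {
        uint64_t cycles = read_cycles();

        if (cycles == U64_MAX)              /* clocksource unusable: take the slow path */
            return fallback_syscall();

        *cycles_out = cycles;               /* otherwise hand the raw readout to the caller */
        return 0;
    }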
2018 Sep 18
3
[patch 09/11] x86/vdso: Simplify the invalid vclock case
> On Sep 18, 2018, at 12:52 AM, Thomas Gleixner <tglx at linutronix.de> wrote:
>
>> On Mon, 17 Sep 2018, John Stultz wrote:
>>> On Mon, Sep 17, 2018 at 12:25 PM, Andy Lutomirski <luto at kernel.org> wrote:
>>> Also, I'm not entirely convinced that this "last" thing is needed at
>>> all. John, what's the scenario under which we