Peter Zijlstra
2018-Sep-18 12:48 UTC
[patch 09/11] x86/vdso: Simplify the invalid vclock case
On Tue, Sep 18, 2018 at 12:41:57PM +0200, Thomas Gleixner wrote:
> On Tue, 18 Sep 2018, Thomas Gleixner wrote:
> > On Tue, 18 Sep 2018, Thomas Gleixner wrote:
> > > On Tue, 18 Sep 2018, Peter Zijlstra wrote:
> > > > > Your memory serves you right. That's indeed observable on CPUs which
> > > > > lack TSC_ADJUST.
> > > >
> > > > But, if the gtod code can observe this, then why doesn't the code that
> > > > checks the sync?
> > >
> > > Because it depends where the involved CPUs are in the topology. The sync
> > > code might just run on the same package an simply not see it. Yes, w/o
> > > TSC_ADJUST the TSC sync code can just fail to see the havoc.
> >
> > Even with TSC adjust the TSC can be slightly off by design on multi-socket
> > systems.
>
> Here are the gory details:
>
>   https://lore.kernel.org/lkml/3c1737210708230408i7a8049a9m5db49e6c4d89ab62 at mail.gmail.com/
>
> The changelog has an explanation as well.
>
>   d8bb6f4c1670 ("x86: tsc prevent time going backwards")
>
> I still have one of the machines which is affected by this.

Are we sure this isn't a load vs rdtsc reorder? Because if I look at the
current code:

	notrace static u64 vread_tsc(void)
	{
		u64 ret = (u64)rdtsc_ordered();
		u64 last = gtod->cycle_last;

		if (likely(ret >= last))
			return ret;

		/*
		 * GCC likes to generate cmov here, but this branch is extremely
		 * predictable (it's just a function of time and the likely is
		 * very likely) and there's a data dependence, so force GCC
		 * to generate a branch instead. I don't barrier() because
		 * we don't actually need a barrier, and if this function
		 * ever gets inlined it will generate worse code.
		 */
		asm volatile ("");
		return last;
	}

That does:

	lfence
	rdtsc
	load gtod->cycle_last

Which obviously allows us to observe a cycle_last that is later than
the rdtsc itself, and thus time can trivially go backwards.

The new code:

	last = gtod->cycle_last;
	cycles = vgetcyc(gtod->vclock_mode);
	if (unlikely((s64)cycles < 0))
		return vdso_fallback_gettime(clk, ts);
	if (cycles > last)
		ns += (cycles - last) * gtod->mult;

looks like:

	load gtod->cycle_last
	lfence
	rdtsc

which avoids that possibility, the cycle_last load must have completed
before the rdtsc.
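A minimal userspace sketch of that ordering and the clamp, under stated
assumptions: struct vdata, read_ns() and the field names are invented for
illustration, __rdtscp() stands in for rdtsc_ordered(), and the real vDSO
data layout and conversion math differ:

	#include <stdint.h>
	#include <x86intrin.h>

	struct vdata {
		uint64_t cycle_last;	/* TSC value at the last timekeeper update */
		uint64_t base_ns;	/* nanoseconds at that update */
		uint32_t mult;		/* cycles -> ns multiplier */
		uint32_t shift;
	};

	static uint64_t read_ns(const volatile struct vdata *vd)
	{
		unsigned int aux;

		/* Load cycle_last first ... */
		uint64_t last = vd->cycle_last;

		/*
		 * ... and keep the compiler from sinking that load below the
		 * TSC read; RDTSCP itself waits until prior loads are
		 * globally visible before sampling the counter.
		 */
		__asm__ __volatile__("" ::: "memory");

		uint64_t cycles = __rdtscp(&aux);
		uint64_t ns = vd->base_ns;

		/*
		 * Clamp: a TSC that is (slightly) behind cycle_last
		 * contributes 0, so time cannot be seen going backwards
		 * across the update.
		 */
		if (cycles > last)
			ns += ((cycles - last) * (uint64_t)vd->mult) >> vd->shift;

		return ns;
	}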
Thomas Gleixner
2018-Sep-18 13:23 UTC
[patch 09/11] x86/vdso: Simplify the invalid vclock case
On Tue, 18 Sep 2018, Peter Zijlstra wrote:
> On Tue, Sep 18, 2018 at 12:41:57PM +0200, Thomas Gleixner wrote:
> > I still have one of the machines which is affected by this.
>
> Are we sure this isn't a load vs rdtsc reorder? Because if I look at the
> current code:

The load order of last vs. rdtsc does not matter at all.

     CPU0                                  CPU1

     ....
     now0 = rdtsc_ordered();
     ...
     tk->cycle_last = now0;

     gtod->seq++;
     gtod->cycle_last = tk->cycle_last;
     ...
     gtod->seq++;
                                           seq_begin(gtod->seq);
                                           now1 = rdtsc_ordered();

So if the TSC on CPU1 is slightly behind the TSC on CPU0 then now1 can be
smaller than cycle_last. The TSC sync stuff does not catch the small delta
for unknown raisins. I'll go and find that machine and test that again.

Thanks,

	tglx
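A toy simulation of that interleaving, with an invented per-CPU offset
standing in for the hardware skew, shows why the reader has to treat the
negative delta as zero rather than feed it into the ns conversion:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t tsc_cpu0 = 1000000;	/* CPU0's TSC when the timekeeper samples it */
		int64_t  skew     = 40;		/* CPU1's TSC lags CPU0 by 40 cycles */

		/* CPU0: now0 = rdtsc_ordered(); gtod->cycle_last = now0; */
		uint64_t cycle_last = tsc_cpu0;

		/*
		 * CPU1: reads its own TSC 10 cycles of wall time later, but
		 * its counter is still 30 cycles short of the published
		 * cycle_last.
		 */
		uint64_t now1 = tsc_cpu0 + 10 - (uint64_t)skew;

		int64_t delta = (int64_t)(now1 - cycle_last);

		printf("delta = %lld, clamped to %lld\n",
		       (long long)delta, delta > 0 ? (long long)delta : 0LL);
		return 0;
	}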
Peter Zijlstra
2018-Sep-18 13:38 UTC
[patch 09/11] x86/vdso: Simplify the invalid vclock case
On Tue, Sep 18, 2018 at 03:23:08PM +0200, Thomas Gleixner wrote:
> On Tue, 18 Sep 2018, Peter Zijlstra wrote:
> > On Tue, Sep 18, 2018 at 12:41:57PM +0200, Thomas Gleixner wrote:
> > > I still have one of the machines which is affected by this.
> >
> > Are we sure this isn't a load vs rdtsc reorder? Because if I look at the
> > current code:
>
> The load order of last vs. rdtsc does not matter at all.
>
>      CPU0                                  CPU1
>
>      ....
>      now0 = rdtsc_ordered();
>      ...
>      tk->cycle_last = now0;
>
>      gtod->seq++;
>      gtod->cycle_last = tk->cycle_last;
>      ...
>      gtod->seq++;
>                                            seq_begin(gtod->seq);
>                                            now1 = rdtsc_ordered();
>
> So if the TSC on CPU1 is slightly behind the TSC on CPU0 then now1 can be
> smaller than cycle_last. The TSC sync stuff does not catch the small delta
> for unknown raisins. I'll go and find that machine and test that again.

Yeah, somehow I forgot about rseq.. maybe I should go sleep or something.
Thomas Gleixner
2018-Sep-18 15:52 UTC
[patch 09/11] x86/vdso: Simplify the invalid vclock case
On Tue, 18 Sep 2018, Thomas Gleixner wrote:
> So if the TSC on CPU1 is slightly behind the TSC on CPU0 then now1 can be
> smaller than cycle_last. The TSC sync stuff does not catch the small delta
> for unknown raisins. I'll go and find that machine and test that again.

Of course it does not trigger anymore. We accumulated code between the
point in timekeeping_advance() where the TSC is read and the update of the
VDSO data.

I might have to get a 2.6ish kernel booted on that machine and try with
that again.

/me shudders

Thanks,

	tglx
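A rough way to see why the accumulated code hides the problem (D and skew
are illustrative names, not values from the thread): if the timekeeper
spends D cycles between the rdtsc in timekeeping_advance() and the VDSO
data update, a reader that already observes the new data takes its own TSC
at least D cycles of wall time after cycle_last was sampled, so roughly

	now1 >= cycle_last + D - skew

The delta only goes negative when the inter-CPU skew exceeds D, and the
extra code has presumably pushed D well past the small skew that machine
exhibits, which is why an old 2.6ish kernel with a much shorter window
would be needed to reproduce it.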