Andy Lutomirski
2018-Oct-04 20:05 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
> On Oct 4, 2018, at 12:31 PM, Peter Zijlstra <peterz at infradead.org> wrote: > > On Thu, Oct 04, 2018 at 07:00:45AM -0700, Andy Lutomirski wrote: >>> On Oct 4, 2018, at 1:11 AM, Peter Zijlstra <peterz at infradead.org> wrote: >>> >>>> On Thu, Oct 04, 2018 at 09:54:45AM +0200, Vitaly Kuznetsov wrote: >>>> I was hoping to hear this from you :-) If I am to suggest how we can >>>> move forward I'd propose: >>>> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling >>>> is supported). >>>> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page >>>> clocksource is a single page for the whole VM, not a per-cpu thing. Can >>>> we think that all the buggy hardware is already gone? >>> >>> No, and it is not the hardware you have to worry about (mostly), it is >>> the frigging PoS firmware people put on it. >>> >>> Ever since Nehalem TSC is stable (unless you get to >4 socket systems, >>> after which it still can be, but bets are off). But even relatively >>> recent systems fail the TSC sync test because firmware messes it up by >>> writing to either MSR_TSC or MSR_TSC_ADJUST. >>> >>> But the thing is, if the TSC is not synced, you cannot use it for >>> timekeeping, full stop. So having a single page is fine, it either >>> contains a mult/shift that is valid, or it indicates TSC is messed up >>> and you fall back to something else. >>> >>> There is no inbetween there. >>> >>> For sched_clock we can still use the global page, because the rate will >>> still be the same for each cpu, it's just offset between CPUs and the >>> code compensates for that. >> >> But if we?re in a KVM guest, then the clock will jump around on the >> same *vCPU* when the vCPU migrates. > > Urgh yes.. > >> But I don?t see how kvmclock helps here, since I don?t think it?s used >> for sched_clock. > > I get hits on kvm_sched_clock, but haven't looked further.Ok, so KVM is using the per-vCPU pvclock data for sched_clock. Which hopefully does something intelligent. Regardless of any TSC syncing issues, a paravirt clock should presumably be used for sched_clock to account for time that the vCPU was stopped. It would be fantastic if this stuff were documented much better, both in terms of the data structures and the Linux code.
Andy Lutomirski
2018-Oct-04 22:15 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
For better or for worse, I'm trying to understand this code. So far, I've come up with this patch: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso-tglx&id=14fd71e12b1c4492a06f368f75041f263e6862bf Is it correct, or am I missing some subtlety?
Marcelo Tosatti
2018-Oct-06 20:27 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Thu, Oct 04, 2018 at 03:15:32PM -0700, Andy Lutomirski wrote:> For better or for worse, I'm trying to understand this code. So far, > I've come up with this patch: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso-tglx&id=14fd71e12b1c4492a06f368f75041f263e6862bf > > Is it correct, or am I missing some subtlety?The master clock, when initialized, has a pair masterclockvalues=(TSC value, time-of-day data). When updating the guest clock, we only update relative to (TSC value) that was read on masterclock initialization. See the following comment on x86.c: /* * * Assuming a stable TSC across physical CPUS, and a stable TSC * across virtual CPUs, the following condition is possible. * Each numbered line represents an event visible to both * CPUs at the next numbered event. ... When updating the "masterclockvalues" pair, all vcpus are stopped.
Marcelo Tosatti
2018-Oct-06 20:49 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Thu, Oct 04, 2018 at 03:15:32PM -0700, Andy Lutomirski wrote:> For better or for worse, I'm trying to understand this code. So far, > I've come up with this patch: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso-tglx&id=14fd71e12b1c4492a06f368f75041f263e6862bf > > Is it correct, or am I missing some subtlety?"In the non-fallback case, a bunch of gnarly arithmetic is done: a hopefully matched pair of (TSC, boot time) is read, then all locks are dropped, then the TSC frequency is read, a branch new multiplier and shift is read, and the result is turned into a time. This seems quite racy to me. Because locks are not held, I don't see what keeps TSC frequency used in sync with the master clock data." If tsc_khz changes, the host TSC clocksource will not be used, which disables masterclock: if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) || (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) { *lpj = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new); tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new); if (!(freq->flags & CPUFREQ_CONST_LOOPS)) mark_tsc_unstable("cpufreq changes"); In which case it ends up in: - spin_lock(&ka->pvclock_gtod_sync_lock); - if (!ka->use_master_clock) { - spin_unlock(&ka->pvclock_gtod_sync_lock); - return ktime_get_boot_ns() + ka->kvmclock_offset; - } masterclock -> non masterclock transition sets a REQUEST bit on each vCPU, so as to invalidate any previous clock reads. static void kvm_gen_update_masterclock(struct kvm *kvm) { #ifdef CONFIG_X86_64 int i; struct kvm_vcpu *vcpu; struct kvm_arch *ka = &kvm->arch; spin_lock(&ka->pvclock_gtod_sync_lock); kvm_make_mclock_inprogress_request(kvm); /* no guest entries from this point */ pvclock_update_vm_gtod_copy(kvm); kvm_for_each_vcpu(i, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); /* guest entries allowed */ kvm_for_each_vcpu(i, vcpu, kvm) kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); spin_unlock(&ka->pvclock_gtod_sync_lock); #endif /* * If the host uses TSC clock, then passthrough TSC as stable * to the guest. */ host_tsc_clocksource = kvm_get_time_and_clockread( &ka->master_kernel_ns, &ka->master_cycle_now); ka->use_master_clock = host_tsc_clocksource && vcpus_matched && !ka->backwards_tsc_observed && !ka->boot_vcpu_runs_old_kvmclock;
Reasonably Related Threads
- [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
- [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
- [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
- [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
- [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support