Vitaly Kuznetsov
2018-Oct-04 07:54 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
Marcelo Tosatti <mtosatti at redhat.com> writes:

> On Wed, Oct 03, 2018 at 11:22:58AM +0200, Vitaly Kuznetsov wrote:
>>
>> There is a very long history of different (hardware) issues Marcelo was
>> fighting with and the current code is the survived Frankenstein.
>
> Right, the code has to handle different TSC modes.
>
>> E.g. it
>> is very, very unclear what "catchup", "always catchup" and
>> masterclock-less mode in general are and if we still need them.
>
> Catchup means handling exposed (to guest) TSC frequency smaller than
> HW TSC frequency.
>
> That is "frankenstein" code, could be removed.
>
>> That said I'm all for simplification. I'm not sure if we still need to
>> care about buggy hardware though.
>
> What simplification is that again?
>

I was hoping to hear this from you :-) If I am to suggest how we can
move forward I'd propose:
- Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
  is supported).
- Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
  clocksource is a single page for the whole VM, not a per-cpu thing. Can
  we think that all the buggy hardware is already gone?

--
Vitaly
Peter Zijlstra
2018-Oct-04 08:11 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Thu, Oct 04, 2018 at 09:54:45AM +0200, Vitaly Kuznetsov wrote:
> I was hoping to hear this from you :-) If I am to suggest how we can
> move forward I'd propose:
> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
>   is supported).
> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
>   clocksource is a single page for the whole VM, not a per-cpu thing. Can
>   we think that all the buggy hardware is already gone?

No, and it is not the hardware you have to worry about (mostly), it is
the frigging PoS firmware people put on it.

Ever since Nehalem TSC is stable (unless you get to >4 socket systems,
after which it still can be, but bets are off). But even relatively
recent systems fail the TSC sync test because firmware messes it up by
writing to either MSR_TSC or MSR_TSC_ADJUST.

But the thing is, if the TSC is not synced, you cannot use it for
timekeeping, full stop. So having a single page is fine, it either
contains a mult/shift that is valid, or it indicates TSC is messed up
and you fall back to something else.

There is no inbetween there.

For sched_clock we can still use the global page, because the rate will
still be the same for each cpu, it's just offset between CPUs and the
code compensates for that.
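[Archive note] The "single page with a valid mult/shift, or fall back" idea above can be illustrated roughly as follows. This is only a sketch under stated assumptions: `struct shared_clock_page` and `shared_clock_read()` are hypothetical names, not the pvclock or Hyper-V TSC page layout, and the sequence counter a real reader would need to guard against concurrent updates is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical VM-wide clock page (an assumption for illustration only;
 * this is not the real pvclock or Hyper-V TSC page structure).
 */
struct shared_clock_page {
    bool     tsc_valid;    /* false: TSC is unsynced, readers must fall back */
    uint32_t mult;         /* cycles -> ns multiplier */
    uint32_t shift;        /* cycles -> ns right shift */
    uint64_t base_cycles;  /* TSC value captured at the last page update */
    uint64_t base_ns;      /* clock value (ns) captured at the last page update */
};

/*
 * Convert a TSC reading to nanoseconds using the shared page.
 * Returns false when the page says the TSC is unusable, so the caller
 * can fall back to another clocksource. There is no in-between state.
 *
 * Assumes the page is refreshed often enough that delta * mult does not
 * overflow 64 bits.
 */
static bool shared_clock_read(const struct shared_clock_page *p,
                              uint64_t tsc_now, uint64_t *ns)
{
    if (!p->tsc_valid)
        return false;

    assert(p->shift < 64);

    uint64_t delta = tsc_now - p->base_cycles;
    *ns = p->base_ns + ((delta * p->mult) >> p->shift);
    return true;
}
```

With `mult = 512` and `shift = 10` the conversion is 0.5 ns per cycle, i.e. a 2 GHz TSC. A caller would try `shared_clock_read()` first and use a slower clocksource whenever it returns false.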
Paolo Bonzini
2018-Oct-04 12:00 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On 04/10/2018 09:54, Vitaly Kuznetsov wrote:
> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
>   is supported).

Not if you want to migrate to pre-Skylake systems.

> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
>   clocksource is a single page for the whole VM, not a per-cpu thing. Can
>   we think that all the buggy hardware is already gone?

No. :( We still get reports whenever we break 2007-2008 hardware.

Paolo
Andy Lutomirski
2018-Oct-04 14:00 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
> On Oct 4, 2018, at 1:11 AM, Peter Zijlstra <peterz at infradead.org> wrote:
>
>> On Thu, Oct 04, 2018 at 09:54:45AM +0200, Vitaly Kuznetsov wrote:
>> I was hoping to hear this from you :-) If I am to suggest how we can
>> move forward I'd propose:
>> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
>>   is supported).
>> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
>>   clocksource is a single page for the whole VM, not a per-cpu thing. Can
>>   we think that all the buggy hardware is already gone?
>
> No, and it is not the hardware you have to worry about (mostly), it is
> the frigging PoS firmware people put on it.
>
> Ever since Nehalem TSC is stable (unless you get to >4 socket systems,
> after which it still can be, but bets are off). But even relatively
> recent systems fail the TSC sync test because firmware messes it up by
> writing to either MSR_TSC or MSR_TSC_ADJUST.
>
> But the thing is, if the TSC is not synced, you cannot use it for
> timekeeping, full stop. So having a single page is fine, it either
> contains a mult/shift that is valid, or it indicates TSC is messed up
> and you fall back to something else.
>
> There is no inbetween there.
>
> For sched_clock we can still use the global page, because the rate will
> still be the same for each cpu, it's just offset between CPUs and the
> code compensates for that.

But if we're in a KVM guest, then the clock will jump around on the same
*vCPU* when the vCPU migrates.

But I don't see how kvmclock helps here, since I don't think it's used
for sched_clock.
Andy Lutomirski
2018-Oct-04 14:04 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
> On Oct 4, 2018, at 5:00 AM, Paolo Bonzini <pbonzini at redhat.com> wrote:
>
>> On 04/10/2018 09:54, Vitaly Kuznetsov wrote:
>> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
>>   is supported).
>
> Not if you want to migrate to pre-Skylake systems.
>
>> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
>>   clocksource is a single page for the whole VM, not a per-cpu thing. Can
>>   we think that all the buggy hardware is already gone?
>
> No. :( We still get reports whenever we break 2007-2008 hardware.
>
>

Does the KVM non-masterclock mode actually help? It's not clear to me
exactly how it's supposed to work, but it seems like it's trying to
expose per-vCPU adjustments to the guest. Which is dubious at best, since
the guest can't validly use them for anything other than sched_clock,
since they aren't fully corrected by anything KVM can do.
Marcelo Tosatti
2018-Oct-05 21:18 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Thu, Oct 04, 2018 at 09:54:45AM +0200, Vitaly Kuznetsov wrote:
> Marcelo Tosatti <mtosatti at redhat.com> writes:
>
> > On Wed, Oct 03, 2018 at 11:22:58AM +0200, Vitaly Kuznetsov wrote:
> >>
> >> There is a very long history of different (hardware) issues Marcelo was
> >> fighting with and the current code is the survived Frankenstein.
> >
> > Right, the code has to handle different TSC modes.
> >
> >> E.g. it
> >> is very, very unclear what "catchup", "always catchup" and
> >> masterclock-less mode in general are and if we still need them.
> >
> > Catchup means handling exposed (to guest) TSC frequency smaller than
> > HW TSC frequency.
> >
> > That is "frankenstein" code, could be removed.
> >
> >> That said I'm all for simplification. I'm not sure if we still need to
> >> care about buggy hardware though.
> >
> > What simplification is that again?
> >
>
> I was hoping to hear this from you :-) If I am to suggest how we can
> move forward I'd propose:
> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
>   is supported).

In that case just use the TSC clocksource on the guest directly: I am
writing code for that now (it's faster than the pvclock syscall).

> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
>   clocksource is a single page for the whole VM, not a per-cpu thing. Can
>   we think that all the buggy hardware is already gone?

As Peter mentioned, hardware with unsynchronized TSCs might exist in the
future. And older hardware must be supported.
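[Archive note] For readers unfamiliar with the "TSC scaling" mentioned in this thread: the idea is that the guest sees the host TSC multiplied by a fixed-point ratio, plus an offset, so a guest can be shown a TSC frequency different from the host's. A minimal sketch follows. The names `tsc_scale_ratio()` and `guest_tsc()` and the choice of 48 fractional bits are assumptions made for illustration, not KVM's actual code, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stdint.h>

/* Fixed-point precision of the scaling ratio (illustrative assumption). */
#define FRAC_BITS 48

/*
 * Build the fixed-point ratio guest_khz / host_khz.
 * Assumes host_khz != 0 and guest_khz / host_khz < 65536, so that the
 * result fits in 64 bits.
 */
static uint64_t tsc_scale_ratio(uint32_t guest_khz, uint32_t host_khz)
{
    return (uint64_t)(((unsigned __int128)guest_khz << FRAC_BITS) / host_khz);
}

/*
 * Value the guest would observe for a given host TSC reading:
 * scale first, then add the per-VM (or per-vCPU) offset.
 */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t ratio, uint64_t offset)
{
    return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> FRAC_BITS) + offset;
}
```

For example, a guest configured for 1 GHz on a 2 GHz host gets a ratio of one half, so 4000 host cycles appear as 2000 guest cycles before the offset is added. Paolo's migration remark applies here: a host without this hardware support cannot present such a scaled TSC directly.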