thr3ads.net - Linux Virtualization - [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support


Vitaly Kuznetsov

2018-Oct-04 07:54 UTC

[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support

Marcelo Tosatti <mtosatti at redhat.com> writes:
> On Wed, Oct 03, 2018 at 11:22:58AM +0200, Vitaly Kuznetsov wrote:
>> 
>> There is a very long history of different (hardware) issues Marcelo was
>> fighting with and the current code is the survived Frankenstein.
>
> Right, the code has to handle different TSC modes.
>
>> E.g. it is very, very unclear what "catchup", "always catchup" and
>> masterclock-less mode in general are and if we still need them.
>
> Catchup means handling exposed (to guest) TSC frequency smaller than
> HW TSC frequency.
>
> That is "frankenstein" code, could be removed.
>
>> That said I'm all for simplification. I'm not sure if we still need to
>> care about buggy hardware though.
>
> What simplification is that again? 
>
I was hoping to hear this from you :-) If I am to suggest how we can
move forward I'd propose:
- Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
is supported).
- Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
clocksource is a single page for the whole VM, not a per-cpu thing. Can
we think that all the buggy hardware is already gone?
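The arithmetic behind the first proposal (exposing the TSC directly when the host can scale it) can be sketched as follows. This is a minimal sketch assuming the fixed-point format Intel VMX uses for its TSC multiplier (48 fractional bits), where the guest-visible TSC is ((host TSC multiplied by the multiplier) >> 48) plus an offset. The function names are hypothetical, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stdint.h>

/* Assumption: multiplier is fixed point with 48 fractional bits (VMX-style). */
#define TSC_RATIO_FRAC_BITS 48

/* Hypothetical helper: ratio = guest_khz / host_khz as a fixed-point value.
 * host_khz must be nonzero; ratios too large for 64 bits are truncated. */
static uint64_t tsc_scaling_ratio(uint32_t guest_khz, uint32_t host_khz)
{
    return (uint64_t)(((unsigned __int128)guest_khz << TSC_RATIO_FRAC_BITS) / host_khz);
}

/* Hypothetical helper: guest TSC = ((host_tsc * ratio) >> frac_bits) + offset.
 * The 128-bit product avoids overflow for large TSC values. */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t ratio, int64_t offset)
{
    unsigned __int128 scaled = (unsigned __int128)host_tsc * ratio;
    return (uint64_t)(scaled >> TSC_RATIO_FRAC_BITS) + (uint64_t)offset;
}
```

With scaling available, the guest can keep seeing the same TSC frequency on hosts with different native frequencies; for example, a 1.5 GHz guest on a 3 GHz host gets a ratio of exactly 2^47, so 1000 host cycles become 500 guest cycles.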

-- 
Vitaly

Peter Zijlstra

2018-Oct-04 08:11 UTC

[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support

On Thu, Oct 04, 2018 at 09:54:45AM +0200, Vitaly Kuznetsov wrote:
> I was hoping to hear this from you :-) If I am to suggest how we can
> move forward I'd propose:
> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
> is supported).
> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
> clocksource is a single page for the whole VM, not a per-cpu thing. Can
> we think that all the buggy hardware is already gone?
No, and it is not the hardware you have to worry about (mostly), it is
the frigging PoS firmware people put on it.

Ever since Nehalem, the TSC has been stable (unless you get to >4-socket systems,
after which it still can be, but bets are off). But even relatively
recent systems fail the TSC sync test because firmware messes it up by
writing to either MSR_TSC or MSR_TSC_ADJUST.

But the thing is, if the TSC is not synced, you cannot use it for
timekeeping, full stop. So having a single page is fine, it either
contains a mult/shift that is valid, or it indicates TSC is messed up
and you fall back to something else.

There is no in-between there.
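The single-page model can be sketched like this: one shared structure either carries a valid mult/shift pair for converting TSC cycles to nanoseconds, or it says the TSC is unusable and the reader falls back to another clocksource. The `clock_page` layout and the function name are hypothetical (not Hyper-V's or kvmclock's actual format), and a real reader would also need a sequence counter so it can retry if the host updates the page mid-read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM-wide clock page: one copy shared by every vCPU. */
struct clock_page {
    bool     valid;    /* false: the host says the TSC is unusable */
    uint32_t mult;     /* ns = (delta_cycles * mult) >> shift */
    uint32_t shift;
    uint64_t tsc_base; /* TSC value at the last host update */
    uint64_t ns_base;  /* nanoseconds corresponding to tsc_base */
};

/* Returns true and stores the time in *ns if the page is usable; returns
 * false to tell the caller to fall back to some other clocksource. */
static bool clock_page_read(const struct clock_page *p, uint64_t tsc, uint64_t *ns)
{
    if (!p->valid)
        return false;
    uint64_t delta = tsc - p->tsc_base;
    /* Widen the product so large deltas do not overflow before the shift. */
    *ns = p->ns_base + (uint64_t)(((unsigned __int128)delta * p->mult) >> p->shift);
    return true;
}
```

There is no per-CPU state in this sketch: either the one mult/shift pair is good for every CPU, or the caller stops using the TSC entirely.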

For sched_clock we can still use the global page, because the rate will
still be the same for each cpu, it's just offset between CPUs and the
code compensates for that.
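The sched_clock argument relies on every CPU's TSC ticking at the same rate, so one mult/shift stays valid and only a per-CPU offset needs correcting. A minimal sketch of that idea, assuming a hypothetical per-CPU skew table measured relative to CPU 0; this is a simplified illustration, not how `kernel/sched/clock.c` is actually written.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4

/* One rate for all CPUs, as in the single global page. */
struct tsc_rate {
    uint32_t mult;  /* ns = (cycles * mult) >> shift */
    uint32_t shift;
};

/* Hypothetical: how far each CPU's TSC is ahead of CPU 0's, in cycles,
 * e.g. as measured by a boot-time sync check. */
static int64_t tsc_skew[SKETCH_NR_CPUS];

/* Remove this CPU's offset first, then apply the shared rate, so that all
 * CPUs produce the same nanosecond value at the same instant. */
static uint64_t sketch_sched_clock(const struct tsc_rate *r, int cpu, uint64_t raw_tsc)
{
    uint64_t tsc = raw_tsc - (uint64_t)tsc_skew[cpu];
    return (uint64_t)(((unsigned __int128)tsc * r->mult) >> r->shift);
}
```

For example, with a 2 GHz rate (mult = 2^31, shift = 32) and CPU 1 running 1000 cycles ahead, CPU 0 reading 5000 and CPU 1 reading 6000 at the same instant both yield 2500 ns.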

Paolo Bonzini

2018-Oct-04 12:00 UTC

[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support

On 04/10/2018 09:54, Vitaly Kuznetsov wrote:
> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
> is supported).
Not if you want to migrate to pre-Skylake systems.
> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
> clocksource is a single page for the whole VM, not a per-cpu thing. Can
> we think that all the buggy hardware is already gone?
No. :(  We still get reports whenever we break 2007-2008 hardware.

Paolo
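The objection about migrating to pre-Skylake systems can be made concrete with a little arithmetic. Assume the guest calibrated its TSC at 2.6 GHz on the source host and is migrated to a host without TSC scaling whose TSC runs at 2.2 GHz, and that nothing compensates for the difference. Both frequencies are made-up example values and the helpers are hypothetical.

```c
#include <stdint.h>

/* Raw TSC cycles that elapse in `ns` nanoseconds on a host whose TSC runs
 * at host_khz, with no scaling applied (ns * kHz / 1e6 = cycles). */
static uint64_t host_cycles(uint64_t ns, uint64_t host_khz)
{
    return ns * host_khz / 1000000ULL;
}

/* Nanoseconds the guest believes have passed when it converts a cycle
 * delta using the frequency it calibrated at boot. */
static uint64_t guest_elapsed_ns(uint64_t cycles, uint64_t calibrated_khz)
{
    return cycles * 1000000ULL / calibrated_khz;
}
```

Under those assumptions one real second (2.2e9 cycles on the destination) converts to about 0.846 s in the guest, so a guest that reads the raw TSC would see its clock run roughly 15% slow after such a migration.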

Andy Lutomirski

2018-Oct-04 14:00 UTC

[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support

> On Oct 4, 2018, at 1:11 AM, Peter Zijlstra <peterz at infradead.org> wrote:
> 
>> On Thu, Oct 04, 2018 at 09:54:45AM +0200, Vitaly Kuznetsov wrote:
>> I was hoping to hear this from you :-) If I am to suggest how we can
>> move forward I'd propose:
>> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
>> is supported).
>> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
>> clocksource is a single page for the whole VM, not a per-cpu thing. Can
>> we think that all the buggy hardware is already gone?
> 
> No, and it is not the hardware you have to worry about (mostly), it is
> the frigging PoS firmware people put on it.
> 
> Ever since Nehalem, the TSC has been stable (unless you get to >4-socket systems,
> after which it still can be, but bets are off). But even relatively
> recent systems fail the TSC sync test because firmware messes it up by
> writing to either MSR_TSC or MSR_TSC_ADJUST.
> 
> But the thing is, if the TSC is not synced, you cannot use it for
> timekeeping, full stop. So having a single page is fine, it either
> contains a mult/shift that is valid, or it indicates TSC is messed up
> and you fall back to something else.
> 
> There is no in-between there.
> 
> For sched_clock we can still use the global page, because the rate will
> still be the same for each cpu, it's just offset between CPUs and the
> code compensates for that.
But if we're in a KVM guest, then the clock will jump around on the same *vCPU*
when the vCPU migrates.

But I don't see how kvmclock helps here, since I don't think it's used for
sched_clock.
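The vCPU-migration problem can be illustrated with a toy model: if the host's physical CPUs have unsynchronized TSCs and nothing on the host compensates, a guest reading the raw TSC on one vCPU can see it go backwards when that vCPU is moved to another physical CPU. Because the guest still believes it is on the same CPU, a guest-side per-CPU offset correction cannot fix this. The skew values and the helper are hypothetical.

```c
#include <stdint.h>

/* Hypothetical skew of each host pCPU's TSC relative to "true" cycles:
 * pCPU 1's TSC lags pCPU 0's by 5000 cycles. */
static const int64_t pcpu_skew[2] = { 0, -5000 };

/* Raw TSC value a guest would observe at true cycle count t while its vCPU
 * happens to be running on host pCPU `pcpu` (no host-side compensation). */
static uint64_t vcpu_read_tsc(uint64_t t, int pcpu)
{
    return t + (uint64_t)pcpu_skew[pcpu];
}
```

Reading at true cycle 100000 on pCPU 0 gives 100000; after the vCPU is moved to pCPU 1, a read 1000 cycles later gives 96000, so the same vCPU observes its TSC step back by 4000 cycles.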

Andy Lutomirski

2018-Oct-04 14:04 UTC

[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support

> On Oct 4, 2018, at 5:00 AM, Paolo Bonzini <pbonzini at redhat.com> wrote:
> 
>> On 04/10/2018 09:54, Vitaly Kuznetsov wrote:
>> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
>> is supported).
> 
> Not if you want to migrate to pre-Skylake systems.
> 
>> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
>> clocksource is a single page for the whole VM, not a per-cpu thing. Can
>> we think that all the buggy hardware is already gone?
> 
> No. :(  We still get reports whenever we break 2007-2008 hardware.
> 
> 
Does the KVM non-masterclock mode actually help?  It's not clear to me exactly
how it's supposed to work, but it seems like it's trying to expose per-vCPU
adjustments to the guest. Which is dubious at best, since the guest can't
validly use them for anything other than sched_clock, since they aren't fully
corrected by anything KVM can do.
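The masterclock question can be illustrated with the per-vCPU pvclock conversion. The struct below mirrors a subset of the fields of Linux's `pvclock_vcpu_time_info` (`tsc_timestamp`, `system_time`, `tsc_to_system_mul`, `tsc_shift`; version and flags omitted), and the scaling follows the shape of the kernel's `pvclock_scale_delta()`; the numbers are invented. As we understand it, in masterclock mode every vCPU is handed the same (tsc_timestamp, system_time) pair, so two vCPUs reading the same TSC value agree, while in non-masterclock mode each vCPU's pair is sampled separately and they can disagree.

```c
#include <stdint.h>

/* Subset of the per-vCPU pvclock fields. */
struct pvti {
    uint64_t tsc_timestamp;     /* guest TSC when the snapshot was taken */
    uint64_t system_time;       /* host-provided ns at that TSC value */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

static uint64_t pvclock_ns(const struct pvti *t, uint64_t tsc)
{
    uint64_t delta = tsc - t->tsc_timestamp;
    if (t->tsc_shift < 0)
        delta >>= -t->tsc_shift;
    else
        delta <<= t->tsc_shift;
    /* (delta * mul) >> 32, widened to avoid overflow */
    return t->system_time +
           (uint64_t)(((unsigned __int128)delta * t->tsc_to_system_mul) >> 32);
}
```

With a 2 GHz TSC (mul = 2^31, shift = 0), vCPU 0's snapshot (1000, 500) and an independently sampled vCPU 1 snapshot (3000, 1600) give 2500 ns and 2600 ns for the same TSC value 5000, whereas a shared masterclock snapshot gives 2500 ns on both. That 100 ns cross-vCPU disagreement is the kind of per-vCPU adjustment being questioned here.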

Marcelo Tosatti

2018-Oct-05 21:18 UTC

head link

[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support

On Thu, Oct 04, 2018 at 09:54:45AM +0200, Vitaly Kuznetsov wrote:
> Marcelo Tosatti <mtosatti at redhat.com> writes:
> 
> > On Wed, Oct 03, 2018 at 11:22:58AM +0200, Vitaly Kuznetsov wrote:
> >> 
> >> There is a very long history of different (hardware) issues Marcelo was
> >> fighting with and the current code is the survived Frankenstein.
> >
> > Right, the code has to handle different TSC modes.
> >
> >> E.g. it is very, very unclear what "catchup", "always catchup" and
> >> masterclock-less mode in general are and if we still need them.
> >
> > Catchup means handling exposed (to guest) TSC frequency smaller than
> > HW TSC frequency.
> >
> > That is "frankenstein" code, could be removed.
> >
> >> That said I'm all for simplification. I'm not sure if we still need to
> >> care about buggy hardware though.
> >
> > What simplification is that again? 
> >
> 
> I was hoping to hear this from you :-) If I am to suggest how we can
> move forward I'd propose:
> - Check if pure TSC can be used on SkyLake+ systems (where TSC scaling
> is supported).
In that case just use the TSC clocksource on the guest directly: I am
writing code for that now (it's faster than the pvclock syscall).
> - Check if non-masterclock mode is still needed. E.g. HyperV's TSC page
> clocksource is a single page for the whole VM, not a per-cpu thing. Can
> we think that all the buggy hardware is already gone?
As Peter mentioned, hardware with unsynchronized TSCs might exist in the future.
And older hardware must be supported.
