Marcelo Tosatti
2018-Oct-03 19:00 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Tue, Oct 02, 2018 at 10:15:49PM -0700, Andy Lutomirski wrote:
> Hi Vitaly, Paolo, Radim, etc.,
>
> On Fri, Sep 14, 2018 at 5:52 AM Thomas Gleixner <tglx at linutronix.de> wrote:
> >
> > Matt attempted to add CLOCK_TAI support to the VDSO clock_gettime()
> > implementation, which extended the clockid switch case and added yet
> > another slightly different copy of the same code.
> >
> > Especially the extended switch case is problematic as the compiler tends to
> > generate a jump table which then requires to use retpolines. If jump tables
> > are disabled it adds yet another conditional to the existing maze.
> >
> > This series takes a different approach by consolidating the almost
> > identical functions into one implementation for high resolution clocks and
> > one for the coarse grained clock ids by storing the base data for each
> > clock id in an array which is indexed by the clock id.
> >
>
> I was trying to understand more of the implications of this patch
> series, and I was again reminded that there is an entire extra copy of
> the vclock reading code in arch/x86/kvm/x86.c. And the purpose of
> that code is very, very opaque.
>
> Can one of you explain what the code is even doing? From a couple of
> attempts to read through it, it's a whole bunch of
> probably-extremely-buggy code that,

Yes, probably.

> drumroll please, tries to atomically read the TSC value and the time.
> And decide whether the result is "based on the TSC".

I think "based on the TSC" refers to whether TSC clocksource is being
used.

> And then synthesizes a TSC-to-ns
> multiplier and shift, based on *something other than the actual
> multiply and shift used*.
>
> IOW, unless I'm totally misunderstanding it, the code digs into the
> private arch clocksource data intended for the vDSO, uses a poorly
> maintained copy of the vDSO code to read the time (instead of doing
> the sane thing and using the kernel interfaces for this), and
> propagates a totally made up copy to the guest.

I posted kernel interfaces for this, and it was suggested to
instead write a "in-kernel user of pvclock data".

If you can get kernel interfaces to replace that, go for it. I prefer
kernel interfaces as well.

> And gets it entirely
> wrong when doing nested virt, since, unless there's some secret in
> this maze, it doesn't actually use the scaling factor from the host
> when it tells the guest what to do.
>
> I am really, seriously tempted to send a patch to simply delete all
> this code.

If your patch which deletes the code gets the necessary features right,
sure, go for it.

> The correct way to do it is to hook

Can you expand on the correct way to do it?

> And I don't see how it's even possible to pass kvmclock correctly to
> the L2 guest when L0 is hyperv. KVM could pass *hyperv's* clock, but
> L1 isn't notified when the data structure changes, so how the heck is
> it supposed to update the kvmclock structure?

I don't parse your question.

>
> --Andy
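As a rough illustration of the consolidation described in the quoted cover letter, the sketch below replaces a per-clock switch with a single high-resolution read path that looks up its base data in an array indexed by clock id. This is not the code from the patch series: the struct layout, the `HRES_MASK` bitmask and the `read_hres()` helper are hypothetical names chosen for the example, and the arithmetic is simplified (the real vDSO keeps nanoseconds in shifted form and reads under a sequence counter). Only the clock id values (0, 1 and 11) match the Linux UAPI numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock ids; the numeric values match CLOCK_REALTIME,
 * CLOCK_MONOTONIC and CLOCK_TAI in the Linux UAPI. */
enum { CLK_REALTIME = 0, CLK_MONOTONIC = 1, CLK_TAI = 11, CLK_MAX = 12 };

/* Hypothetical per-clock base: the time at the last timekeeper update. */
struct clock_base {
    uint64_t sec;
    uint64_t nsec;
};

/* Hypothetical shared data page: one conversion setup, many bases. */
struct time_data {
    uint64_t cycle_last;               /* counter value at last update */
    uint32_t mult;                     /* cycles -> ns multiplier */
    uint32_t shift;                    /* cycles -> ns shift */
    struct clock_base basetime[CLK_MAX];
};

/* Bitmask of the clock ids served by the high-resolution path. */
#define HRES_MASK ((1u << CLK_REALTIME) | (1u << CLK_MONOTONIC) | (1u << CLK_TAI))

/* One implementation for every high-resolution clock: a mask test and
 * an array lookup instead of a switch with one copy per clock id.
 * Returns 0 on success, -1 if the clock id is not handled here. */
int read_hres(const struct time_data *td, unsigned clk, uint64_t cycles,
              uint64_t *sec, uint64_t *nsec)
{
    assert(td && sec && nsec);

    if (clk >= CLK_MAX || !((1u << clk) & HRES_MASK))
        return -1;

    const struct clock_base *base = &td->basetime[clk];
    uint64_t ns = base->nsec +
                  (((cycles - td->cycle_last) * td->mult) >> td->shift);

    *sec = base->sec + ns / 1000000000ull;
    *nsec = ns % 1000000000ull;
    return 0;
}
```

Under these assumptions, adding another high-resolution clock means setting one more bit in the mask and filling one more array slot; no new branch or duplicated read function is needed, which is the property the cover letter is after.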
Marcelo Tosatti
2018-Oct-03 19:05 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Wed, Oct 03, 2018 at 04:00:29PM -0300, Marcelo Tosatti wrote:
> On Tue, Oct 02, 2018 at 10:15:49PM -0700, Andy Lutomirski wrote:
> > [...]
> > IOW, unless I'm totally misunderstanding it, the code digs into the
> > private arch clocksource data intended for the vDSO, uses a poorly
> > maintained copy of the vDSO code to read the time (instead of doing
> > the sane thing and using the kernel interfaces for this), and
> > propagates a totally made up copy to the guest.
>
> I posted kernel interfaces for this, and it was suggested to
> instead write a "in-kernel user of pvclock data".
>
> If you can get kernel interfaces to replace that, go for it. I prefer
> kernel interfaces as well.

And cleanup patches, to make that code look nicer, are also very welcome!
Andy Lutomirski
2018-Oct-03 22:32 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Wed, Oct 3, 2018 at 12:01 PM Marcelo Tosatti <mtosatti at redhat.com> wrote:
>
> On Tue, Oct 02, 2018 at 10:15:49PM -0700, Andy Lutomirski wrote:
> > [...]
> > And I don't see how it's even possible to pass kvmclock correctly to
> > the L2 guest when L0 is hyperv. KVM could pass *hyperv's* clock, but
> > L1 isn't notified when the data structure changes, so how the heck is
> > it supposed to update the kvmclock structure?
>
> I don't parse your question.

Let me ask it more intelligently: when the "reenlightenment" IRQ
happens, what tells KVM to do its own update for its guests?
Marcelo Tosatti
2018-Oct-04 16:37 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Wed, Oct 03, 2018 at 03:32:08PM -0700, Andy Lutomirski wrote:
> On Wed, Oct 3, 2018 at 12:01 PM Marcelo Tosatti <mtosatti at redhat.com> wrote:
> >
> > On Tue, Oct 02, 2018 at 10:15:49PM -0700, Andy Lutomirski wrote:
> > > [...]
> > > And I don't see how it's even possible to pass kvmclock correctly to
> > > the L2 guest when L0 is hyperv. KVM could pass *hyperv's* clock, but
> > > L1 isn't notified when the data structure changes, so how the heck is
> > > it supposed to update the kvmclock structure?
> >
> > I don't parse your question.
>
> Let me ask it more intelligently: when the "reenlightenment" IRQ
> happens, what tells KVM to do its own update for its guests?

Update of what, and why does it need to update anything from the IRQ?

The update I can think of is from the host kernel clocksource, which
there is a notifier for.
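For readers unfamiliar with the "TSC-to-ns multiplier and shift" that the thread keeps returning to, here is a minimal sketch of how a guest turns a TSC reading into nanoseconds from a pvclock-style record. The field names follow the pvclock ABI (`tsc_timestamp`, `system_time`, `tsc_to_system_mul`, `tsc_shift`), but the struct is an illustrative subset and the helper names `scale_delta()` and `guest_clock_ns()` are made up for this example. The version/sequence check that a real reader needs is omitted, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative subset of a pvclock per-vCPU time record. */
struct pv_time_info {
    uint64_t tsc_timestamp;      /* TSC value when the record was written */
    uint64_t system_time;        /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul;  /* 32-bit fixed-point fraction */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
};

/* Scale a TSC delta to nanoseconds:
 *   ns = ((delta << shift) * mul_frac) >> 32
 * A negative shift means a right shift. The 128-bit intermediate keeps
 * the 64x32-bit product from overflowing. */
uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    assert(shift > -64 && shift < 64);

    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* Guest-side clock read: base time plus the scaled TSC delta. */
uint64_t guest_clock_ns(const struct pv_time_info *ti, uint64_t tsc)
{
    assert(ti);
    return ti->system_time +
           scale_delta(tsc - ti->tsc_timestamp,
                       ti->tsc_to_system_mul, ti->tsc_shift);
}
```

The guest trusts whatever multiplier and shift it finds in the record. If the values published there differ from the ones the host timekeeper actually uses, or ignore a scaling factor applied by an outer hypervisor, host and guest clocks will drift apart between updates, which appears to be the concern raised above.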