Marcelo Tosatti
2018-Oct-11 22:27 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Tue, Oct 09, 2018 at 01:09:42PM -0700, Andy Lutomirski wrote:
> On Tue, Oct 9, 2018 at 8:28 AM Marcelo Tosatti <mtosatti at redhat.com> wrote:
> >
> > On Mon, Oct 08, 2018 at 10:38:22AM -0700, Andy Lutomirski wrote:
> > > On Mon, Oct 8, 2018 at 8:27 AM Marcelo Tosatti <mtosatti at redhat.com> wrote:
> > >
> > > I read the comment three more times and even dug through the git
> > > history. It seems like what you're saying is that, under certain
> > > conditions (which arguably would be bugs in the core Linux timing
> > > code),
> >
> > I don't see that as a bug. It's just a side effect of reading two
> > different clocks (one is CLOCK_MONOTONIC and the other is TSC),
> > and using those two clocks as a "base + offset".
> >
> > As the comment explains, if you do that, you can't guarantee monotonicity.
> >
> > > actually calling ktime_get_boot_ns() could be non-monotonic
> > > with respect to the kvmclock timing. But get_kvmclock_ns() isn't used
> > > for VM timing as such -- it's used for the IOCTL interfaces for
> > > updating the time offset. So can you explain how my patch is
> > > incorrect?
> >
> > ktime_get_boot_ns() has frequency correction applied, while
> > reading masterclock + TSC offset does not.
> >
> > So the clock reads differ.
> >
>
> Ah, okay, I finally think I see what's going on. In the kvmclock data
> exposed to the guest, tsc_shift and tsc_to_system_mul come from
> tgt_tsc_khz, whereas master_kernel_ns and master_cycle_now come from
> CLOCK_BOOTTIME. So the kvmclock and kernel clock drift apart at a
> rate given by the frequency shift and then suddenly agree again every
> time the pvclock data is updated.

Yes.

> Is there a reason to do it this way?

Since pvclock updates which update system_timestamp are expensive (they
must stop all vcpus), they should be avoided.

So only the HW TSC counts, and it is used as an offset against the vcpu's
tsc_timestamp.
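The "base + offset" drift described in this message can be illustrated with a small simulation. This is only a sketch, not KVM code: the scaling step follows the usual pvclock shift-then-multiply convention, while the 2 GHz TSC rate, the +50 ppm frequency correction, and the helper names (scale_delta, kvmclock_ns, kernel_clock_ns) are assumptions made up for the example.

```python
def scale_delta(delta_cycles, mul_frac, shift):
    """Convert a TSC cycle delta to nanoseconds: apply the binary shift,
    then multiply by a 32.32 fixed-point fraction."""
    if shift < 0:
        delta_cycles >>= -shift
    else:
        delta_cycles <<= shift
    return (delta_cycles * mul_frac) >> 32


def kvmclock_ns(base_ns, tsc_timestamp, tsc_now, mul_frac, shift):
    """Guest-visible time: snapshot base plus the scaled TSC offset."""
    return base_ns + scale_delta(tsc_now - tsc_timestamp, mul_frac, shift)


# Hypothetical numbers: a 2 GHz TSC, i.e. 0.5 ns per cycle at nominal rate.
MUL_FRAC = 1 << 31  # 0.5 in 32.32 fixed point
SHIFT = 0


def kernel_clock_ns(tsc_now):
    """Stand-in for the frequency-corrected kernel clock: nominal rate
    plus an assumed +50 ppm correction (0.5 ns * 1.00005 per cycle)."""
    return tsc_now * 500_025 // 1_000_000


# The last pvclock update happened at TSC = 0.
base_ns, tsc_timestamp = kernel_clock_ns(0), 0

# One nominal second later the two clocks have drifted apart.
tsc_now = 2_000_000_000
drift_ns = kernel_clock_ns(tsc_now) - kvmclock_ns(
    base_ns, tsc_timestamp, tsc_now, MUL_FRAC, SHIFT)

# A new pvclock update re-snapshots the base, so the clocks agree again.
base_ns, tsc_timestamp = kernel_clock_ns(tsc_now), tsc_now
drift_after_update_ns = kernel_clock_ns(tsc_now) - kvmclock_ns(
    base_ns, tsc_timestamp, tsc_now, MUL_FRAC, SHIFT)

print(drift_ns, drift_after_update_ns)
```

With these assumed numbers the clocks are 50 microseconds apart after one second and agree again right after the update, which is the sawtooth behaviour described above.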
Andy Lutomirski
2018-Oct-11 23:00 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Thu, Oct 11, 2018 at 3:28 PM Marcelo Tosatti <mtosatti at redhat.com> wrote:
>
> On Tue, Oct 09, 2018 at 01:09:42PM -0700, Andy Lutomirski wrote:
> > On Tue, Oct 9, 2018 at 8:28 AM Marcelo Tosatti <mtosatti at redhat.com> wrote:
> > >
> > > On Mon, Oct 08, 2018 at 10:38:22AM -0700, Andy Lutomirski wrote:
> > > > On Mon, Oct 8, 2018 at 8:27 AM Marcelo Tosatti <mtosatti at redhat.com> wrote:
> > > >
> > > > I read the comment three more times and even dug through the git
> > > > history. It seems like what you're saying is that, under certain
> > > > conditions (which arguably would be bugs in the core Linux timing
> > > > code),
> > >
> > > I don't see that as a bug. It's just a side effect of reading two
> > > different clocks (one is CLOCK_MONOTONIC and the other is TSC),
> > > and using those two clocks as a "base + offset".
> > >
> > > As the comment explains, if you do that, you can't guarantee monotonicity.
> > >
> > > > actually calling ktime_get_boot_ns() could be non-monotonic
> > > > with respect to the kvmclock timing. But get_kvmclock_ns() isn't used
> > > > for VM timing as such -- it's used for the IOCTL interfaces for
> > > > updating the time offset. So can you explain how my patch is
> > > > incorrect?
> > >
> > > ktime_get_boot_ns() has frequency correction applied, while
> > > reading masterclock + TSC offset does not.
> > >
> > > So the clock reads differ.
> > >
> >
> > Ah, okay, I finally think I see what's going on. In the kvmclock data
> > exposed to the guest, tsc_shift and tsc_to_system_mul come from
> > tgt_tsc_khz, whereas master_kernel_ns and master_cycle_now come from
> > CLOCK_BOOTTIME. So the kvmclock and kernel clock drift apart at a
> > rate given by the frequency shift and then suddenly agree again every
> > time the pvclock data is updated.
>
> Yes.
>
> > Is there a reason to do it this way?
>
> Since pvclock updates which update system_timestamp are expensive (they
> must stop all vcpus), they should be avoided.
>

Fair enough.

> So only the HW TSC counts

Makes sense.

> , and it is used as an offset against the vcpu's tsc_timestamp.
>

Why don't you just expose CLOCK_MONOTONIC_RAW or CLOCK_MONOTONIC_RAW
plus suspend time, though? Then you would actually be tracking a real
kernel timekeeping mode, and you wouldn't need all this complicated
offsetting work to avoid accidentally going backwards.
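For readers unfamiliar with the clocks named in this message, the idea of "CLOCK_MONOTONIC_RAW plus suspend time" can be approximated from userspace. This is a rough sketch under stated assumptions, not a proposal for how KVM would do it: it relies on CLOCK_BOOTTIME being CLOCK_MONOTONIC plus time spent suspended, the three reads are not atomic, the function name raw_plus_suspend_ns is hypothetical, and the clock constants are only available on Linux.

```python
import time


def raw_plus_suspend_ns():
    """Approximate CLOCK_MONOTONIC_RAW plus accumulated suspend time.

    CLOCK_MONOTONIC_RAW is not subject to NTP frequency adjustment, and
    the suspend time is estimated as CLOCK_BOOTTIME - CLOCK_MONOTONIC.
    The reads are not atomic, so the result is only approximate.
    """
    mono_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    boot_ns = time.clock_gettime_ns(time.CLOCK_BOOTTIME)
    raw_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC_RAW)
    suspend_estimate_ns = boot_ns - mono_ns
    return raw_ns + suspend_estimate_ns


print(raw_plus_suspend_ns())
```

A kernel-side version could take all three values under a single timekeeper snapshot, which is what would remove the non-atomic read problem in this sketch.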
Marcelo Tosatti
2018-Oct-15 13:39 UTC
[patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
On Thu, Oct 11, 2018 at 04:00:29PM -0700, Andy Lutomirski wrote:
> On Thu, Oct 11, 2018 at 3:28 PM Marcelo Tosatti <mtosatti at redhat.com> wrote:
> >
> > On Tue, Oct 09, 2018 at 01:09:42PM -0700, Andy Lutomirski wrote:
> > > On Tue, Oct 9, 2018 at 8:28 AM Marcelo Tosatti <mtosatti at redhat.com> wrote:
> > > >
> > > > On Mon, Oct 08, 2018 at 10:38:22AM -0700, Andy Lutomirski wrote:
> > > > > On Mon, Oct 8, 2018 at 8:27 AM Marcelo Tosatti <mtosatti at redhat.com> wrote:
> > > > >
> > > > > I read the comment three more times and even dug through the git
> > > > > history. It seems like what you're saying is that, under certain
> > > > > conditions (which arguably would be bugs in the core Linux timing
> > > > > code),
> > > >
> > > > I don't see that as a bug. It's just a side effect of reading two
> > > > different clocks (one is CLOCK_MONOTONIC and the other is TSC),
> > > > and using those two clocks as a "base + offset".
> > > >
> > > > As the comment explains, if you do that, you can't guarantee monotonicity.
> > > >
> > > > > actually calling ktime_get_boot_ns() could be non-monotonic
> > > > > with respect to the kvmclock timing. But get_kvmclock_ns() isn't used
> > > > > for VM timing as such -- it's used for the IOCTL interfaces for
> > > > > updating the time offset. So can you explain how my patch is
> > > > > incorrect?
> > > >
> > > > ktime_get_boot_ns() has frequency correction applied, while
> > > > reading masterclock + TSC offset does not.
> > > >
> > > > So the clock reads differ.
> > > >
> > >
> > > Ah, okay, I finally think I see what's going on. In the kvmclock data
> > > exposed to the guest, tsc_shift and tsc_to_system_mul come from
> > > tgt_tsc_khz, whereas master_kernel_ns and master_cycle_now come from
> > > CLOCK_BOOTTIME. So the kvmclock and kernel clock drift apart at a
> > > rate given by the frequency shift and then suddenly agree again every
> > > time the pvclock data is updated.
> >
> > Yes.
> >
> > > Is there a reason to do it this way?
> >
> > Since pvclock updates which update system_timestamp are expensive (they
> > must stop all vcpus), they should be avoided.
> >
>
> Fair enough.
>
> > So only the HW TSC counts
>
> Makes sense.
>
> > , and it is used as an offset against the vcpu's tsc_timestamp.
> >
>
> Why don't you just expose CLOCK_MONOTONIC_RAW or CLOCK_MONOTONIC_RAW
> plus suspend time, though? Then you would actually be tracking a real
> kernel timekeeping mode, and you wouldn't need all this complicated
> offsetting work to avoid accidentally going backwards.

Can you outline how that would work?