On 11/4/08 21:00, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> I turned to the hpet as I became frustrated trying to solve the problem
> in xen with pit. One of the solutions proposed to the customer was a
> max(curr time, last time) modification to Linux. They didn't want that.
> (Keir, what Linux version are you looking at when you say Linux already
> has this modification?)

This is part of our own Xen-specific time patches for Linux.

> I had tried hpet before to solve the time-backwards problem and knew it
> was effective. But the accuracy of hpet was very poor. When I looked
> into the hpet I was surprised that it was based on tsc, as I was trying
> to get away from tsc. But note, even based on tsc the time was not
> going backwards, at least for this simple test case.

Yes, having all the virtual timers based on 'guest TSC' (which really is
basically host TSC + an offset) is not great.

> It's a fairly simple matter to base the hpet on the physical hpet. It's
> easy to share it among guests as no one really writes the physical
> hpet. Offsets are kept in each vhpet such that each guest thinks he
> owns the hpet.

This is really no better than basing on Xen system time. Actually it's
worse, since most systems don't even expose the HPET, so we can't probe
it (without hacks) and so we can't use it. Xen's system time abstraction,
perhaps with the max(last, curr) addition, is perfectly good enough.

> This goes along with some of the experiences Keir has had with drift, I
> think. I'm not sure why this happens - can the hpet hardware be that
> poor in quality?

It does appear to be, and I have no idea why.

> There are three factors that give hpet its great accuracy, in my
> opinion.
> 1) The hardware is very stable.
> 2) There is only one of them in the system, not one per cpu.
> 3) The Linux implementation for clock and hpet is very clean. It
> calculates missed ticks and offsets without including the interrupt
> delay.

Encouraging the guest to use HPET makes sense. It's a nice wide counter
which hence does not have the wrap issues of the 16-bit PIT counters.
Also in some cases the guest OS interface to the HPET is saner (for our
purposes at least) than the equivalent code to interface to PIT/TSC.
This doesn't mean it has to be plumbed right down to the physical HPET.
HVM time sources can be fixed for drift by moving them away from
guest/host TSC and onto the Xen system time abstraction.

 -- Keir

> Items 2 and 3 here are important factors in why the time stays
> monotonic. Another reason is that gettimeofday reads the hpet main
> counter for extrapolation, eliminating extrapolation error, since the
> same counter is the sole determinant of the next interrupt time stamp.
> Furthermore, Linux can take the clock interrupt on any processor and
> the monotonicity is preserved because of item 2.
>
> Thanks for reading this far!
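[For reference, a minimal sketch of the max(curr time, last time) clamp
discussed above, in kernel-style C. The function and variable names are
hypothetical; this is not the actual Linux patch.]

/* Clamp a raw clocksource read so returned time never steps backwards.
 * A single shared last-value, so the guarantee also holds across CPUs. */
static u64 last_stamp;
static DEFINE_SPINLOCK(stamp_lock);

u64 monotonic_read(u64 (*read_raw)(void))
{
    unsigned long flags;
    u64 now;

    spin_lock_irqsave(&stamp_lock, flags);
    now = read_raw();
    if ( (s64)(now - last_stamp) < 0 )
        now = last_stamp;   /* behind: return the clamped value */
    else
        last_stamp = now;   /* ahead: advance the high-water mark */
    spin_unlock_irqrestore(&stamp_lock, flags);

    return now;
}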
On 11/4/08 22:20, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> It's a fairly simple matter to base the hpet on the physical hpet.
>> It's easy to share it among guests as no one really writes the
>> physical hpet. Offsets are kept in each vhpet such that each guest
>> thinks he owns the hpet.
>
> This is really no better than basing on Xen system time. Actually it's
> worse, since most systems don't even expose the HPET, so we can't
> probe it (without hacks) and so we can't use it. Xen's system time
> abstraction, perhaps with the max(last, curr) addition, is perfectly
> good enough.

Just to labour the point some more: If you still believe that diving to
the real platform timer on every guest time access is measurably more
accurate, you can cleanly prove that by building on Xen's system time
abstraction, and then switch between using get_s_time() (aka NOW()) and
read_platform_stime(). The latter calculates current system time by
reading from the platform timer *right now*. It's the function that all
local CPUs calibrate to once per second.

 -- Keir
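[A sketch of the A/B experiment Keir proposes, assuming a vhpet whose
main counter is derived from Xen system time. Only get_s_time()/NOW()
and read_platform_stime() are the real interfaces named above; the
struct, the switch variable, and ns_to_hpet_ticks() are hypothetical.]

static int hpet_use_platform_read;   /* flip to compare the two modes */

static uint64_t vhpet_read_main_counter(struct vhpet *h)
{
    /* Either extrapolated per-CPU system time, or a fresh read of the
     * physical platform timer converted to system time. */
    s_time_t now = hpet_use_platform_read ? read_platform_stime()
                                          : NOW();   /* get_s_time() */

    return ns_to_hpet_ticks(now) + h->mc_offset;
}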
IMHO, this all comes down to how bad the tsc drift gets between
calibrations. If this is agreed, let me propose the following:

I see local_time_calibration() has some old printk's. I propose
re-enabling them (with some rate-limiting) to record to a log how bad
the skew gets. Then we can request feedback from anyone running
xen-unstable (and maybe xen-3.1-latest and xen-3.2-latest also) so we
can "measure" a broad set of machines.

One nice approach to rate-limiting is to printk each time the value
exceeds the next higher power of two. Even though this printk gets
output for each processor, I'd think this would be low overhead and
sufficient information for our needs.

A nice touch would be to include a little script to run in domain0 that
collects the cpuinfo and the relevant log lines, and emails (or UDPs?)
them to a pre-set address (which is viewable via http). And maybe a
once-per-log printk that says "please run this script". Even tied to a
boot parameter, this would be better than the lack-of-information we
have now.

Dan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Friday, April 11, 2008 3:21 PM
> To: Dave Winchell
> Cc: dan.magenheimer@oracle.com; Ian Pratt; Tian, Kevin;
> xen-devel@lists.xensource.com
> Subject: Re: [xen-devel] System time monotonicity
>
> [...]
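[A sketch of the power-of-two rate-limited logging Dan suggests, as it
might sit in local_time_calibration(); the per-CPU threshold variable
and the helper name are hypothetical.]

/* Log only when the skew crosses the next power-of-two boundary, so a
 * machine with stable skew emits O(log(skew)) lines, not a stream. */
static DEFINE_PER_CPU(u64, skew_log_threshold);

static void maybe_log_skew(u64 skew_ns)
{
    u64 *thresh = &this_cpu(skew_log_threshold);

    if ( skew_ns <= *thresh )
        return;

    while ( *thresh < skew_ns )
        *thresh = *thresh ? (*thresh << 1) : 1;

    printk("CPU%d: calibration skew %"PRIu64"ns (threshold now %"PRIu64"ns)\n",
           smp_processor_id(), skew_ns, *thresh);
}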
Hi Keir,

Your suggestion below is a good one. I'll give it a try and let you
know.

(I thought most systems did expose an hpet, at least modern ones. All
the systems I use expose it. My code defaults back to the tsc way of
doing things when no hpet is detected.)

Regards,
Dave

________________________________
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Fri 4/11/2008 5:41 PM
To: Dave Winchell
Cc: dan.magenheimer@oracle.com; Ian Pratt; Tian, Kevin;
xen-devel@lists.xensource.com
Subject: Re: [xen-devel] System time monotonicity

[...]
On 11/4/08 23:58, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> Your suggestion below is a good one. I'll give it a try and let you
> know.
>
> (I thought most systems did expose an hpet, at least modern ones. All
> the systems I use expose it. My code defaults back to the tsc way of
> doing things when no hpet is detected.)

It used to be deliberately not exposed because Windows had problems
using it in some cases, iirc. If it's no longer getting hidden in newer
systems then that's a good thing.

Anyhow, yes, please do switch over to system time. I wouldn't take a
HPET-vs-TSC patch anyway.

 -- Keir
> From: Dave Winchell [mailto:dwinchell@virtualiron.com]
> Dan asked that I measure the cost of accessing the hpet. My first set
> of measurements indicated that for 99.8% of the samples taken the cost
> is less than 1/2 usec, or 500 cycles on my machine. This is about the
> cost of a single vmexit, for reference. The cost includes the cost of
> taking a spinlock. This cost also includes the overhead of one or two
> rdtsc instructions for the measurement.
> I'm still working with Dan on these measurements as he sees (much)
> higher costs measured from Linux.

FYI, I did a more precise measurement of reading hpet (and pit) on my
machine by modifying the kernel and recording rdtscll differences
around the platform timer reads (as well as around the entire function
calls, which compares better against my prior measurements using
systemtap).

For HPET reads, about 86% of the (100000) samples were between 4K and
8K cycles and about 13% were between 16K and 32K cycles. For PIT reads,
all reads are between 64K and 128K cycles. (I also measured the rdtscll
calls by putting two calls back-to-back... 90% of them were between
128-255 cycles and 10% between 64-127 cycles.)

Since my system is 3.0 GHz, a HPET read averages on the order of 3
usec, which is what my previous measurement showed. I suspect that
Dave's measurements and mine are "both right" and that the read
overhead of HPET varies on different systems.

> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> It used to be deliberately not exposed because Windows had problems
> using it in some cases, iirc. If it's no longer getting hidden in
> newer systems then that's a good thing.

On my box (and several others at Oracle), HPET is present but disabled
by default in the BIOS.

Dan
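[A sketch of the bracketing measurement Dan describes; the histogram
helper is hypothetical, rdtscll() is the usual kernel primitive.]

static u64 hist[64];   /* power-of-two cycle-cost buckets */

static void sample_read_cost(u64 (*read_counter)(void))
{
    u64 t0, t1, d;
    unsigned int b = 0;

    rdtscll(t0);
    (void)read_counter();   /* e.g. the HPET main-counter read */
    rdtscll(t1);

    for ( d = t1 - t0; d > 1; d >>= 1 )
        b++;                /* b = floor(log2(cycle delta)) */
    hist[b]++;
}

[Calling this with an empty read_counter gives the back-to-back rdtscll
baseline Dan quotes.]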
On 21/4/08 20:26, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Since my system is 3.0 GHz, a HPET read averages on the order of 3
> usec, which is what my previous measurement showed.
>
> I suspect that Dave's measurements and mine are "both right" and that
> the read overhead of HPET varies on different systems.

That would make sense. HPET accesses have to go all the way to the
southbridge, and that's bound to vary a lot between chipsets, let alone
across the very different interconnect topologies of AMD vs Intel.

 -- Keir
Keir,

In my work on layering hpet on get_s_time, I found a problem in
get_s_time and related code. Because of the problem I was getting large
jumps in the offset between local time and ntp time. These jumps were
on the order of many seconds.

The issue is the race between local_time_calibration() executing on one
processor and platform_time_calibration() on another.

I have included a patch which addresses the race in
local_time_calibration(), cpu_frequency_change(), and
init_percpu_time().

I'm giving you this ahead of the hpet work as it affects all users of
get_s_time().

I'm confident of the fix in local_time_calibration(), as I had failures
there before the fix and no failures after. The other two I'm less
confident in, so check my work closely there.

On the hpet-over-get_s_time() front, this fix allows me to get .0014%
error. This is very close to the error going to the hardware hpet each
time.

Regards,
Dave
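[To illustrate the window Dave is describing, here is the pre-patch
timestamp-gathering sequence from local_time_calibration(), annotated
with where a concurrent calibration can slip in; the comments are an
interpretation, not part of the source.]

local_irq_disable();
rdtscll(curr_tsc);                     /* local TSC stamp taken */
curr_local_stime = get_s_time();       /* local stime stamp taken */
/* platform_time_calibration() on another CPU can run here, and
 * read_platform_stime() may also spin on platform_timer_lock, so a
 * variable delay separates the master stamp from the two above. */
curr_master_stime = read_platform_stime();
local_irq_enable();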
What's the race? All accesses to platform time are already done under
the platform_timer_lock as far as I can see.

 -- Keir

On 21/4/08 21:32, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> Keir,
>
> In my work on layering hpet on get_s_time, I found a problem in
> get_s_time and related code. Because of the problem I was getting
> large jumps in the offset between local time and ntp time. These
> jumps were on the order of many seconds.
>
> The issue is the race between local_time_calibration() executing on
> one processor and platform_time_calibration() on another.
>
> I have included a patch which addresses the race in
> local_time_calibration(), cpu_frequency_change(), and
> init_percpu_time().
>
> I'm giving you this ahead of the hpet work as it affects all users of
> get_s_time().
>
> I'm confident of the fix in local_time_calibration(), as I had
> failures there before the fix and no failures after. The other two
> I'm less confident in, so check my work closely there.
>
> On the hpet-over-get_s_time() front, this fix allows me to get .0014%
> error. This is very close to the error going to the hardware hpet
> each time.
>
> Regards,
> Dave
>
>
> diff -r a38a41de0800 xen/arch/x86/time.c
> --- a/xen/arch/x86/time.c Wed Apr 16 16:42:47 2008 +0100
> +++ b/xen/arch/x86/time.c Mon Apr 21 15:19:37 2008 -0400
> @@ -530,6 +530,16 @@ static s_time_t read_platform_stime(void
>
>      return stime;
>  }
> +static s_time_t read_platform_stime_locked(void)
> +{
> +    u64 count;
> +    s_time_t stime;
> +
> +    count = plt_count64 + ((plt_src.read_counter() - plt_count) & plt_mask);
> +    stime = __read_platform_stime(count);
> +
> +    return stime;
> +}
>
>  static void platform_time_calibration(void)
>  {
> @@ -749,6 +759,7 @@ int cpu_frequency_change(u64 freq)
>  {
>      struct cpu_time *t = &this_cpu(cpu_time);
>      u64 curr_tsc;
> +    unsigned long flags;
>
>      /* Sanity check: CPU frequency allegedly dropping below 1MHz? */
>      if ( freq < 1000000u )
> @@ -758,15 +769,15 @@ int cpu_frequency_change(u64 freq)
>          return -EINVAL;
>      }
>
> -    local_irq_disable();
> +    spin_lock_irqsave(&platform_timer_lock, flags);
>      rdtscll(curr_tsc);
>      t->local_tsc_stamp = curr_tsc;
> -    t->stime_master_stamp = read_platform_stime();
> +    t->stime_master_stamp = read_platform_stime_locked();
>      /* TSC-extrapolated time may be bogus after frequency change. */
>      /*t->stime_local_stamp = get_s_time();*/
>      t->stime_local_stamp = t->stime_master_stamp;
>      set_time_scale(&t->tsc_scale, freq);
> -    local_irq_enable();
> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>
>      /* A full epoch should pass before we check for deviation. */
>      set_timer(&t->calibration_timer, NOW() + EPOCH);
> @@ -830,16 +841,18 @@ static void local_time_calibration(void
>      /* The overall calibration scale multiplier. */
>      u32 calibration_mul_frac;
>
> +    unsigned long flags;
> +
>      prev_tsc = t->local_tsc_stamp;
>      prev_local_stime = t->stime_local_stamp;
>      prev_master_stime = t->stime_master_stamp;
>
>      /* Disable IRQs to get 'instantaneous' current timestamps. */
> -    local_irq_disable();
> +    spin_lock_irqsave(&platform_timer_lock, flags);
>      rdtscll(curr_tsc);
>      curr_local_stime = get_s_time();
> -    curr_master_stime = read_platform_stime();
> -    local_irq_enable();
> +    curr_master_stime = read_platform_stime_locked();
> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>
>  #if 0
>      printk("PRE%d: tsc=%"PRIu64" stime=%"PRIu64" master=%"PRIu64"\n",
> @@ -944,10 +957,10 @@ void init_percpu_time(void)
>      unsigned long flags;
>      s_time_t now;
>
> -    local_irq_save(flags);
> +    spin_lock_irqsave(&platform_timer_lock, flags);
>      rdtscll(t->local_tsc_stamp);
> -    now = !plt_src.read_counter ? 0 : read_platform_stime();
> -    local_irq_restore(flags);
> +    now = !plt_src.read_counter ? 0 : read_platform_stime_locked();
> +    spin_unlock_irqrestore(&platform_timer_lock, flags);
>
>      t->stime_master_stamp = now;
>      t->stime_local_stamp = now;
Is the issue that there is a variable delay in waiting to acquire the
platform_timer_lock? If so, you can fix that by, for example, making
read_platform_stime() the *first* thing you do inside the
local_irq_disable/enable() block in local_time_calibration(), i.e.,
*before* doing the rdtsc() and get_s_time(). This avoids a variable
delay between the TSC-based time estimates and the platform-timer-based
time estimate.

If that fixes your issue I'll apply that in preference.

 -- Keir

On 21/4/08 23:55, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> What's the race? All accesses to platform time are already done under
> the platform_timer_lock as far as I can see.
>
>  -- Keir
>
> On 21/4/08 21:32, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>
>> [...]
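[A sketch of the reordering Keir suggests, as an interpretation of the
prose above rather than a posted patch.]

/* Take the master stamp first: any wait on platform_timer_lock then
 * happens before, not between, the correlated timestamps. */
local_irq_disable();
curr_master_stime = read_platform_stime();  /* may spin on the lock */
rdtscll(curr_tsc);                          /* then back-to-back ... */
curr_local_stime  = get_s_time();           /* ... local stamps      */
local_irq_enable();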
Keir,

Well, I'm glad you didn't take the patch, as my overnight test was not
successful. I'm afraid that I was overly confident. I'll re-submit when
I get to the bottom of this.

thanks,
Dave

Keir Fraser wrote:

> Is the issue that there is a variable delay in waiting to acquire the
> platform_timer_lock? If so, you can fix that by, for example, making
> read_platform_stime() the *first* thing you do inside the
> local_irq_disable/enable() block in local_time_calibration(), i.e.,
> *before* doing the rdtsc() and get_s_time(). This avoids a variable
> delay between the TSC-based time estimates and the
> platform-timer-based time estimate.
>
> If that fixes your issue I'll apply that in preference.
>
>  -- Keir
>
> [...]
Keir Fraser wrote:

> Is the issue that there is a variable delay in waiting to acquire the
> platform_timer_lock? If so, you can fix that by, for example, making
> read_platform_stime() the *first* thing you do inside the
> local_irq_disable/enable() block in local_time_calibration(), i.e.,
> *before* doing the rdtsc() and get_s_time(). This avoids a variable
> delay between the TSC-based time estimates and the
> platform-timer-based time estimate.

Yes, this is the issue. What you suggest should be fine and I am trying
it now.

With the locking version (and a fix to a bug I introduced) I got .0012%
error on an overnight run with hpet layered on get_s_time_mono(), which
is the max(prev, cur) layer on get_s_time we discussed.

-Dave

> If that fixes your issue I'll apply that in preference.
>
>  -- Keir
>
> [...]
On 24/4/08 17:04, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> Yes, this is the issue. What you suggest should be fine and I am
> trying it now.
> With the locking version (and a fix to a bug I introduced) I got
> .0012% error on an overnight run with hpet layered on
> get_s_time_mono(), which is the max(prev, cur) layer on get_s_time we
> discussed.

12 parts per million is pretty good. Is that cumulative deviation from
'wall time' over ~12 hours? That could easily be explained by the fact
that Xen system time is not sync'ed with ntp.

 -- Keir
Keir Fraser wrote:

> On 24/4/08 17:04, "Dave Winchell" <dwinchell@virtualiron.com> wrote:
>
>> Yes, this is the issue. What you suggest should be fine and I am
>> trying it now.
>> With the locking version (and a fix to a bug I introduced) I got
>> .0012% error on an overnight run with hpet layered on
>> get_s_time_mono(), which is the max(prev, cur) layer on get_s_time
>> we discussed.
>
> 12 parts per million is pretty good. Is that cumulative deviation
> from 'wall time' over ~12 hours?

Yes, deviation between the guest's time and an ntp reference.

> That could easily be explained by the fact that Xen system time is
> not sync'ed with ntp.

That's true. And, as we have discussed, this error seems to vary quite
a bit platform to platform for some reason. I will verify that this
still is the case.

-Dave
Keir,

Last night's run had the error in the 12 ppm range. Here is the change
we have been talking about.

-Dave

Dave Winchell wrote:

> Keir Fraser wrote:
>
>> 12 parts per million is pretty good. Is that cumulative deviation
>> from 'wall time' over ~12 hours?
>
> Yes, deviation between the guest's time and an ntp reference.
>
>> That could easily be explained by the fact that Xen system time is
>> not sync'ed with ntp.
>
> That's true. And, as we have discussed, this error seems to vary
> quite a bit platform to platform for some reason. I will verify that
> this still is the case.
>
> -Dave
Hi Dave --

Are you ready to release the guest-virtual-platform-timer on
xen-system-time patch yet? If so, we'd be happy to give it some
testing.

Thanks,
Dan

> -----Original Message-----
> From: Dave Winchell [mailto:dwinchell@virtualiron.com]
> Sent: Friday, April 25, 2008 1:48 PM
> To: Dave Winchell
> Cc: Keir Fraser; Tian, Kevin; dan.magenheimer@oracle.com;
> xen-devel@lists.xensource.com; Ian Pratt; Dave Winchell
> Subject: Re: [Xen-devel] Re: Fix for get_s_time()
>
> Keir,
>
> Last night's run had the error in the 12 ppm range. Here is the
> change we have been talking about.
>
> -Dave
>
> [...]
Hi Dan,

I just need to remove some debug and merge with unstable. I should be
able to send you a patch Monday or Tuesday. You know, it's more like
hpet on system time. Thanks for the testing offer.

Regards,
Dave

-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
Sent: Fri 4/25/2008 5:03 PM
To: Dave Winchell
Cc: Keir Fraser; Tian, Kevin; xen-devel@lists.xensource.com; Ian Pratt
Subject: RE: [Xen-devel] Re: Fix for get_s_time()

Hi Dave --

Are you ready to release the guest-virtual-platform-timer on
xen-system-time patch yet? If so, we'd be happy to give it some
testing.

Thanks,
Dan

[...]
Hi Dave --

> You know, it's more like hpet on system time.

I wonder how much of the problem we observed with skew on pit was due
to the pit-on-tsc "bug"... in other words, should the virtual pit be
based on system time also?

Dan

-----Original Message-----
From: Dave Winchell [mailto:dwinchell@virtualiron.com]
Sent: Friday, April 25, 2008 7:54 PM
To: dan.magenheimer@oracle.com
Cc: Keir Fraser; Tian, Kevin; xen-devel@lists.xensource.com; Ian Pratt;
Dave Winchell
Subject: RE: [Xen-devel] Re: Fix for get_s_time()

Hi Dan,

I just need to remove some debug and merge with unstable. I should be
able to send you a patch Monday or Tuesday. You know, it's more like
hpet on system time. Thanks for the testing offer.

Regards,
Dave

[...]
Dan Magenheimer wrote:

> Hi Dave --
>
>> You know, it's more like hpet on system time.
>
> I wonder how much of the problem we observed with skew on pit was due
> to the pit-on-tsc "bug"... in other words, should the virtual pit be
> based on system time also?

For guests that compute missed ticks, it may not help. That's because
here the guests are using tsc in their computations of offset and last
interrupt time stamp. Also, there is the esoteric use of delay in the
computations for pit. With hpet, on the other hand, the guests don't
read the tsc and don't use delay - they rely only on the hpet main
counter.

It might improve accuracy for a guest that does not compute missed
ticks. But you would still have the time-going-backwards issue, unless
you patched the guest.

Most of the hpet accuracy we see is due to clean and correct algorithms
in the guest, in my opinion. Of course we have to do the right things
in emulating the hpet in xen.

-Dave
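[To make Dave's point concrete, a sketch of the kind of missed-tick
accounting a guest does on its PIT/TSC path; this is a paraphrase with
hypothetical names, not the actual kernel code. Because the TSC appears
on the right-hand side, a TSC-vs-platform-time mismatch leaks into
guest time no matter what backs the virtual PIT.]

static u64 last_tick_tsc, tsc_per_tick;   /* guest-side bookkeeping */
static u64 jiffies64;

static void guest_timer_interrupt(void)
{
    u64 now = rdtsc();
    u64 missed = (now - last_tick_tsc) / tsc_per_tick;  /* TSC-based */

    jiffies64 += 1 + missed;
    last_tick_tsc += (1 + missed) * tsc_per_tick;
}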
Dan, Keir:

Here is where I stand on the overhead of (hpet) read_64_main_counter()
for the version layered on get_s_time with the max function, compared
to a version that goes to the hardware each time.

There are two histograms, each with 100 buckets; each bucket is 64
cycles. There are 1991 cycles per usec on this box. Bucket 99 contains
all events where overhead >= (99*64) cycles.

Layered on stime the overhead is probably lower on average. Both
histograms are bi-modal, but the going-to-the-hardware one seems to
have a stronger second mode. As we have discussed, the cost of going to
the hardware could vary quite a bit from platform to platform.

I optimized the code around read_64_main_counter() over stime quite a
bit, but I'm sure there is room for improvement.

-Dave

read_64_main_counter() on stime:

(VMM) cycles per bucket 64
(VMM)
(VMM)  0:     0 78795 148271 21173 15902 47704 89195 121962
(VMM)  8: 83632 51848 17531 12987 10976 8816 9120 8608
(VMM) 16:  5685 3972 3783 2518 1052 710 608 469
(VMM) 24:   277 159 83 46 34 23 19 16
(VMM) 32:     9 6 7 3 4 8 5 6
(VMM) 40:     9 7 14 13 17 25 22 29
(VMM) 48:    25 19 35 27 30 26 21 23
(VMM) 56:    17 24 12 27 22 18 10 22
(VMM) 64:    19 16 16 16 28 18 23 16
(VMM) 72:    22 22 12 14 21 19 17 19
(VMM) 80:    18 14 10 14 11 12 8 18
(VMM) 88:    16 10 17 14 10 8 11 11
(VMM) 96:    10 10 0 175

read_64_main_counter() going to the hardware:

(VMM) cycles per bucket 64
(VMM)
(VMM)  0: 92529 148423 27850 12532 28042 43336 60516 59011
(VMM)  8: 36895 14043 8162 6857 7794 7401 5099 2986
(VMM) 16:  1636 1066 796 592 459 409 314 248
(VMM) 24:   206 195 138 97 71 45 35 34
(VMM) 32:    33 36 40 40 25 26 25 26
(VMM) 40:    37 23 18 30 27 30 34 44
(VMM) 48:    38 19 25 23 23 25 21 27
(VMM) 56:    28 24 43 80 220 324 568 599
(VMM) 64:   610 565 611 699 690 846 874 788
(VMM) 72:   703 542 556 613 605 603 559 500
(VMM) 80:   485 493 512 578 561 594 575 614
(VMM) 88:   759 851 895 856 807 770 719 958
(VMM) 96:  1127 1263 0 18219
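[For reference, a sketch of how a histogram with these parameters is
accumulated - 64-cycle-wide buckets with bucket 99 absorbing the tail;
names are hypothetical.]

#define NBUCKETS 100

static u64 hist[NBUCKETS];

static void account_cycles(u64 cycles)
{
    unsigned int b = cycles / 64;     /* 64 cycles per bucket */

    if ( b >= NBUCKETS )
        b = NBUCKETS - 1;             /* bucket 99: >= 99*64 cycles */
    hist[b]++;
}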
OK, then although you are not saying it directly, assuming your patch
is accepted, it sounds like the hpet will now always be more accurate
than pit, and the change a couple of months ago that turns OFF the
guest hpet by default should now be reversed so that the guest hpet is
turned ON by default (or perhaps the option should just be removed or
ignored, with the previous behavior being restored that hpet is always
on). True?

Thanks,
Dan

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
> Dave Winchell
> Sent: Monday, April 28, 2008 12:09 PM
> To: dan.magenheimer@oracle.com
> Cc: Dave Winchell; Tian, Kevin; xen-devel@lists.xensource.com; Ian
> Pratt; Keir Fraser
> Subject: Re: [Xen-devel] Re: Fix for get_s_time()
>
> For guests that compute missed ticks, it may not help. That's because
> here the guests are using tsc in their computations of offset and
> last interrupt time stamp. Also, there is the esoteric use of delay
> in the computations for pit. With hpet, on the other hand, the guests
> don't read the tsc and don't use delay - they rely only on the hpet
> main counter.
>
> It might improve accuracy for a guest that does not compute missed
> ticks. But you would still have the time-going-backwards issue,
> unless you patched the guest.
>
> Most of the hpet accuracy we see is due to clean and correct
> algorithms in the guest, in my opinion. Of course we have to do the
> right things in emulating the hpet in xen.
>
> -Dave
>
> [...]
Dan Magenheimer wrote:

> OK, then although you are not saying it directly, assuming your patch
> is accepted, it sounds like the hpet will now always be more accurate
> than pit, and the change a couple of months ago that turns OFF the
> guest hpet by default should now be reversed so that the guest hpet
> is turned ON by default (or perhaps the option should just be removed
> or ignored, with the previous behavior being restored that hpet is
> always on).

Dan,

We should have the option, on a per-guest basis, as to whether that
guest sees the hpet in the acpi tables. This is because one guest that
I know of, Windows 2k8, doesn't keep good time with hpet with either of
the two policies that I support now. I may be able to come up with a
policy for 2k8, but still, we should have the ability to turn hpet off.

Furthermore, the great accuracy I have been reporting is for Linux
guests, which, so far, all have computed missed ticks. Windows 2k3 does
not compute missed ticks, and I have a policy for it which keeps it
relatively accurate, but, as I recall, not as accurate as the Linux
guests.

To turn the hpet off, not only should we not advertise it in acpi, but
also we should return 0 for the hpet capabilities register, as some
guests just try to use the hpet regardless of acpi.

Regards,
Dave

> True?
>
> Thanks,
> Dan
>
> [...]
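[A sketch of the MMIO side of the off switch Dave describes; the
per-guest flag and the function names are hypothetical.]

/* With hpet disabled for this guest: absent from ACPI, and all vhpet
 * register reads - including the capabilities/ID register - return 0,
 * so guests that probe the MMIO region regardless will back off. */
static uint64_t vhpet_mmio_read(struct vhpet *h, unsigned int reg)
{
    if ( !h->exposed_to_guest )
        return 0;

    return vhpet_reg_read(h, reg);   /* normal emulation path */
}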
So does the below indicate that 92529 of your HPET accesses took
between 0 and 63 cycles? That seems rather short for an access off-chip
and to the southbridge.

 -- Keir

On 28/4/08 19:40, "Dave Winchell" <dwinchell@virtualiron.com> wrote:

> read_64_main_counter() going to the hardware:
>
> (VMM) cycles per bucket 64
> (VMM)
> (VMM)  0: 92529 148423 27850 12532 28042 43336 60516 59011
> [...]
Yes. It's an AMD Warthog. I'll try some other platforms when I get a
chance.

-Dave

Keir Fraser wrote:

> So does the below indicate that 92529 of your HPET accesses took
> between 0 and 63 cycles? That seems rather short for an access
> off-chip and to the southbridge.
>
>  -- Keir
>
> [...]