I am very frequently seeing these "Time went backwards" messages on my
test systems. Running a 2 VCPU guest on a single CPU computer seems to
be effective at triggering this ;)

As you can see from the messages below, cpu_processed tends to be one
or two jiffies larger than processed, and the difference equals
"delta - cpu_delta". I have been staring at timer_interrupt() for a
few days now and trying a few things, but the bug continues.

Note that this problem starts very early at boot time, even before the
root filesystem is mounted...

Timer ISR/0: Time went backwards: delta=3962070 cpu_delta=-37930 shadow=152513705782853 off=54253888 processed=152513756074150 cpu_processed=152513760074150
 0: 152513760074150
 1: 152513420074150
Timer ISR/0: Time went backwards: delta=4009918 cpu_delta=-3990082 shadow=152662706855499 off=369229091 processed=152663072074150 cpu_processed=152663080074150
 0: 152663080074150
 1: 152663016074150
Timer ISR/0: Time went backwards: delta=3973296 cpu_delta=-26704 shadow=152662706855499 off=373192366 processed=152663076074150 cpu_processed=152663080074150
 0: 152663080074150
 1: 152663016074150
Timer ISR/0: Time went backwards: delta=2084348 cpu_delta=-1915652 shadow=152662706855499 off=375303210 processed=152663080074150 cpu_processed=152663084074150
 0: 152663084074150
 1: 152663016074150
Timer ISR/0: Time went backwards: delta=1932754 cpu_delta=-2067246 shadow=152692707072523 off=234935002 processed=152692940074150 cpu_processed=152692944074150
 0: 152692944074150
 1: 152692780074150

--
All Rights Reversed
On 10 Mar 2006, at 17:00, Rik van Riel wrote:

> I am very frequently seeing these Time went backwards messages
> on my test systems. Running a 2 VCPU guest on a single CPU
> computer seems to be effective at triggering this ;)

That's interesting, particularly if you can trigger on a single
physical CPU. In that case cpu_delta should definitely *never* be
negative. Maybe the steal_time logic, which does mess with per-cpu
processed_system_time, is not quite right? :-)

When did you start seeing the messages?

 -- Keir
On Fri, 10 Mar 2006, Keir Fraser wrote:
> On 10 Mar 2006, at 17:00, Rik van Riel wrote:
>
> > I am very frequently seeing these Time went backwards messages
> > on my test systems. Running a 2 VCPU guest on a single CPU
> > computer seems to be effective at triggering this ;)
>
> That's interesting, particularly if you can trigger on a single physical
> CPU. In that case cpu_delta should definitely *never* be negative. Maybe
> the steal_time logic, which does mess with per-cpu
> processed_system_time, is not quite right? :-)

My current suspicion is that the per-cpu processed_system_time gets
incremented twice during one timer irq handling, and then we notice it
was incremented too far during the _next_ timer interrupt.

> When did you start seeing the messages?

They became frequent after the new steal time code went in.

--
All Rights Reversed
On 10 Mar 2006, at 17:14, Keir Fraser wrote:

>> I am very frequently seeing these Time went backwards messages
>> on my test systems. Running a 2 VCPU guest on a single CPU
>> computer seems to be effective at triggering this ;)
>
> That's interesting, particularly if you can trigger on a single
> physical CPU. In that case cpu_delta should definitely *never* be
> negative. Maybe the steal_time logic, which does mess with per-cpu
> processed_system_time, is not quite right? :-)

To that end, it would be interesting to see if, after the first
do-while loop in timer_interrupt(), you can end up with
sched_time > (processed_system_time + delta).

If so, we could end up running per-cpu processed_system_time ahead of
current system time and so get a -ve delta the *next* time we run
timer_interrupt().

 -- Keir
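[A quick diagnostic of the kind Keir suggests might look roughly like
the sketch below. It is only an illustration: the variable names and
their placement inside timer_interrupt() in time-xen.c are assumed from
this thread, and -- as later messages show -- it matters which
processed_system_time (global or per-cpu) gets compared.]

	/* Sketch only: placed right after the first do-while loop in
	 * timer_interrupt(), to flag per-cpu processed time being pushed
	 * ahead of current system time. */
	if (unlikely(sched_time > processed_system_time + delta))
		printk(KERN_WARNING
		       "Timer ISR/%d: sched_time %llu > delta %lld + "
		       "processed_system_time %llu\n", cpu,
		       (unsigned long long)sched_time, (long long)delta,
		       (unsigned long long)processed_system_time);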
On Fri, 10 Mar 2006, Keir Fraser wrote:
> To that end, it would be interesting to see if after the first do-while
> loop in timer_interrupt(), you can end up with sched_time >
> (processed_system_time + delta).
>
> If so, we could end up running per-cpu processed_system_time ahead of
> current system time and so get -ve delta the *next* time we run
> timer_interrupt().

In my observations, delta is always positive. It is cpu_delta
which goes negative. I'll go count the cpu specific deltas and
will check to see if they are larger than the system wide delta.

--
All Rights Reversed
On 10 Mar 2006, at 17:33, Rik van Riel wrote:

>> To that end, it would be interesting to see if after the first do-while
>> loop in timer_interrupt(), you can end up with sched_time >
>> (processed_system_time + delta).
>>
>> If so, we could end up running per-cpu processed_system_time ahead of
>> current system time and so get -ve delta the *next* time we run
>> timer_interrupt().
>
> In my observations, delta is always positive. It is cpu_delta
> which goes negative. I'll go count the cpu specific deltas and
> will check to see if they are larger than the system wide delta.

Yeah, sorry, I was being sloppy in my terminology. I think comparing
stolen/blocked delta values with cpu_delta would be useful -- see if
those are bigger than we'd expect and pushing per-cpu processed time too
far forward.

 -- Keir
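[A check along these lines -- whose output appears in the next message --
might look roughly like the following sketch. The variable names and
types inside timer_interrupt() are assumed from the thread, and
printk_ratelimit() is used here only to account for the "messages
suppressed" lines visible in the output below.]

	/* Sketch only: after stolen, blocked and delta_cpu have been
	 * computed for this CPU, warn when the accounted components
	 * exceed the per-cpu delta. */
	if (stolen + blocked > delta_cpu && printk_ratelimit())
		printk(KERN_WARNING
		       "Timer ISR/%d: stolen %lld + blocked %lld > delta_cpu %lld\n",
		       cpu, (long long)stolen, (long long)blocked,
		       (long long)delta_cpu);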
On Fri, 10 Mar 2006, Keir Fraser wrote:
> Yeah, sorry, I was being sloppy in my terminology. I think comparing
> stolen/blocked delta values with cpu_delta would be useful -- see if
> those are bigger than we'd expect and pushing per-cpu processed time too
> far forward.

stolen + blocked is larger than delta_cpu all the time:

Timer ISR/0: stolen 1636284 + blocked 109702879 > delta_cpu 108006671
printk: 33 messages suppressed.
Timer ISR/0: stolen 3664960 + blocked 25758521 > delta_cpu 24005880
printk: 32 messages suppressed.
Timer ISR/0: stolen 904813 + blocked 118716967 > delta_cpu 116009003
printk: 41 messages suppressed.
Timer ISR/0: stolen 2512450 + blocked 48318256 > delta_cpu 48005732
printk: 34 messages suppressed.

Digging into the hypervisor code now...

--
All Rights Reversed
On Fri, 10 Mar 2006, Rik van Riel wrote:
> stolen + blocked is larger than delta_cpu all the time:

In fact, just blocked by itself is often larger than delta_cpu!
I guess this may be a bad sign...

> Timer ISR/0: stolen 1636284 + blocked 109702879 > delta_cpu 108006671
> printk: 33 messages suppressed.
> Timer ISR/0: stolen 3664960 + blocked 25758521 > delta_cpu 24005880
> printk: 32 messages suppressed.
> Timer ISR/0: stolen 904813 + blocked 118716967 > delta_cpu 116009003
> printk: 41 messages suppressed.
> Timer ISR/0: stolen 2512450 + blocked 48318256 > delta_cpu 48005732
> printk: 34 messages suppressed.

--
All Rights Reversed
On 10 Mar 2006, at 21:57, Rik van Riel wrote:

> stolen + blocked is larger than delta_cpu all the time:
>
> Timer ISR/0: stolen 1636284 + blocked 109702879 > delta_cpu 108006671
> printk: 33 messages suppressed.

Is sched_time > (processed_system_time + delta)? That would be the
real killer.

Because we only account stolen/blocked/running time in jiffy-sized
chunks, it's possible for the three deltas to be out of sync with each
other by a jiffy.

 -- Keir
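[The jiffy-sized accounting Keir mentions can be illustrated with a
small standalone program (not from the thread; the 4 ms tick length is
just an assumption, roughly matching the deltas in the logs above).
Each component is accounted only in whole ticks, so the individually
truncated values can fall behind the truncated total:]

	#include <stdio.h>
	#include <stdint.h>

	#define NS_PER_TICK 4000000ULL   /* assumed 4 ms tick, for illustration */

	int main(void)
	{
		/* made-up nanosecond deltas accumulated since the last accounted tick */
		uint64_t stolen = 3600000, blocked = 2800000, running = 2400000;
		uint64_t total  = stolen + blocked + running;   /* 8.8 ms = 2.2 ticks */

		/* each component is accounted only in whole-tick chunks (rounded down) */
		uint64_t accounted = stolen / NS_PER_TICK
		                   + blocked / NS_PER_TICK
		                   + running / NS_PER_TICK;

		/* prints "total: 2 ticks, accounted: 0 ticks" */
		printf("total: %llu ticks, accounted: %llu ticks\n",
		       (unsigned long long)(total / NS_PER_TICK),
		       (unsigned long long)accounted);
		return 0;
	}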
On Sat, 11 Mar 2006, Keir Fraser wrote:
> On 10 Mar 2006, at 21:57, Rik van Riel wrote:
>
> > stolen + blocked is larger than delta_cpu all the time:
> >
> > Timer ISR/0: stolen 1636284 + blocked 109702879 > delta_cpu 108006671
> > printk: 33 messages suppressed.
>
> Is sched_time > (processed_system_time + delta)? That would be the real
> killer.

Very frequently. By significant amounts, too...

No, I have no idea why...

Timer ISR/0: stolen 1921037 + blocked 79704210 > delta_cpu 79978775
Timer ISR/0: sched_time 429620376038547 > delta 4006385 + cpu processed_system_time 429620356036193
Timer ISR/0: sched_time 429620710119143 > delta_cpu 206088604 + global processed_system_time 429620500036193
Timer ISR/0: sched_time 429621780039343 > delta 404013043 + cpu processed_system_time 429621360036193
Timer ISR/0: sched_time 429622284012371 > delta_cpu -11948 + global processed_system_time 429622280036193

> Because we only account stolen/blocked/running time in jiffy-sized chunks,
> it's possible for the three deltas to be out of sync with each other by a
> jiffy.

But we round everything down, so while it is possible for the sum of
the rounded-down stolen/blocked/running/idle values to be way less than
the total, I don't see how the sum of these rounded-down values could
ever be more than the total (which is only rounded off once)...

--
All Rights Reversed
On Mon, 13 Mar 2006, Rik van Riel wrote:
> On Sat, 11 Mar 2006, Keir Fraser wrote:
> > On 10 Mar 2006, at 21:57, Rik van Riel wrote:
> >
> > > stolen + blocked is larger than delta_cpu all the time:
> > >
> > > Timer ISR/0: stolen 1636284 + blocked 109702879 > delta_cpu 108006671
> > > printk: 33 messages suppressed.
> >
> > Is sched_time > (processed_system_time + delta)? That would be the real
> > killer.
>
> Very frequently. By significant amounts, too...
>
> No, I have no idea why...
>
> Timer ISR/0: stolen 1921037 + blocked 79704210 > delta_cpu 79978775
> Timer ISR/0: sched_time 429620376038547 > delta 4006385 + cpu processed_system_time 429620356036193
> Timer ISR/0: sched_time 429620710119143 > delta_cpu 206088604 + global processed_system_time 429620500036193

Never mind me. I mixed up the global and per-cpu variables. Putting
them back in the right place makes this printk go away rather nicely.

Back to square one...

--
All Rights Reversed
On Mon, 13 Mar 2006, Rik van Riel wrote:
> Back to square one...

OK, finally nailed it! When both stolen and blocked are rounded
down, it is possible for the final increment of the cpu local
processed_system_time to move the cpu local system time ahead a
bit further than expected - but still proper wrt. wall clock time.

This gives problems on the next timer interrupt, when stolen and
blocked both get rounded up and end up incrementing the per-cpu
processed_system_time too far.

These simple checks make sure that the cpu local processed_system_time
never gets advanced too far. Not advancing the variable now should be
fine, since we'll increment it at the next timer tick...

This patch has made the "time went backwards" error messages go away
completely.

Signed-off-by: Rik van Riel <riel@redhat.com>

--- linux-2.6.15.x86_64/arch/i386/kernel/time-xen.c.overflow	2006-03-10 00:35:53.000000000 -0500
+++ linux-2.6.15.x86_64/arch/i386/kernel/time-xen.c	2006-03-13 18:02:41.000000000 -0500
@@ -671,6 +671,10 @@
 	 */
 	if (stolen > 0) {
 		delta_cpu -= stolen;
+		if (unlikely(delta_cpu < 0)) {
+			stolen += delta_cpu;
+			delta_cpu = blocked = 0;
+		}
 		do_div(stolen, NS_PER_TICK);
 		per_cpu(processed_stolen_time, cpu) += stolen * NS_PER_TICK;
 		per_cpu(processed_system_time, cpu) += stolen * NS_PER_TICK;
@@ -684,13 +688,23 @@
 	 */
 	if (blocked > 0) {
 		delta_cpu -= blocked;
+		if (unlikely(delta_cpu < 0)) {
+			blocked += delta_cpu;
+			delta_cpu = 0;
+		}
 		do_div(blocked, NS_PER_TICK);
 		per_cpu(processed_blocked_time, cpu) += blocked * NS_PER_TICK;
 		per_cpu(processed_system_time, cpu) += blocked * NS_PER_TICK;
 		account_steal_time(idle_task(cpu), (cputime_t)blocked);
 	}
 
-	/* Account user/system ticks. */
+	/*
+	 * Account user/system ticks.
+	 * This gets excess time accounted when blocked and/or stolen get
+	 * rounded down, but the "if (delta_cpu < 0)" tests above compensate
+	 * for that to keep cpu local processed_system_time from advancing
+	 * too far.
+	 */
 	if (delta_cpu > 0) {
 		do_div(delta_cpu, NS_PER_TICK);
 		per_cpu(processed_system_time, cpu) += delta_cpu * NS_PER_TICK;

--
All Rights Reversed
On 13 Mar 2006, at 23:07, Rik van Riel wrote:

> OK, finally nailed it! When both stolen and blocked are rounded
> down, it is possible for the final increment of the cpu local
> processed_system_time to move the cpu local system time ahead a
> bit further than expected - but still proper wrt. wall clock time.

Thanks for tracking this down!

I'm not sure how you mean that rounding down stolen and blocked causes
time to get progressed further than expected, but I'm happy to see some
more sanity checking and clamping in that code, and if it fixes your
observed problems... :-)

 -- Keir
On Tue, 14 Mar 2006, Keir Fraser wrote:
> On 13 Mar 2006, at 23:07, Rik van Riel wrote:
>
> > OK, finally nailed it! When both stolen and blocked are rounded
> > down, it is possible for the final increment of the cpu local
> > processed_system_time to move the cpu local system time ahead a
> > bit further than expected - but still proper wrt. wall clock time.
>
> Thanks for tracking this down!
>
> I'm not sure how you mean that rounding down stolen and blocked causes
> time to get progressed further than expected,

Not initially, but on the next timer tick. Let's use jiffies as the
time unit in this example:

Tick N:
	delta   = 2.2
	stolen  = 0.9
	blocked = 0.7

	stolen and blocked will not get incremented, while
	the per-cpu processed_time gets 2 jiffies

Tick N+1:
	delta   = 1.5
	stolen  = 1.3 (includes the 0.9 from the last tick)
	blocked = 1.1 (includes the 0.7 from the last tick)

	stolen gets incremented by 1, blocked gets incremented by 1,
	and (as a consequence) the per-cpu processed_time gets
	incremented by 2 - one more than it should be!

--
All Rights Reversed
On 14 Mar 2006, at 15:13, Rik van Riel wrote:

> Tick N:
> 	delta   = 2.2
> 	stolen  = 0.9
> 	blocked = 0.7
>
> 	stolen and blocked will not get incremented, while
> 	the per-cpu processed_time gets 2 jiffies

But we decrement delta_cpu *before* rounding down. So, after handling
stolen and blocked, we'll end up with delta_cpu==0.6 which gets rounded
down to 0 and so per-cpu processed_time gets 0 jiffies?

 -- Keir
On Tue, 14 Mar 2006, Keir Fraser wrote:
> On 14 Mar 2006, at 15:13, Rik van Riel wrote:
>
> > Tick N:
> > 	delta   = 2.2
> > 	stolen  = 0.9
> > 	blocked = 0.7
> >
> > 	stolen and blocked will not get incremented, while
> > 	the per-cpu processed_time gets 2 jiffies
>
> But we decrement delta_cpu *before* rounding down. So, after handling
> stolen and blocked, we'll end up with delta_cpu==0.6 which gets rounded
> down to 0 and so per-cpu processed_time gets 0 jiffies?

Hmmmm good point!

Now I no longer understand my own bugfix. Great...

--
All Rights Reversed
On Tue, 14 Mar 2006, Rik van Riel wrote:
> Hmmmm good point!
>
> Now I no longer understand my own bugfix. Great...

Actually, we come into timer_interrupt() sometimes with
blocked + stolen > delta_cpu, so the bugfix is valid.

I just no longer understand how we can get into this
situation...

--
All Rights Reversed
On 14 Mar 2006, at 15:32, Rik van Riel wrote:

> Actually, we come into timer_interrupt() sometimes with
> blocked + stolen > delta_cpu, so the bugfix is valid.
>
> I just no longer understand how we can get into this
> situation...

Could be due to horrendous clock jitter across CPUs. I checked in code
yesterday to add a new debug key ('t') to Xen. If you hit that over the
serial line, you'll get told the maximum clock difference across CPUs
in microseconds. On my system that is usually zero or one microseconds.
If it's much much bigger than that on your system then it needs looking
into.

 -- Keir
On Tue, 14 Mar 2006, Keir Fraser wrote:
> On 14 Mar 2006, at 15:32, Rik van Riel wrote:
>
> > Actually, we come into timer_interrupt() sometimes with
> > blocked + stolen > delta_cpu, so the bugfix is valid.
> >
> > I just no longer understand how we can get into this
> > situation...
>
> Could be due to horrendous clock jitter across CPUs.

Not on a single CPU system ;)

--
All Rights Reversed
On 14 Mar 2006, at 15:43, Rik van Riel wrote:

>>> Actually, we come into timer_interrupt() sometimes with
>>> blocked + stolen > delta_cpu, so the bugfix is valid.
>>>
>>> I just no longer understand how we can get into this
>>> situation...
>>
>> Could be due to horrendous clock jitter across CPUs.
>
> Not on a single CPU system ;)

Hmmm... well, a few suggestions:

 1. The first do-while loop in timer_interrupt() in time-xen.c aims to
    get a consistent snapshot of blocked, stolen and delta_cpu by
    checking timestamp-info version numbers in the two while predicates.
    I just added an rmb() to time_values_up_to_date() -- required for
    absolute accuracy, and it will act as a compiler barrier too. But
    the likelihood of gcc having moved that test earlier is very small,
    I think. :-) If it had, though, you could end up with a later
    snapshot of stolen/blocked than of delta_cpu, which would be bad.

 2. Maybe the time synchronising code in Xen is broken. I tested it
    pretty hard when I implemented it, but you never know. It would be
    bad if time could step backwards during local_time_calibration() in
    xen/arch/x86/time.c. It might be worth doing a before-and-after
    check in that function (e.g., see whether the value returned by
    get_s_time() gets smaller); a sketch of such a check follows below.

This bug can't be *that* hard to find on a single CPU system. :-)

 -- Keir
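[A before-and-after check of the kind suggested in point 2 might look
roughly like this sketch. get_s_time() and local_time_calibration() are
named in the message above; everything else (placement, message format)
is an assumption.]

	/* Sketch only: bracket the body of local_time_calibration() in
	 * xen/arch/x86/time.c and complain if local system time ever
	 * appears to step backwards across the calibration. */
	s_time_t before = get_s_time();

	/* ... existing calibration work ... */

	s_time_t after = get_s_time();
	if (after < before)
		printk("CPU%d: system time stepped back %ld ns across calibration\n",
		       smp_processor_id(), (long)(before - after));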