Still looking at bug #195, which shows up occasionally upon boot and
which I can now recreate by generating a lot of traffic over the serial
line into Xen (holding down 'r' to print the run queues). I started
comparing how Xen sets up and calibrates the timers with how Linux does
it, and have a few questions about where they differ.

In Linux, i386 and x86_64 set up the PIT using binary, mode 2, LSB/MSB,
ch 0, and then periodically read PIT_CH0 to obtain a count. Linux also
uses PIT CH2 to do some calibration.

Xen, in xen/arch/x86/i8259.c, sets up the PIT in the same manner
(binary, mode 2, LSB/MSB, ch 0), but as far as I can see PIT_CH0 is
never used again; specifically, PIT_CH0 is not used when handling a
timer interrupt. Instead, in xen/arch/x86/time.c, when Xen uses the PIT
as the platform timer, it runs pit_read_counter(), which gets a count
from PIT_CH2 (the channel used during calibrate_boot_tsc()), but in
mode 0.

I don't fully understand the different modes of the PIT, but looking at
some Intel documentation [1] I see:

    MODE 0: INTERRUPT ON TERMINAL COUNT
    Mode 0 is typically used for event counting. After the Control Word
    is written, OUT is initially low, and will remain low until the
    Counter reaches zero. OUT then goes high and remains high until a
    new count or a new Mode 0 Control Word is written into the Counter.

    MODE 2: RATE GENERATOR
    This Mode functions like a divide-by-N counter. It is typically
    used to generate a Real Time Clock interrupt. OUT will initially be
    high. When the initial count has decremented to 1, OUT goes low for
    one CLK pulse. OUT then goes high again, the Counter reloads the
    initial count and the process is repeated. Mode 2 is periodic; the
    same sequence is repeated indefinitely. For an initial count of N,
    the sequence repeats every N CLK cycles.

Why does Xen choose to use mode 0 on PIT CH2 for calculating how much
time has passed, rather than PIT_CH0 in mode 2 as Linux does? Are there
some trade-offs?

[1] http://www.cs.utexas.edu/users/dahlin/Classes/UGOS/reading/82C54.pdf

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
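
For reference, the two PIT configurations compared in the message above differ only in the channel and mode fields of the 82C54 control word. The following is a minimal sketch of that encoding, assuming the standard control word layout from the datasheet; the function and enum names are hypothetical and are not taken from the Xen or Linux sources.

```c
#include <assert.h>
#include <stdint.h>

/* Access-mode field values of the 82C54 control word (bits 5-4). */
enum pit_access { PIT_LATCH = 0, PIT_LSB = 1, PIT_MSB = 2, PIT_LSB_MSB = 3 };

/*
 * Build an 82C54 control word (hypothetical helper, for illustration):
 *   bits 7-6  counter (channel) select
 *   bits 5-4  access mode
 *   bits 3-1  counting mode (0..5)
 *   bit  0    0 = binary, 1 = BCD
 */
static uint8_t pit_control_word(unsigned channel, enum pit_access access,
                                unsigned mode, int bcd)
{
    assert(channel <= 2 && mode <= 5);
    return (uint8_t)((channel << 6) | ((unsigned)access << 4) |
                     (mode << 1) | (bcd ? 1u : 0u));
}
```

With this layout, "binary, mode 2, LSB/MSB, ch 0" encodes as 0x34, while the same access settings on channel 2 in mode 0 encode as 0xB0.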
On 23 Sep 2005, at 20:49, Ryan Harper wrote:

> Why does Xen choose to use mode 0 on PIT CH2 for calculating how much
> time has passed, rather than PIT_CH0 in mode 2 as Linux does? Are
> there some trade-offs?

The way we track time in Xen is completely different to Linux. Linux
tracks time off periodic interrupts from PIT_CH0: initially assuming
each tick is 10ms but allowing that to be trimmed by adjtimex if ntpd is
running.

Xen only uses PIT_CH0 as a fallback timer on uniprocessor systems with
no local APIC: apart from that it is basically unused. Xen doesn't rely
on fixed periodic interrupts; instead it programs the local APIC timer
to fire at the next event of interest. We track time by assuming the PIT
crystal is a precise 1.193180MHz source: we can therefore measure the
passage of time by reading how far PIT_CH2 has decremented and
performing a multiplication. To more quickly estimate the passage of
time, each CPU calibrates its local APIC oscillator to PIT_CH2; we can
then estimate time-of-day with a RDTSC instruction plus a
multiplication.

The main problem with using the PIT as platform timer is that the
counters are only 16 bits. We therefore periodically (every few tens of
milliseconds) read the PIT_CH2 delta out into a wider 64-bit variable.

 -- Keir
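
The scheme described above (read how far a 16-bit down-counter has decremented, fold the delta into a 64-bit total, then scale to time) can be sketched as follows. This is an illustration only, assuming a free-running down-counter clocked at about 1.19318 MHz; the struct and function names are hypothetical and do not correspond to the actual code in xen/arch/x86/time.c.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ 1193180ULL  /* assumed PIT input clock, about 1.19318 MHz */

struct pit_clock {
    uint16_t last;   /* last raw 16-bit counter reading */
    uint64_t ticks;  /* total ticks accumulated so far */
};

/*
 * Fold a new raw reading into the 64-bit total. The counter counts
 * down, so elapsed ticks are (last - now) modulo 2^16. This is only
 * correct if fewer than 65536 ticks passed since the previous call.
 */
static void pit_clock_update(struct pit_clock *c, uint16_t now)
{
    uint16_t delta = (uint16_t)(c->last - now);
    c->ticks += delta;
    c->last = now;
}

/* Scale ticks to nanoseconds, splitting the division to avoid overflow. */
static uint64_t pit_ticks_to_ns(uint64_t ticks)
{
    assert(PIT_HZ != 0);
    return (ticks / PIT_HZ) * 1000000000ULL +
           (ticks % PIT_HZ) * 1000000000ULL / PIT_HZ;
}
```

The unsigned 16-bit subtraction handles a single counter wrap between two readings, which is why the reads must happen more often than the counter's wrap period.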
Keir,

Thanks for the explanation. I'm still trying to reason out why we are
seeing the 'Time went backwards' message every now and then, as well as
why I can forcibly trigger it via serial interrupt floods (holding 'r'
with serial input sent to Xen).

Is either assumption, PIT_CH0 ~= 10ms (Linux) or PIT_CH2 = 1.193180MHz
source (Xen), more valid? It sounds like either is valid. Though Linux
can be adjusted via ntpd, is there any correcting factor for Xen? I
know we can run ntpd in dom0 and it can update the wall clock timer, but
AFAICT, wall clock doesn't affect system_timestamp (which is where we
detect the Time went backwards in
linux-2.6-xen-sparse/arch/x86/xen/i386/kernel/time.c).

Via some trace observation, I've noticed that the per-cpu time in Xen
(specifically stime_local_stamp) can vary widely between cpus. Is this
the best source to be using for updating Linux system_timestamp, since
it can vary significantly (>1000000) between processors?

I haven't gotten around to doing it yet, but I was going to instrument
irq disable/enable to see how long we run with irqs disabled, with the
thought that we might be missing some events from which Xen derives
time calculations. Is this a worthwhile investigation?

Do you have any other suggestions on where I should investigate? I know
this isn't a problem for most folks, but we are still concerned that it
shows up every now and then on our platform under Xen, though we don't
see any of the Linux lost_tick stuff when running Linux.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
On 26 Sep 2005, at 16:22, Ryan Harper wrote:

> Keir,
>
> Thanks for the explanation. I'm still trying to reason out why we are
> seeing the 'Time went backwards' every now and then, as well as being
> able to forcibly create the issue via serial interrupt floods (holding
> 'r' with serial input sent to Xen).
>
> Is either assumption, PIT_CH0 ~= 10ms (Linux) or PIT_CH2 = 1.193180MHz
> source (Xen) more valid? It sounds like either is valid. Though Linux
> can be adjusted via ntpd, is there any correcting factor for Xen? I
> know we can run ntpd in dom0 and it can update the wall clock timer,
> but AFAICT, wall clock doesn't affect system_timestamp (which is where
> we detect the Time went backwards in
> linux-2.6-xen-sparse/arch/x86/xen/i386/kernel/time.c).

NTP isn't plumbed through into Xen yet. So ntpd will only affect the
domain it is run in right now.

> Via some trace observation, I've noticed that the per-cpu time in Xen
> (specifically stime_local_stamp) can vary widely between cpus. Is this
> the best source to be using for updating Linux system_timestamp since
> it can vary significantly ( >1000000 ) between processors?

It is supposed to be a trustworthy source. Given we resync every CPU
every second, being out by 1ms would indicate either a bug in the
resync code or local oscillators jittering by 1000ppm, which is hard to
believe!

> I haven't gotten around to doing it yet, but I was going to instrument
> irq disable/enable to see how long we run with irqs disabled with the
> thought that we might be missing some events from which Xen derives
> time calculations. Is this a worthwhile investigation?

It would be interesting. Unless you are sync'ed to the PIT you should
be able to go reasonably long periods with no timer interrupts with no
ill effects (except the CPU time may get to wander off track a little
more than it would otherwise have done). If you are sync'ed to the PIT
(you have no cyclone, hpet or other chipset timer) then CPU0 needs to
take a timer interrupt at least every 50ms or it will miss the 16-bit
PIT counter wrapping.

> Do you have any other suggestions on where I should investigate? I
> know this isn't a problem for most folks, but we are still concerned
> that it shows up every now and then on our platform under Xen, though
> we don't see any of the Linux lost_tick stuff when running Linux.

Getting this stuff right seems a lot harder than it ought to be. There
are clearly problems in the existing code. If you've been able to
ascertain that there are real sync issues between CPUs (out by >1ms
relative to each other) then that is a start. I would investigate how
they manage to get so out of sync, when all oscillators in the system
ought to be driven by crystals with stability much better than that.

 -- Keir
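
The 1000ppm figure in the reply above follows directly from the numbers quoted: an offset of 1ms accumulated over a 1 second resync interval. A small sketch of that arithmetic is below; the function names are hypothetical, and the 50ppm value mentioned afterwards is an assumed example rather than a measured property of any particular hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Drift, in parts per million, implied by an offset built up over an interval. */
static uint64_t drift_ppm(uint64_t offset_ns, uint64_t interval_ns)
{
    assert(interval_ns != 0);
    return offset_ns * 1000000ULL / interval_ns;
}

/* Offset, in nanoseconds, that a given drift would build up over an interval. */
static uint64_t expected_offset_ns(uint64_t ppm, uint64_t interval_ns)
{
    return ppm * interval_ns / 1000000ULL;
}
```

For instance, drift_ppm(1000000, 1000000000) gives 1000, while an oscillator assumed to be within 50ppm would build up only 50000ns (50us) between one-second resyncs.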
* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2005-09-26 10:45]:

> > I haven't gotten around to doing it yet, but I was going to
> > instrument irq disable/enable to see how long we run with irqs
> > disabled with the thought that we might be missing some events from
> > which Xen derives time calculations. Is this a worthwhile
> > investigation?
>
> It would be interesting. Unless you are sync'ed to the PIT you should
> be able to go reasonably long periods with no timer interrupts with no
> ill effects (except the CPU time may get to wander off track a little
> more than it would otherwise have done). If you are sync'ed to the PIT
> (you have no cyclone, hpet or other chipset timer) then CPU0 needs to
> take a timer interrupt at least every 50ms or it will miss the 16-bit
> PIT counter wrapping.

Interesting. So, in the case of the dual-opteron box, we are slaved to
the PIT. There is an HPET (or at least Xen was happy when I booted with
hpet_force=1), but it is not detectable via the ACPI code (i.e. the ACPI
tables don't include an ACPI_HPET table).

In the case where I can force the timer to miss via serial interrupts,
I believe we are preventing CPU0 from taking a timer interrupt within
50ms. The other case where I see 'Time went backwards' is during dom0
boot up. I'll dig into tracking the frequency of timer interrupts.

Speaking of timer interrupts, why doesn't the Xen timer_interrupt()
handle the platform timer read and overflow check itself, rather than
raising a softirq?

Thanks a lot for the help, Keir.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
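
To make the 50ms constraint discussed in this message concrete, here is a rough sketch of the arithmetic, assuming a PIT input clock of about 1.19318 MHz and a reader that can only see the low 16 bits of the elapsed tick count. All names are hypothetical and nothing here is taken from the Xen source.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ         1193180ULL  /* assumed PIT input clock */
#define PIT_WRAP_TICKS 65536ULL    /* a 16-bit counter wraps every 2^16 ticks */

/* Time for the 16-bit counter to wrap once, in microseconds (rounded down). */
static uint64_t pit_wrap_period_us(void)
{
    return PIT_WRAP_TICKS * 1000000ULL / PIT_HZ;
}

/* Convert a gap between two counter reads from microseconds to PIT ticks. */
static uint64_t us_to_pit_ticks(uint64_t us)
{
    return us * PIT_HZ / 1000000ULL;
}

/*
 * Ticks a reader infers from two 16-bit samples that are really
 * true_ticks apart: whole wraps in between are invisible.
 */
static uint64_t pit_observed_ticks(uint64_t true_ticks)
{
    return true_ticks % PIT_WRAP_TICKS;
}
```

Under these assumptions the counter wraps roughly every 54.9ms. A 40ms gap between reads is seen correctly as 47727 ticks, whereas a 60ms gap is seen as only 6054 ticks (about 5ms), so one missed wrap silently drops about 55ms from the platform time.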