thr3ads.net - freebsd stable - incorrect usleep/select delays with HZ

If this information is useful, please help other people find it:
Share via:

Luigi Rizzo

2009-Sep-06 15:46 UTC

incorrect usleep/select delays with HZ > 2500

(this problem seems to affect both current and -stable,
so let's see if here i have better luck)

I just noticed [Note 1,2] that when setting HZ > 2500 (even if it is
an exact divisor of the APIC/CPU clock) there is a significant
drift between the delays generated by usleep()/select() and those
computed by gettimeofday().  In other words, the error grows with
the amount of delay requested.

To show the problem, try this function

        int f(int wait_time) {  // wait_time in usec
                struct timeval start, end;
                gettimeofday(&start);
                usleep(w);      // or try select
                gettimeofday(&end)
                timersub(&end, &start, &x);
                return = x.tv_usec + 1000000*x.tv_sec - wait_time;
        }

for various HZ (kern.hz=NNNN in /boot/loader.conf) and wait times.
Ideally, we would expect the timings to be in error by something
between 0 and 1 (or 2) ticks, irrespective of the value of wait_time.
In fact, this is what you see with HZ=1000, 2000 and 2500.
But larger values of HZ (e.g. 4000, 5000, 10k, 40k) create
a drift of 0.5% and above (i.e. with HZ=5000, a 1-second delay
lasts 1.0064s and a 10s delay lasts 10.062s; with HZ=10k the
error becomes 1% and at HZ=40k the error becomes even bigger.

Note that with the fixes described below, even HZ=40k works perfectly well.

Turns out that the error has three components (described with
possible fixes):

1.  CAUSE: Errors in the measurement of the TSC (and APIC) frequencies,
        see [Note 3] for more details. This is responsible for the drift.
    FIX: It can be removed by rounding the measurement to the closest
        nominal values (e.g. my APIC runs at 100 MHz; we can use a
        table of supported values). Otherwise, see [Note 4]
    PROBLEM: is this general enough ?

2.  CAUSE: Use of approximate kernel time functions (getnanotime/getmicrotime)
        in nanosleep() and select(). This imposes an error of max(1tick, 1ms)
        in the computation of delays, irrespective of HZ values.
        BTW For reasons I don't understand this seems to affect
        nanosleep() more than select().
    FIX: It can be reduced to just 1 tick making kern.timecounter.tick writable
        and letting the user set it to 1 if high precision is required.
    PROBLEM: none that i see.

3.  CAUSE an error in tvtohz(), reported long ago in
        http://www.dragonflybsd.org/presentations/nanosleep/
        which causes a systematic error of an extra tick in the computation
        of the sleep times.
    FIX: the above link also contains a proposed fix (which in fact
	reverts a bug introduced in some old commit on FreeBSD)
    PROBLEM: none that i see.

Applying these three fixes i was able to run a kernel with HZ=40000
and see timing errors within 80-90us even with ten second delays.
This would put us on par with Linux [Note 5].
This is a significant improvement over the current situation
and the reason why I would like to explore the possibility of applying
some of these fixes.

I know there are open problems -- e.g. when the timer source used
by gettimeofday() gets corrected by ntp or other things, hardclock()
still ticks at the same rate so you'll see a drift if you don't apply
corrections there as well. Similarly, if HZ is not an exact
divisor of the clock source used by gettimeofday(), i suppose
errors will accumulate as well. However fixing these other
drift seems reasonably orthogonal at least to #2 and #3 above, and
a lot more difficult so we could at least start from these simple
fixes.


Would anyone be interested in reproducing the experiment (test program
attached -- run it with 'lat -p -i wait_time_in_microseconds')
and try to explain me what changes the system's behaviour above HZ=2500 ?

	cheers
	luigi

Notes:

[Note 1] I have some interest in running machines with high HZ values
    because this gives better precision to dummynet and various
    other tasks with soft timing constraints.

[Note 2] I have seen the same phenomenon on the following platform:
        RELENG_8/amd64 with AMD BE-2400 dual core cpu
        RELENG_7/i386 with AMD BE-2400 dual core cpu
        RELENG_7/i386 with Intel Centrino single core (Dell X1 Laptop)


[Note 3] the TSC frequency is computed reading the tsc around a
        call to DELAY(1000000) and assuming that the i8254 runs
        at the nominal rate, 1.193182 MHz.
        From tests I have made, the measurement in init_TSC() returns
        a large error when HZ is large, whereas repeating the measurement
        at a later time returns a much more reliable value.
        As an example, see the following:

    Sep  6 14:21:59 lr kernel: TSC clock: at init_TSC 2323045616 Hz
    Sep  6 14:21:59 lr kernel:
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
    Sep  6 14:21:59 lr kernel: AMD
Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
    Sep  6 14:21:59 lr kernel: TSC: P-state invariant
    Sep  6 14:21:59 lr kernel: TSC clock: at cpu_startup_end 2323056910 Hz
    Sep  6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe 2311254060 Hz
    Sep  6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe_2 2311191310 Hz
    Sep  6 14:21:59 lr kernel: TSC clock: at pn_probe_start 2300822648 Hz
    Sep  6 14:21:59 lr kernel: TSC clock: at pn_attach_start 2300830946 Hz
    Sep  6 14:21:59 lr kernel: TSC clock: at pn_probe_start 2300840133 Hz
    Sep  6 14:21:59 lr kernel: TSC clock: at pn_attach_start 2300835253 Hz
    Sep  6 14:21:59 lr kernel: TSC clock: at lapic_setup_clock 2300868376 Hz

        The latter values are close to what is reported when HZ=1000.

[Note 4] DELAY() uses the TSC when available, but perhaps for
        larger intervals (say above 100ms) it might be better
        to always use the 8254 which at least does not change frequency
        over time ?

[Note 5] Linux has some high precision timers which one could expect
    to use for high precision delays. In fact, I ran the test
    program on a recent Linux 2.6.30 (using HZ=1000), and the
    usleep() version has an error between 80 and 90us irrespective
    of the delay. However the select() version is much worse,
    and loses approx 1ms per second.

------------------------------------------------------

Peter Wemm

2009-Sep-07 01:08 UTC

head link

incorrect usleep/select delays with HZ > 2500

On Sun, Sep 6, 2009 at 8:51 AM, Luigi Rizzo<rizzo@iet.unipi.it>
wrote:> (this problem seems to affect both current and -stable,
> so let's see if here i have better luck)
>
> I just noticed [Note 1,2] that when setting HZ > 2500 (even if it is
> an exact divisor of the APIC/CPU clock) there is a significant
> drift between the delays generated by usleep()/select() and those
> computed by gettimeofday(). ?In other words, the error grows with
> the amount of delay requested.
>
> To show the problem, try this function
>
> ? ? ? ?int f(int wait_time) { ?// wait_time in usec
> ? ? ? ? ? ? ? ?struct timeval start, end;
> ? ? ? ? ? ? ? ?gettimeofday(&start);
> ? ? ? ? ? ? ? ?usleep(w); ? ? ?// or try select
> ? ? ? ? ? ? ? ?gettimeofday(&end)
> ? ? ? ? ? ? ? ?timersub(&end, &start, &x);
> ? ? ? ? ? ? ? ?return = x.tv_usec + 1000000*x.tv_sec - wait_time;
> ? ? ? ?}
>
> for various HZ (kern.hz=NNNN in /boot/loader.conf) and wait times.
> Ideally, we would expect the timings to be in error by something
> between 0 and 1 (or 2) ticks, irrespective of the value of wait_time.
> In fact, this is what you see with HZ=1000, 2000 and 2500.
> But larger values of HZ (e.g. 4000, 5000, 10k, 40k) create
> a drift of 0.5% and above (i.e. with HZ=5000, a 1-second delay
> lasts 1.0064s and a 10s delay lasts 10.062s; with HZ=10k the
> error becomes 1% and at HZ=40k the error becomes even bigger.
Technically, it isn't even an error because the sleeps are defined as
'at least' the value specified.  If you're looking for real-time-OS
level performance, you probably need to look at one.
> Note that with the fixes described below, even HZ=40k works perfectly well.
>
> Turns out that the error has three components (described with
> possible fixes):
>
> 1. ?CAUSE: Errors in the measurement of the TSC (and APIC) frequencies,
> ? ? ? ?see [Note 3] for more details. This is responsible for the drift.
> ? ?FIX: It can be removed by rounding the measurement to the closest
> ? ? ? ?nominal values (e.g. my APIC runs at 100 MHz; we can use a
> ? ? ? ?table of supported values). Otherwise, see [Note 4]
> ? ?PROBLEM: is this general enough ?
>
> 2. ?CAUSE: Use of approximate kernel time functions
(getnanotime/getmicrotime)
> ? ? ? ?in nanosleep() and select(). This imposes an error of max(1tick,
1ms)
> ? ? ? ?in the computation of delays, irrespective of HZ values.
> ? ? ? ?BTW For reasons I don't understand this seems to affect
> ? ? ? ?nanosleep() more than select().
> ? ?FIX: It can be reduced to just 1 tick making kern.timecounter.tick
writable
> ? ? ? ?and letting the user set it to 1 if high precision is required.
> ? ?PROBLEM: none that i see.
>
> 3. ?CAUSE an error in tvtohz(), reported long ago in
> ? ? ? ?http://www.dragonflybsd.org/presentations/nanosleep/
> ? ? ? ?which causes a systematic error of an extra tick in the computation
> ? ? ? ?of the sleep times.
> ? ?FIX: the above link also contains a proposed fix (which in fact
> ? ? ? ?reverts a bug introduced in some old commit on FreeBSD)
> ? ?PROBLEM: none that i see.
This change, as-is, is extremely dangerous.  tsleep/msleep() use a
value of 0 meaning 'forever'.  Changing tvtohz() so that it can now
return 0 for a non-zero tv is setting land-mines all over the place.
There's something like 27 callers of tvtohz() in sys/kern alone, some
of which are used to supply tsleep/msleep() timeouts.  Note that the
dragonflybsd patch above only adds the 'returns 0' check to one single
caller.  You either need to patch all callers of tvtohz() since you've
change the semantics, or add a 'if (ticks == 0) ticks = 1' check (or
checks) in the appropriate places inside tvtohz().

If you don't do it, then you end up with callers of random functions
with very small timeouts instead finding themselves sleeping forever.
> Applying these three fixes i was able to run a kernel with HZ=40000
> and see timing errors within 80-90us even with ten second delays.
> This would put us on par with Linux [Note 5].
> This is a significant improvement over the current situation
> and the reason why I would like to explore the possibility of applying
> some of these fixes.
>
> I know there are open problems -- e.g. when the timer source used
> by gettimeofday() gets corrected by ntp or other things, hardclock()
> still ticks at the same rate so you'll see a drift if you don't
apply
> corrections there as well. Similarly, if HZ is not an exact
> divisor of the clock source used by gettimeofday(), i suppose
> errors will accumulate as well. However fixing these other
> drift seems reasonably orthogonal at least to #2 and #3 above, and
> a lot more difficult so we could at least start from these simple
> fixes.
>
>
> Would anyone be interested in reproducing the experiment (test program
> attached -- run it with 'lat -p -i wait_time_in_microseconds')
> and try to explain me what changes the system's behaviour above HZ=2500
?
>
> ? ? ? ?cheers
> ? ? ? ?luigi
>
> Notes:
>
> [Note 1] I have some interest in running machines with high HZ values
> ? ?because this gives better precision to dummynet and various
> ? ?other tasks with soft timing constraints.
>
> [Note 2] I have seen the same phenomenon on the following platform:
> ? ? ? ?RELENG_8/amd64 with AMD BE-2400 dual core cpu
> ? ? ? ?RELENG_7/i386 with AMD BE-2400 dual core cpu
> ? ? ? ?RELENG_7/i386 with Intel Centrino single core (Dell X1 Laptop)
>
>
> [Note 3] the TSC frequency is computed reading the tsc around a
> ? ? ? ?call to DELAY(1000000) and assuming that the i8254 runs
> ? ? ? ?at the nominal rate, 1.193182 MHz.
> ? ? ? ?From tests I have made, the measurement in init_TSC() returns
> ? ? ? ?a large error when HZ is large, whereas repeating the measurement
> ? ? ? ?at a later time returns a much more reliable value.
> ? ? ? ?As an example, see the following:
>
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at init_TSC 2323045616 Hz
> ? ?Sep ?6 14:21:59 lr kernel:
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> ? ?Sep ?6 14:21:59 lr kernel: AMD
Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
> ? ?Sep ?6 14:21:59 lr kernel: TSC: P-state invariant
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at cpu_startup_end 2323056910 Hz
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe 2311254060 Hz
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe_2 2311191310
Hz
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at pn_probe_start 2300822648 Hz
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at pn_attach_start 2300830946 Hz
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at pn_probe_start 2300840133 Hz
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at pn_attach_start 2300835253 Hz
> ? ?Sep ?6 14:21:59 lr kernel: TSC clock: at lapic_setup_clock 2300868376 Hz
>
> ? ? ? ?The latter values are close to what is reported when HZ=1000.
>
> [Note 4] DELAY() uses the TSC when available, but perhaps for
> ? ? ? ?larger intervals (say above 100ms) it might be better
> ? ? ? ?to always use the 8254 which at least does not change frequency
> ? ? ? ?over time ?
>
> [Note 5] Linux has some high precision timers which one could expect
> ? ?to use for high precision delays. In fact, I ran the test
> ? ?program on a recent Linux 2.6.30 (using HZ=1000), and the
> ? ?usleep() version has an error between 80 and 90us irrespective
> ? ?of the delay. However the select() version is much worse,
> ? ?and loses approx 1ms per second.
>
> ------------------------------------------------------
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
"freebsd-stable-unsubscribe@freebsd.org"
>



-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell

John Baldwin

2009-Sep-08 18:00 UTC

head link

incorrect usleep/select delays with HZ > 2500

On Sunday 06 September 2009 11:51:54 am Luigi Rizzo
wrote:> [Note 3] the TSC frequency is computed reading the tsc around a
>         call to DELAY(1000000) and assuming that the i8254 runs
>         at the nominal rate, 1.193182 MHz.
>         From tests I have made, the measurement in init_TSC() returns
>         a large error when HZ is large, whereas repeating the measurement
>         at a later time returns a much more reliable value.
>         As an example, see the following:
> 
>     Sep  6 14:21:59 lr kernel: TSC clock: at init_TSC 2323045616 Hz
>     Sep  6 14:21:59 lr kernel: 
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>>     Sep  6 14:21:59 lr kernel: AMD 
Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>>     Sep  6 14:21:59 lr kernel: TSC: P-state invariant
>     Sep  6 14:21:59 lr kernel: TSC clock: at cpu_startup_end 2323056910 Hz
>     Sep  6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe 2311254060 Hz
>     Sep  6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe_2 2311191310 
Hz>     Sep  6 14:21:59 lr kernel: TSC clock: at pn_probe_start 2300822648 Hz
>     Sep  6 14:21:59 lr kernel: TSC clock: at pn_attach_start 2300830946 Hz
>     Sep  6 14:21:59 lr kernel: TSC clock: at pn_probe_start 2300840133 Hz
>     Sep  6 14:21:59 lr kernel: TSC clock: at pn_attach_start 2300835253 Hz
>     Sep  6 14:21:59 lr kernel: TSC clock: at lapic_setup_clock 2300868376
Hz
> 
>         The latter values are close to what is reported when HZ=1000.
Try disabling legacy USB support in the BIOS to see if an SMI# is firing 
during the DELAY() causing the TSC freq to be too high.  I have seen the USB 
legacy support cause this in other machines.
 
-- 
John Baldwin

Luigi Rizzo

2009-Sep-08 20:55 UTC

head link

incorrect usleep/select delays with HZ > 2500

On Tue, Sep 08, 2009 at 01:08:23PM -0400, John Baldwin
wrote:> On Sunday 06 September 2009 11:51:54 am Luigi Rizzo wrote:
> > [Note 3] the TSC frequency is computed reading the tsc around a
> >         call to DELAY(1000000) and assuming that the i8254 runs
> >         at the nominal rate, 1.193182 MHz.
> >         From tests I have made, the measurement in init_TSC() returns
> >         a large error when HZ is large, whereas repeating the
measurement
> >         at a later time returns a much more reliable value.
> >         As an example, see the following:
> > 
> >     Sep  6 14:21:59 lr kernel: TSC clock: at init_TSC 2323045616 Hz
> >     Sep  6 14:21:59 lr kernel: 
>
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> >     Sep  6 14:21:59 lr kernel: AMD 
> Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
> >     Sep  6 14:21:59 lr kernel: TSC: P-state invariant
> >     Sep  6 14:21:59 lr kernel: TSC clock: at cpu_startup_end
2323056910 Hz
> >     Sep  6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe
2311254060 Hz
> >     Sep  6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe_2
2311191310
> Hz
> >     Sep  6 14:21:59 lr kernel: TSC clock: at pn_probe_start 2300822648
Hz
> >     Sep  6 14:21:59 lr kernel: TSC clock: at pn_attach_start
2300830946 Hz
> >     Sep  6 14:21:59 lr kernel: TSC clock: at pn_probe_start 2300840133
Hz
> >     Sep  6 14:21:59 lr kernel: TSC clock: at pn_attach_start
2300835253 Hz
> >     Sep  6 14:21:59 lr kernel: TSC clock: at lapic_setup_clock
2300868376 Hz
> > 
> >         The latter values are close to what is reported when HZ=1000.
> 
> Try disabling legacy USB support in the BIOS to see if an SMI# is firing 
> during the DELAY() causing the TSC freq to be too high.  I have seen the
USB
> legacy support cause this in other machines.
Thanks, will try tomorrow.
Would this explain the measurement becomes better as we get
further into the initialization, and why high HZ values affect
the measurement ?

	cheers
	luigi

John Baldwin

2009-Sep-09 18:24 UTC

head link

incorrect usleep/select delays with HZ > 2500

On Tuesday 08 September 2009 5:01:00 pm Luigi Rizzo
wrote:> On Tue, Sep 08, 2009 at 01:08:23PM -0400, John Baldwin wrote:
> > On Sunday 06 September 2009 11:51:54 am Luigi Rizzo wrote:
> > > [Note 3] the TSC frequency is computed reading the tsc around a
> > >         call to DELAY(1000000) and assuming that the i8254 runs
> > >         at the nominal rate, 1.193182 MHz.
> > >         From tests I have made, the measurement in init_TSC()
returns
> > >         a large error when HZ is large, whereas repeating the 
measurement> > >         at a later time returns a much more reliable value.
> > >         As an example, see the following:
> > > 
> > >     Sep  6 14:21:59 lr kernel: TSC clock: at init_TSC 2323045616
Hz
> > >     Sep  6 14:21:59 lr kernel: 
> > 
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>> > >     Sep  6 14:21:59 lr kernel: AMD 
> >
Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
> > >     Sep  6 14:21:59 lr kernel: TSC: P-state invariant
> > >     Sep  6 14:21:59 lr kernel: TSC clock: at cpu_startup_end
2323056910
Hz> > >     Sep  6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe
2311254060
Hz> > >     Sep  6 14:21:59 lr kernel: TSC clock: at acpi_timer_probe_2 
2311191310 > > Hz
> > >     Sep  6 14:21:59 lr kernel: TSC clock: at pn_probe_start
2300822648
Hz> > >     Sep  6 14:21:59 lr kernel: TSC clock: at pn_attach_start
2300830946
Hz> > >     Sep  6 14:21:59 lr kernel: TSC clock: at pn_probe_start
2300840133
Hz> > >     Sep  6 14:21:59 lr kernel: TSC clock: at pn_attach_start
2300835253
Hz> > >     Sep  6 14:21:59 lr kernel: TSC clock: at lapic_setup_clock 
2300868376 Hz> > > 
> > >         The latter values are close to what is reported when
HZ=1000.
> > 
> > Try disabling legacy USB support in the BIOS to see if an SMI# is
firing
> > during the DELAY() causing the TSC freq to be too high.  I have seen
the
USB > > legacy support cause this in other machines.
> 
> Thanks, will try tomorrow.
> Would this explain the measurement becomes better as we get
> further into the initialization, and why high HZ values affect
> the measurement ?
It would explain why it gets better later since the uhci(4), ohci(4) and 
ehci(4) drivers disable the SMI# interrupts while attaching to the 
controllers.

-- 
John Baldwin

Luigi Rizzo

2009-Sep-09 20:26 UTC

head link

incorrect usleep/select delays with HZ > 2500

On Wed, Sep 09, 2009 at 01:24:53PM -0400, John Baldwin
wrote:> On Tuesday 08 September 2009 5:01:00 pm Luigi Rizzo wrote:
> > On Tue, Sep 08, 2009 at 01:08:23PM -0400, John Baldwin wrote:
> > > On Sunday 06 September 2009 11:51:54 am Luigi Rizzo wrote:
> > > > [Note 3] the TSC frequency is computed reading the tsc
around a
> > > >         call to DELAY(1000000) and assuming that the i8254
runs
> > > >         at the nominal rate, 1.193182 MHz.
> > > >         From tests I have made, the measurement in
init_TSC() returns
> > > >         a large error when HZ is large, whereas repeating
the
> measurement
> > > >         at a later time returns a much more reliable value.
...> > > Try disabling legacy USB support in the BIOS to see if an SMI# is
firing
> > > during the DELAY() causing the TSC freq to be too high.  I have
seen the
> USB 
> > > legacy support cause this in other machines.
> > 
> > Thanks, will try tomorrow.
> > Would this explain the measurement becomes better as we get
> > further into the initialization, and why high HZ values affect
> > the measurement ?
> 
> It would explain why it gets better later since the uhci(4), ohci(4) and 
> ehci(4) drivers disable the SMI# interrupts while attaching to the 
> controllers.
ok makes sense -- if the SMI interrupts last longer than 1/HZ
they will cause us to miss one or more wraps of the i8254 in
$ARCH/isa/clock.c::DELAY(). I suppose with a few measurements
at different HZ values and some back of the envelope calculations
one could even determine estimate the frequency and duration
of those SMI interrupts!

thanks
luigi

John Baldwin

2009-Sep-09 20:42 UTC

head link

incorrect usleep/select delays with HZ > 2500

On Wednesday 09 September 2009 4:32:16 pm Luigi Rizzo
wrote:> On Wed, Sep 09, 2009 at 01:24:53PM -0400, John Baldwin wrote:
> > On Tuesday 08 September 2009 5:01:00 pm Luigi Rizzo wrote:
> > > On Tue, Sep 08, 2009 at 01:08:23PM -0400, John Baldwin wrote:
> > > > On Sunday 06 September 2009 11:51:54 am Luigi Rizzo wrote:
> > > > > [Note 3] the TSC frequency is computed reading the tsc
around a
> > > > >         call to DELAY(1000000) and assuming that the
i8254 runs
> > > > >         at the nominal rate, 1.193182 MHz.
> > > > >         From tests I have made, the measurement in
init_TSC()
returns> > > > >         a large error when HZ is large, whereas
repeating the
> > measurement
> > > > >         at a later time returns a much more reliable
value.
> ...
> > > > Try disabling legacy USB support in the BIOS to see if an
SMI# is
firing > > > > during the DELAY() causing the TSC freq to be too high.  I
have seen
the > > USB 
> > > > legacy support cause this in other machines.
> > > 
> > > Thanks, will try tomorrow.
> > > Would this explain the measurement becomes better as we get
> > > further into the initialization, and why high HZ values affect
> > > the measurement ?
> > 
> > It would explain why it gets better later since the uhci(4), ohci(4)
and
> > ehci(4) drivers disable the SMI# interrupts while attaching to the 
> > controllers.
> 
> ok makes sense -- if the SMI interrupts last longer than 1/HZ
> they will cause us to miss one or more wraps of the i8254 in
> $ARCH/isa/clock.c::DELAY(). I suppose with a few measurements
> at different HZ values and some back of the envelope calculations
> one could even determine estimate the frequency and duration
> of those SMI interrupts!
On recent motherboards I have seen the SMI# interrupt fire every 250 ms with 
execution times ranging from 50 us to 1ms for the legacy USB interrupt 
handler.  We consistently see the TSC frequency miscalculated on the 
motherboards with the 1ms duration interrupt.  I suspect that the clock that 
drives the periodic SMI# interrupt is tied to the i8254 meaning that it often 
fires at the same time that the i8254 wraps causing the TSC frequency to 
often be wrong.

-- 
John Baldwin

freebsd stable - Sep 2009 - incorrect usleep/select delays with HZ > 2500

incorrect usleep/select delays with HZ > 2500

incorrect usleep/select delays with HZ > 2500

incorrect usleep/select delays with HZ > 2500

incorrect usleep/select delays with HZ > 2500

incorrect usleep/select delays with HZ > 2500

incorrect usleep/select delays with HZ > 2500

incorrect usleep/select delays with HZ > 2500