On Wed, 4 Nov 2015 14:31:35 +0000, Tom Evans wrote:
> On Tue, Nov 3, 2015 at 5:47 PM, Ian Smith <smithi at nimnet.asn.au> wrote:
> > Like yourself, I think this is far from 'cosmetic' as is oft
> > suggested, especially in some fairly ill-informed forum posts but
> > also various list posts. I've been watching load averages since
> > OS/2 through FreeBSD 2.2 till present and some Linux systems, and
> > have never before seen anything like this, especially on virtually
> > entirely idle systems. While the LAs reported during (say)
> > make -j4 buildworld appear more sane, who knows?
> >
> > I'm not suggesting that I think there's any sort of performance
> > hit from this; so far I don't, but 'cosmetic' suggests that it
> > doesn't matter ..
I'm afraid I can't resist being undoubtedly too thorough with this post,
but it does summarise quite a lot of time I've spent this year, and I'm
feeling a little in need of celebrating an apparent win as a result ..
> Have you read mav's explanation on the PR? It certainly seems a valid
> explanation of why it is probably a cosmetic issue - that is, we could
> wake up more to get a truly perfect load average, but why bother?
I've read it over and again. I think it may well explain some relative
'fuzziness' in LAs over perhaps a 0.1 to 0.3 range that I see now since
finding - more truthfully, stumbling upon - and implementing a fix for
the major issue I've had and been working on in bursts over many months.
The setup: recent stable/9 on a 2.4GHz Core2Duo, ~99.99% idle after
boot, running just sshd, powerd, ntpd and hald. The table shows
worst-seen extremes, commonly seen range, and more-or-less steady
values for the 1, 5 and 15 min load averages respectively:
eventtimer  mode       1 min          5 min        15 min
HPET        one-shot   0.4-0.85       0.5-0.75     0.6-0.7
HPET        periodic   0.99-1.00 (!)  0.99-0.99    0.99
LAPIC       one-shot   0.0-0.05       0.0-0.02     0.0-0.0
LAPIC       periodic   0.0-0.05       0.0-0.02     0.0-0.0
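For anyone wanting to reproduce: the eventtimer and its mode can be
switched at runtime with the usual sysctls, e.g.:
# sysctl kern.eventtimer.choice           # list the available timers
# sysctl kern.eventtimer.timer=HPET       # or LAPIC, etc.
# sysctl kern.eventtimer.periodic=1       # 1 = periodic, 0 = one-shot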
I run a task that's 100% cpu-bound (well, about 7 I/O reads in the first
few milliseconds) that is configured to run for 15.5 minutes, in bg so I
can watch and take measurements throughout. It uses 100% of one CPU (so
scheduled as ~50% per cpu) for that long. With LAPIC eventtimer, 1min
LAs soon reach 0.99-1.00, with 5min LA getting there in ~15 minutes.
If I run 2 instances, each runs in 15.5 minutes (plus ~5 seconds) and
both cpus show 100% for those tasks, 1min LA soon reaches 1.99-2.00.
If I run 4 instances, on its 2 cpus, 1min LAs soon reach 3.99-4.00, and
5min LAs reach 3.99 after 15-20 minutes. Since 4 CPU-bound tasks want
to run on 2 cpus, each runs for ~15.5m cpu time, so total runtime for
each is ~31 minutes, by which time 15min LA is heading for 4 also.
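For anyone wanting to reproduce without my particular program, a plain
sh busy loop is a fair stand-in - the iteration count here is only a
placeholder, to be tuned to take ~15.5 minutes of cpu time on a given
machine:
# i=0; while [ $i -lt 200000000 ]; do i=$((i+1)); done &
Two or four copies of that give the 2- and 4-instance cases above.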
In other words, load averages shown with LAPIC are pretty much exactly
what one would (and should) expect for 1, 2 or 4 long-running cpu-bound
tasks - true even in periodic mode, which has almost as few interrupts
and context switches as with HPET (but without HPET's valuable C3 usage).
I should add that with HPET one-shot mode, 2 such tasks report ~2.4 LA,
while 4 tasks report ~4.4 LA, so there's definitely an increment there.
However, switching back to HPET soon reverts to 'normal' behaviour,
which over long idles (hours) shows 0.66 or 0.67 as the 15min LA's
'average average'. HPET in periodic mode, which has helped some people
running 10.x with LAPIC, here only rapidly pins LAs to 0.99-1.00 at
idle, before and after more load.
Both of these HPET-reported LAs just smelled like bad maths. I spent
way too much time down the (upper parts of the) rabbit hole/s with
little to show; I should have asked more questions, but wanted to suss
it out ..
> ISTM, the problem reports are of the form "My server is idle and it is
> reporting a non-idle load average below 1", and not "My server is
> supposedly idle, but is doing actual work causing the load average to
> be higher than it should be". Is there any evidence that we have the
> latter and not the former?
Er yes, it is the former, not the latter. And it's not so much that it's
shown as below 1, as that it's shown above 0.00 - ie, something > 0.005
- when it is genuinely idle below that level, top showing 100.00% idle
on both cpu idle tasks, and say 100.0% and 99.8% idle per cpu from
cp_times (top -SCHP).
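The figures were watched over long stretches with something along these
lines - just a sketch of the sort of sampling used, not the exact
logging script:
# while :; do date; sysctl -n vm.loadavg; sleep 60; done
together with top -SCHP for the per-cpu idle percentages.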
So I was wondering whether I really had to post my hundreds of kB of
logs and running reports of the above to properly demonstrate what was
happening here, when I thought I'd try cpuset(1) to pin the
abovementioned task to one cpu, rather than letting the scheduler share
it out over both. So first up I tried one of the examples (except using
1 of the 2 cpus):
# cpuset -l 0 -s 1
And lo! the previously 0.66-ish HPET LAs headed towards 0.00, taking
around 5 minutes to get there. Wow. So I ran some more tests using the
abovementioned tasks, and sure enough, the LAs were as good as with
LAPIC.
Then after a while I reenabled CPU1 ..
# cpuset -l 0-1 -s 1
# cpuset -g
pid -1 mask: 0, 1
.. fully expecting LAs to climb back to 'normal' HPET behaviour .. but
they did not! They stayed where they were, basically 0.00 0.00 0.00 in
such idleness, and appropriately around 0.1-0.2 with X fired up, idle.
Reboot. Straight away log in as root:
# cpuset -l 0 -s 1
# cpuset -l 0-1 -s 1 # ie: -l "0-`sysctl -n kern.smp.maxid`"
LAs headed back toward 0.00, taking around 5 minutes. Wow again ..
I added the above cpuset commands to (a new) /etc/rc.local, without
even the sleep between them that I thought might be needed. Reboot.
0.00 LAs. Whee!
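For anyone wanting to try the same workaround, that new /etc/rc.local
is just these two lines (comment added for clarity):
  # shrink cpuset 1 to CPU 0, then give it back all CPUs
  cpuset -l 0 -s 1
  cpuset -l "0-`sysctl -n kern.smp.maxid`" -s 1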
So, is this going to be a fix that Affects Only Me? I'm keen to hear.
Yes I will update the PR, after I'm quite sure I'm not hallucinating :)
Importantly, will doing that make any difference for those using the
LAPIC eventtimer who had unexpectedly high LAs? Perhaps so, if it's the
first eventtimer configured at boot that has - in some cases - such a
problem?
Or perhaps this only works for 9.x? I have no 10.x system currently,
and could easily have missed some commit that picked up on $whatever.
Current hypothesis: some variable/s are getting improperly initialised
at boot, but are (somehow?) getting properly re-initialised on changing
cpuset to 1 then back to 2 cpus - though I've no idea how, or by what.
Again, apologies for a short story writ too long.
cheers, Ian