(Please keep me CC'd as I'm not subscribed to freebsd-stable@)

Today I took the liberty of upgrading my main home server from 9.3-STABLE (r268785) to 10.0-STABLE (r268894). The upgrade consisted of doing a fresh install of 10.0-STABLE on a brand new unused SSD. Most everything went as planned, barring a couple of ports-related anomalies, and I was fairly impressed by the fact that buildworld times had dropped to 27 minutes and buildkernel to 4 minutes with clang (something I'd been avoiding like the plague for a long while). Kudos.

But after an hour or so, I noticed a consistent (i.e. reproducible) trend: the system load average tends to hang around 0.10 to 0.15 all the time. There are times where the load drops to 0.03 or 0.04 but then something kicks it back up to 0.15 or 0.20 and then it slowly levels out again (over the course of a few minutes) then repeats.

Obviously this is normal behaviour for a system when something is going on periodically. So I figured it might have been a userland process behaving differently under 10.x than 9.x. I let top -a -S -s 1 run and paid very close attention to it for several minutes. Nothing. It doesn't appear to be something userland -- it appears to be something kernel-level, but nothing in top -S shows up as taking up any CPU time other than "[idle]" so I have no idea what might be doing it.

The box isn't doing anything like routing network traffic/NAT, it's pure IPv4 (IPv6 disabled in world and kernel, and my home network does basically no IPv6) and sits idle most of the time fetching mail. It does use ZFS, but not for /, swap, /var, /tmp, or /usr.

vmstat -i doesn't particularly show anything awful. All the cpuX:timer entries tend to fluctuate in rate, usually 120-200 or so; I'd expect an interrupt storm to be showing something in the 1000+ range.

The only thing I can think of is the fact that the SSD being used has no 4K quirk entry in the kernel (and its ATA IDENTIFY responds with 512 logical, 512 physical, even though we know it's 4K). The partitions are all 1MB-aligned regardless.

This is all bare-metal, by the way -- no virtualisation involved.

I do have DTrace enabled/built on this box but I have absolutely no clue how to go about profiling things. For example maybe output of this sort would be helpful (but I've no idea how to get it):

http://lists.freebsd.org/pipermail/freebsd-stable/2014-July/079276.html

I'm certain I didn't see this behaviour in 9.x so I'd be happy to try and track it down if I had a little bit of hand-holding.

I've put all the things I can think of that might be relevant to "system config/tuning bits" up here:

http://jdc.koitsu.org/freebsd/releng10_perf_issue/

I should note my kernel config is slightly inaccurate (I've removed some stuff from the file in an attempt to rebuild, but building world prior to kernel failed due to r268896 breaking world; anyone subscribed here has already seen the Jenkins job for that ;-) ).

Thanks.

--
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
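
For readers following the numbers in the report above, here is a minimal sketch of how a brief, periodic burst of work could produce a load average that hovers between roughly 0.05 and 0.13. It assumes the classic scheme in which the kernel samples the run queue about every 5 seconds and folds each sample into the 1-minute average with a decay factor of exp(-5/60). The sampling interval, the decay factor, and the once-a-minute burst pattern are all assumptions for illustration; none of them comes from the thread.

```python
import math

# Assumptions for illustration (not taken from the thread): the run queue is
# sampled every 5 seconds and folded into the 1-minute average with a decay
# factor of exp(-5/60). The burst pattern further down is hypothetical.
SAMPLE_INTERVAL = 5.0   # seconds between samples
WINDOW = 60.0           # the 1-minute load average
DECAY = math.exp(-SAMPLE_INTERVAL / WINDOW)


def update(load, runnable):
    """Apply one exponentially weighted update to the load average."""
    return load * DECAY + runnable * (1.0 - DECAY)


def simulate(pattern, cycles):
    """Feed a repeating pattern of run-queue samples; return the load trace."""
    load = 0.0
    trace = []
    for _ in range(cycles):
        for runnable in pattern:
            load = update(load, runnable)
            trace.append(load)
    return trace


# Hypothetical workload: one runnable thread is caught by 1 sample in every
# 12 (once a minute); the other 11 samples see an idle system.
pattern = [1] + [0] * 11
trace = simulate(pattern, cycles=30)
last_cycle = trace[-len(pattern):]
print(f"peak {max(last_cycle):.2f}, trough {min(last_cycle):.2f}")
```

Under these assumptions a single runnable thread seen once a minute is enough to hold the 1-minute figure at around 0.13 just after the burst and around 0.05 just before the next one. That is the same order of magnitude as the values reported, but it says nothing about what the thread in question actually is.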
Hi,

I don't know how to do this with dtrace, but take a look at tools/sched/schedgraph.py and enable KTR to get some trace records. KTR logs the scheduler activity -and- the loadav with it.

-a

On 19 July 2014 23:24, Jeremy Chadwick <jdc at koitsu.org> wrote:
[skip]
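
Short of a full KTR trace, a rough userland sketch can at least timestamp when the 1-minute load average jumps, so the jumps can be lined up against cron jobs, mail fetches, or other periodic activity. The helper names (find_jumps, collect), the 0.03 threshold, and the polling interval below are hypothetical choices, not something suggested in the thread. os.getloadavg() is the standard Python wrapper for the system's load average.

```python
import os
import time


def find_jumps(samples, threshold=0.03):
    """Return (timestamp, before, after) for each rise larger than threshold.

    samples is a list of (timestamp, load) pairs in time order.
    """
    jumps = []
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if cur - prev > threshold:
            jumps.append((ts, prev, cur))
    return jumps


def collect(duration=600, interval=5):
    """Poll the 1-minute load average every `interval` seconds."""
    samples = []
    end = time.time() + duration
    while time.time() < end:
        samples.append((time.time(), os.getloadavg()[0]))
        time.sleep(interval)
    return samples


# Demonstration on made-up samples (timestamps in seconds, loads invented).
demo = [(0, 0.04), (5, 0.04), (10, 0.15), (15, 0.14), (20, 0.12)]
print(find_jumps(demo))
```

On the real system, find_jumps(collect()) would gather ten minutes of samples and list the moments the load rose. This only shows when the average moves, not which thread moved it, so it complements the scheduler trace rather than replacing it.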
If you add -H -z to your top command does anything stand out?

Regards
Steve

----- Original Message -----
From: "Jeremy Chadwick" <jdc at koitsu.org>
To: <freebsd-stable at freebsd.org>
Sent: Sunday, July 20, 2014 7:24 AM
Subject: Consistently "high" CPU load on 10.0-STABLE

[skip]
On Sat, Jul 19, 2014 at 11:24 PM, Jeremy Chadwick <jdc at koitsu.org> wrote:
> (Please keep me CC'd as I'm not subscribed to freebsd-stable@)

[skip]

> I do have DTrace enabled/built on this box but I have absolutely no clue
> how to go about profiling things. For example maybe output of this sort
> would be helpful (but I've no idea how to get it):
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2014-July/079276.html

You can probably use hotkernel or something similar?

http://www.freebsd.org/doc/handbook/dtrace-using.html

cheers,
Hiren

[skip]