Gerrit Kühn
2018-Nov-26 08:46 UTC
high cpu irq load and slow boot after update from 10.4 to 11.2
Hi all,

A couple of weeks ago, I updated an older storage server (2 CPUs, 4 cores each, 48GB RAM, 36x4GB HDDs, 3 LSI-based mps controllers) from 10.4 to 11.2. The first thing I noticed was that booting takes much longer now. The system probes each HDD (there are 36 of them, attached to mps controllers) very slowly multiple times (I can see the light of each disk blinking, it takes seconds to go on to the next disk), the whole process takes several minutes (was much faster before).

A more nasty issue appears after a couple of weeks of operation (so far, roughly between 15 and 30 days): Suddenly there is a very high irq load on one of the CPU cores (cpu<n>:timer), causing high system load and high cpu load (top easily shows average load over 10, whereas it was always below 1 before). I cannot find any process or device as a culprit. First I thought this problem can only be made to go away by rebooting, but now I managed to get rid of it (at least for some time, don't know if or when it will be back) while checking out the latest source in background (I actually intended to fiddle with some kernel settings, but suddenly the issue was gone after persisting permanently over the weekend), causing.

Looking around, I found a couple of vaguely similar reports (like https://lists.freebsd.org/pipermail/freebsd-current/2017-January/064419.html), but these all appear to be fixed by now. I have a couple of other storage machines (mostly mps-based, but always slightly different hardware) that show no such issue after updating to 11.2.

Any ideas?

cu
Gerrit
Eugene Grosbein
2018-Nov-26 12:34 UTC
high cpu irq load and slow boot after update from 10.4 to 11.2
26.11.2018 15:46, Gerrit Kühn wrote:

> A couple of weeks ago, I updated an older storage server (2 CPUs, 4 cores
> each, 48GB RAM, 36x4GB HDDs, 3 LSI-based mps controllers) from 10.4 to
> 11.2. The first thing I noticed was that booting takes much longer now. The
> system probes each HDD (there are 36 of them, attached to mps controllers)
> very slowly multiple times (I can see the light of each disk blinking,
> it takes seconds to go on to the next disk), the whole process takes
> several minutes (was much faster before).
>
> A more nasty issue appears after a couple of weeks of operation (so far,
> roughly between 15 and 30 days):
> Suddenly there is a very high irq load on one of the CPU cores
> (cpu<n>:timer), causing high system load and high cpu load (top easily
> shows average load over 10, whereas it was always below 1 before). I cannot
> find any process or device as a culprit. First I thought this problem can
> only be made to go away by rebooting, but now I managed to get rid of it
> (at least for some time, don't know if or when it will be back) while
> checking out the latest source in background (I actually intended to fiddle
> with some kernel settings, but suddenly the issue was gone after
> persisting permanently over the weekend), causing.
>
> Looking around, I found a couple of vaguely similar reports (like
> https://lists.freebsd.org/pipermail/freebsd-current/2017-January/064419.html),
> but these all appear to be fixed by now.
> I have a couple of other storage machines (mostly mps-based, but always
> slightly different hardware) that show no such issue after updating to
> 11.2.
>
> Any ideas?

Maybe this box has some clocking problems that are incompatible with the tickless kernel. Try going back to the old periodic ticking with sysctl kern.eventtimer.periodic=1 instead of the current default of 0. Or, if you are curious, run ntpd if it is not already running, wait about an hour, then look at its /var/db/ntpd.drift file to see whether the system clock is good or not.
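
The two checks above could be scripted roughly as follows. This is only a sketch: the helper name drift_within and the 100 PPM limit are assumptions made for illustration (the thread does not name a threshold; ntpd itself can only correct frequency errors up to about 500 PPM), and the drift file is assumed to hold a single number in PPM.

```shell
#!/bin/sh
# Sketch only. drift_within and the 100 PPM limit are illustrative assumptions.

# drift_within VALUE LIMIT: succeed if |VALUE| <= LIMIT (both in PPM).
drift_within() {
    awk -v d="$1" -v lim="$2" 'BEGIN { if (d < 0) d = -d; exit !(d <= lim) }'
}

# Only touch the FreeBSD-specific sysctl and file on a FreeBSD host.
if [ "$(uname -s)" = "FreeBSD" ]; then
    # 0 = tickless (one-shot) mode, 1 = periodic ticking.
    sysctl kern.eventtimer.periodic

    # ntpd writes its frequency estimate here after running for a while.
    if [ -r /var/db/ntpd.drift ]; then
        drift=$(cat /var/db/ntpd.drift)
        if drift_within "$drift" 100; then
            echo "drift ${drift} PPM: clock looks reasonable"
        else
            echo "drift ${drift} PPM: clock looks suspicious"
        fi
    fi
fi

# To switch to periodic ticking at runtime (as root):
#   sysctl kern.eventtimer.periodic=1
# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   kern.eventtimer.periodic=1
```

The script only reads state; the actual change is left as a commented command so that it is applied deliberately.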
Perhaps you can get better behaviour by changing kern.timecounter.hardware from its default to another value listed in kern.timecounter.choice; the same goes for kern.eventtimer.timer and kern.eventtimer.choice.
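
A minimal sketch of how the available choices might be listed before switching. The helper choice_names is hypothetical, and it assumes the choice sysctls report space-separated entries of the form NAME(quality), for example "TSC-low(1000) ACPI-fast(900) HPET(950)"; the names HPET and LAPIC in the comments are examples only, and the right value depends on what the box actually offers.

```shell
#!/bin/sh
# Sketch only. choice_names is a hypothetical helper, not a FreeBSD tool.

# choice_names "NAME(quality) NAME(quality) ...": print one NAME per line.
choice_names() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -e 's/(.*$//' -e '/^$/d'
}

# Only query the FreeBSD-specific sysctls on a FreeBSD host.
if [ "$(uname -s)" = "FreeBSD" ]; then
    echo "Current settings:"
    sysctl kern.timecounter.hardware kern.eventtimer.timer

    echo "Available timecounters:"
    choice_names "$(sysctl -n kern.timecounter.choice)"

    echo "Available event timers:"
    choice_names "$(sysctl -n kern.eventtimer.choice)"
fi

# To switch at runtime (as root), pick a name from the lists above, e.g.:
#   sysctl kern.timecounter.hardware=HPET
#   sysctl kern.eventtimer.timer=LAPIC
```

Changing one setting at a time and watching the cpu<n>:timer interrupt load afterwards makes it easier to tell which change, if any, helped.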