Rong-en Fan
2006-May-15 18:15 UTC
6.1-RELEASE, em0 high interrupt rate and nfsd eats lots of cpu
Hi,

After upgrading one NFS server from 5.5-PRERELEASE to 6.1-RELEASE today, I noticed that the load is very high, ranging from 4.x to 30.x depending on how many nfsd processes I run. The mrtg traffic graph does not show unusually high traffic. The box has 2 physical Xeon CPUs with HTT enabled. Some screen snapshots:

            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
      4593     0    5431035       2122     0    1463331     0
      2224     0    2500421       1459     0    1310224     0
      1929     0    2210035       1252     0    1165426     0
      2381     0    2782648       1724     0    1795611     0
      1975     0    2340899       1314     0    1342320     0
      2114     0    2537347       1254     0    1195396     0
      2050     0    2465473        890     0     611592     0
      1482     0    1660772        985     0     898894     0
      2002     0    2179834       1900     0    2092382     0
      1912     0    2202576       1598     0    1743046     0
      2436     0    3051876       1368     0    1345762     0
      2759     0    2977552       1346     0     730580     0

systat -vmstat 1:

3 users Load 19.80 14.39 11.08 May 16 02:12
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 138500 6420 220028 10240 237472 count
All 810252 9900 1696014k 17272 pages
26 zfod Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt 2 cow 61756 total
5 18 60 53544 612233728595 12 49 173928 wire 56 52: mpt
153000 act 53: mpt
68.7%Sys 29.9%Intr 1.4%User 0.0%Nice 0.0%Idl 455576 inact 38: ips
| | | | | | | | | | 29920 cache 1: atkb
==================================+++++++++++++++> 207552 free 169 4: sio0
daefr 49713 16: em0
Namei Name-cache Dir-cache 38 prcfr 3199 cpu0: time
Calls hits % hits % react 2965 cpu3: time
342 333 97 pdwak 2717 cpu1: time
pdpgs 2937 cpu2: time
Disks ipsd0 da0 da1 pass0 pass1 intrn
KB/t 4.00 70.83 22.67 0.00 0.00 113872 buf
tps 1 58 10 0 0 698 dirtybuf
MB/s 0.00 4.03 0.21 0.00 0.00 100000 desiredvnodes
% busy 0 46 18 0 0 25368 numvnodes
17281 freevnodes

vmstat -i:

interrupt                          total       rate
irq52: mpt0                       586784         48
irq53: mpt1                           12          0
irq38: ips0                        74926          6
irq1: atkbd0                           2          0
irq4: sio0                         20363          1
irq16: em0                     100321381       8348
cpu0: timer                     23813454       1981
cpu3: timer                     22903961       1906
cpu1: timer                     21907744       1823
cpu2: timer                     22886458       1904
Total                          192515085      16021

The high interrupt rate of em0 looks very suspicious. I have even seen 30K~90K interrupts per second in the output of systat -vmstat 1. As for top's output:

last pid: 21888; load averages: 25.52, 16.86, 12.22 up 0+03:30:42 02:13:06
143 processes: 29 running, 99 sleeping, 2 zombie, 13 waiting
CPU states: 0.5% user, 0.0% nice, 66.7% system, 32.8% interrupt, 0.0% idle
Mem: 152M Active, 566M Inact, 172M Wired, 29M Cache, 111M Buf, 78M Free
Swap: 2048M Total, 100K Used, 2048M

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   15 root       1 -32 -151     0K     8K CPU1   0  48:47 46.83% swi4: cloc
94182 root       1   4    0  1300K   720K RUN    1  11:17 39.31% nfsd
94183 root       1  -4    0  1300K   720K RUN    1   7:15 37.70% nfsd
94186 root       1  -4    0  1300K   720K RUN    0   3:35 30.81% nfsd
   17 root       1 -44 -163     0K     8K WAIT   1  32:56 28.71% swi1: net
94185 root       1  -8    0  1300K   720K biowr  1   4:18 28.27% nfsd
94187 root       1  -4    0  1300K   720K RUN    1   3:16 26.42% nfsd
    6 root       1  -8    0     0K     8K CPU3   0  18:57 26.03% g_down
94180 root       1  -4    0  1300K   720K RUN    2   4:58 24.85% nfsd
94184 root       1   4    0  1300K   720K RUN    1   2:59 24.76% nfsd
94188 root       1  -4    0  1300K   720K RUN    1   2:39 22.95% nfsd
   31 root       1 -68 -187     0K     8K WAIT   0  10:48 20.41% irq16: em0
   27 root       1 -64 -183     0K     8K WAIT   1  12:33 15.87% irq52: mpt
   21 root       1 -40 -159     0K     8K CPU0   0   8:19  9.18% swi2: camb
   40 root       1 -16    0     0K     8K sdflus 1   6:04  5.13% softdepflu

The wait channels of the nfsd processes are usually biord, biowd, ufs, RUN, CPUX, and -. The kernel config is GENERIC with unneeded hardware removed, plus ipfw2, FAST_IPSEC, and QUOTA (but I don't have any userquota or groupquota entries in fstab). I also tuned some sysctls:

machdep.hyperthreading_allowed=1
vm.kmem_size_max=419430400
vm.kmem_size_scale=2
net.link.ether.inet.log_arp_wrong_iface=0
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.recvspace=65536
kern.ipc.somaxconn=4096
kern.maxfiles=65535
kern.ipc.shmmax=104857600
kern.ipc.shmall=25600
net.inet.ip.random_id=1
kern.maxvnodes=100000
vfs.read_max=16
kern.cam.da.retry_count=20
kern.cam.da.default_timeout=300

Is there anything I can provide to help nail this problem down?

Thanks, Rong-En Fan
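
For anyone comparing numbers from a similar box, here is a minimal sketch (not part of the original post) of how the vmstat -i figures quoted above could be turned into per-source shares of the total interrupt rate. It assumes Python is available and that the output has the three-column "name total rate" layout shown in the post; the function names parse_vmstat_i and rate_shares and the SAMPLE constant are made up for illustration.

```python
# Hypothetical helper: the names below are illustrative, not an existing tool.
# SAMPLE is the `vmstat -i` output quoted in the post above.
SAMPLE = """\
interrupt total rate
irq52: mpt0 586784 48
irq53: mpt1 12 0
irq38: ips0 74926 6
irq1: atkbd0 2 0
irq4: sio0 20363 1
irq16: em0 100321381 8348
cpu0: timer 23813454 1981
cpu3: timer 22903961 1906
cpu1: timer 21907744 1823
cpu2: timer 22886458 1904
Total 192515085 16021
"""


def parse_vmstat_i(text):
    """Parse `vmstat -i` style text into {source: (total, rate)}."""
    rows = {}
    for line in text.splitlines():
        # Source names contain spaces ("irq16: em0"), so split from the right.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue  # skips the "interrupt total rate" header line
        rows[name.strip()] = (int(total), int(rate))
    return rows


def rate_shares(rows):
    """Return each source's fraction of the 'Total' interrupt rate."""
    total_rate = rows["Total"][1]
    return {
        name: rate / total_rate
        for name, (_total, rate) in rows.items()
        if name != "Total"
    }


if __name__ == "__main__":
    shares = rate_shares(parse_vmstat_i(SAMPLE))
    # Print sources from the largest share of the interrupt rate to the smallest.
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<14} {share:6.1%}")
```

On a live system the same functions could be fed the text captured from vmstat -i instead of SAMPLE. Note that the rate column is an average since boot, so it will understate short bursts such as the 30K~90K per second seen in systat.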
Dmitriy Kirhlarov
2006-May-15 19:17 UTC
6.1-RELEASE, em0 high interrupt rate and nfsd eats lots of cpu
On Mon, May 15, 2006 at 02:15:08PM -0400, Rong-en Fan wrote:
> Hi,
>
> After upgrading from 5.5-PRERELEASE to 6.1-RELEASE on one
> nfs server today, I noticed that the load is very high, ranging from 4.x
> to 30.x, depends how many nfsd I run. From mrtg traffic graph, I did
> not notice there is high traffic. This box is 2 physical Xeon CPU w/

I have the same situation today on RC2. One client is installing world from an NFS share. nfsd eats 91% CPU and the load average is 6-8, with very little disk activity. I did not look at the interrupt rate. I also have em0.

WBR
--
Dmitriy Kirhlarov
OILspace, 26 Leninskaya sloboda, bld. 2, 2nd floor, 115280 Moscow, Russia
P:+7 495 105 7247 ext.203 F:+7 495 105 7246
E:DmitriyKirhlarov@oilspace.com
OILspace - The resource enriched - www.oilspace.com