It's been looking like this pretty much all day ... top shows nothing major, and the drive looks reasonably quiet ... there is nothing in messages to indicate a problem that I can see (even those enclosure messages have been reasonably quiet) ...

What consumes SYS CPU? Stuff like apache and jakarta-tomcat use up USER CPU, correct?

neptune# iostat 5
      tty           aacd0            pass0            pass1         cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1  125  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  13  0 48  0 39
   0  108 17.39  88  1.49   0.00   0  0.00   0.00   0  0.00  16  0 83  0  0
   0   23 17.79  23  0.39   0.00   0  0.00   0.00   0  0.00   6  0 92  0  1
   0  294 15.99  28  0.43   0.00   0  0.00   0.00   0  0.00   8  0 92  0  0
   0   35 17.56 156  2.67   0.00   0  0.00   0.00   0  0.00  18  0 82  0  0
   0 1048 15.86  68  1.05   0.00   0  0.00   0.00   0  0.00  22  0 77  0  0
   3  981 14.63  79  1.13   0.00   0  0.00   0.00   0  0.00  10  0 89  1  0
   0  353 14.85  77  1.12   0.00   0  0.00   0.00   0  0.00  18  0 81  0  0
   0  728 14.75  31  0.45   0.00   0  0.00   0.00   0  0.00   8  0 91  0  0
   1  669 13.93  11  0.15   0.00   0  0.00   0.00   0  0.00  11  0 89  0  0
   0   15 10.42   4  0.04   0.00   0  0.00   0.00   0  0.00   6  0 93  0  0
   0   15 13.42  31  0.40   0.00   0  0.00   0.00   0  0.00   3  0 97  0  0
   0   15  8.52  25  0.21   0.00   0  0.00   0.00   0  0.00   3  0 97  0  0
   5  686 15.71  64  0.98   0.00   0  0.00   0.00   0  0.00   9  0 91  0  0
   5  463  8.94  10  0.09   0.00   0  0.00   0.00   0  0.00   6  0 93  0  0
   5   24 10.67   3  0.03   0.00   0  0.00   0.00   0  0.00  14  0 85  1  0
   6   41 16.95 128  2.12   0.00   0  0.00   0.00   0  0.00   9  0 90  0  0
   8  274 16.14 114  1.80   0.00   0  0.00   0.00   0  0.00   7  0 93  0  0
   8   33 15.13   8  0.11   0.00   0  0.00   0.00   0  0.00  11  0 89  0  0
   3   20 15.65  33  0.51   0.00   0  0.00   0.00   0  0.00   4  0 96  0  0

Doing a ps, the 'server processes' don't look to be consuming much CPU, other than vmdaemon/syncer (that is 18 and 9 hrs respectively, right?)

USER    PID %CPU %MEM    VSZ  RSS  TT STAT STARTED     TIME COMMAND
root      0  0.0  0.0      0    0  ?? DLs  Thu07AM  0:45.80 (swapper)
root      1  0.0  0.0    552  196  ?? SLs  Thu07AM  0:03.37 /sbin/init --
root      2  0.0  0.0      0    0  ?? DL   Thu07AM  0:00.26 (aac0aif)
root      3  0.0  0.0      0    0  ?? DL   Thu07AM  0:19.50 (pagedaemon)
root      4  0.4  0.0      0    0  ?? DL   Thu07AM 18:27.81 (vmdaemon)
root      5  0.0  0.0      0    0  ?? DL   Thu07AM  0:01.62 (bufdaemon)
root      6  0.0  0.0      0    0  ?? DL   Thu07AM  0:01.03 (vnlru)
root      7  0.0  0.0      0    0  ?? DL   Thu07AM  9:03.81 (syncer)
root     28  0.0  0.0    212    0  ?? IWs  -        0:00.00 adjkerntz -i
root    134  0.0  0.0    952  528  ?? Ss   Thu10AM  0:06.17 /usr/sbin/sysl
root    138  0.0  0.1   6304 5608  ?? Ss   Thu10AM  0:51.81 /usr/sbin/name
daemon  140  0.0  0.0    988  436  ?? Ss   Thu10AM  0:02.75 /usr/sbin/port
root    142  0.0  0.0    572  356  ?? Ss   Thu10AM  0:02.76 mountd -r
root    144  0.0  0.0    368    0  ?? IWs  -        0:00.00 nfsd: master (
root    145  0.0  0.0    360    0  ?? IW   -        0:00.00 nfsd: server (
root    146  0.0  0.0    360    0  ?? IW   -        0:00.00 nfsd: server (
root    148  0.0  0.0    360    0  ?? IW   -        0:00.00 nfsd: server (
root    149  0.0  0.0    360    0  ?? IW   -        0:00.00 nfsd: server (
root    150  0.0  0.0 263088    0  ?? Ss   Thu10AM  0:02.73 rpc.statd
daemon  152  0.0  0.0    908    0  ?? IWs  -        0:00.00 rwhod
root    158  0.0  0.0   1024    0  ?? IWs  -        0:00.00 /usr/sbin/cron
root    160  0.0  0.0   2592  928  ?? Ss   Thu10AM  0:03.87 /usr/sbin/sshd

Right now, kvm and vnodes are looking like:

vm.kvm_free: 809500672 - debug.numvnodes: 414708 - debug.freevnodes: 95815

which is pretty much standard for my servers ... and there are only ~1200 processes running on this one, which is a dual PIII with 4GB of RAM ...

Starting commands seems to take a long time ... top takes forever, and pstat -s shows:

neptune# time pstat -s
Device          1K-blocks     Used    Avail Capacity  Type
/dev/aacd0s1b     8388480   119916  8268564     1%    Interleaved
0.245u 3.670s 0:27.39 14.2% 16+218k 0+0io 0pf+0w

And not much swap is being used for the # of processes ...

What else should I be looking at? :(
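A short sketch for reading the TIME column in the ps listing above. It assumes that FreeBSD's ps(1) prints accumulated CPU time as minutes:seconds.hundredths, which is my understanding of its format; other ps implementations lay TIME out differently, so check your own man page. The helper name `ps_time_to_seconds` is hypothetical.

```python
def ps_time_to_seconds(field: str) -> float:
    """Convert a ps(1) TIME value such as '18:27.81' to seconds.

    Assumption: the part before the colon is MINUTES (FreeBSD-style
    'MMM:SS.hh'), not hours.
    """
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + float(seconds)


if __name__ == "__main__":
    # TIME values copied from the ps listing in the original post.
    for name, field in (("vmdaemon", "18:27.81"), ("syncer", "9:03.81")):
        secs = ps_time_to_seconds(field)
        print(f"{name:9s} {field:>9s} = {secs:8.2f} s = {secs / 60:5.1f} min")
```

Under that assumption, vmdaemon's 18:27.81 works out to about 1,108 seconds (roughly 18.5 minutes) of CPU time and syncer's 9:03.81 to about 544 seconds, rather than 18 and 9 hours.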
Marc G. Fournier wrote:
> What consumes SYS CPU? Stuff like apache and jakarta-tomcat use up USER
> CPU, correct?

Stuff like the in-kernel network stack, swapping, etc.

You don't show a top listing ... it might be useful to get a few captures of top. You can watch the swap line to see what the in/out swapping is like.

My first guess is that, although the system isn't using much swap, it's constantly swapping in and out. This results in a lot of disk activity and more system time on the processor.

It's also possible (since you have NFS daemons running) that your disks are simply saturated. vmstat would help diagnose this (actually, I prefer the vmstat screen in systat, because it updates).

You don't mention much about the disk subsystem or the usage of the machine, so it's a little hard to guess at the problem ... but those are my guesses so far.

If the vmstat output for your drives shows continually heavy activity, you may have to move to faster drives (e.g. high-end SCSI), break this machine's workload up, or use a form of RAID that's well suited to the type of load you've got (different RAID types perform differently depending on whether you have mostly read traffic, mixed read/write traffic, or whatever).

Hope this helps.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
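To put numbers on the "system time vs. disk activity" question, here is a minimal sketch that averages the iostat samples from the original post. It assumes the 16-column layout shown there (tty, aacd0, pass0, pass1, cpu) and relies on iostat's first report being averaged over the system uptime, so that line is dropped. `parse_iostat`, `summarize`, and the column keys are hypothetical names.

```python
from statistics import mean

# Column keys for the 16 numeric fields of this particular iostat layout.
COLUMNS = ("tin", "tout",
           "aacd0_kbt", "aacd0_tps", "aacd0_mbs",
           "pass0_kbt", "pass0_tps", "pass0_mbs",
           "pass1_kbt", "pass1_tps", "pass1_mbs",
           "us", "ni", "sy", "in", "id")

# 'iostat 5' output from neptune, copied from the thread.
IOSTAT_SAMPLE = """\
      tty           aacd0            pass0            pass1         cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1  125  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  13  0 48  0 39
   0  108 17.39  88  1.49   0.00   0  0.00   0.00   0  0.00  16  0 83  0  0
   0   23 17.79  23  0.39   0.00   0  0.00   0.00   0  0.00   6  0 92  0  1
   0  294 15.99  28  0.43   0.00   0  0.00   0.00   0  0.00   8  0 92  0  0
   0   35 17.56 156  2.67   0.00   0  0.00   0.00   0  0.00  18  0 82  0  0
   0 1048 15.86  68  1.05   0.00   0  0.00   0.00   0  0.00  22  0 77  0  0
   3  981 14.63  79  1.13   0.00   0  0.00   0.00   0  0.00  10  0 89  1  0
   0  353 14.85  77  1.12   0.00   0  0.00   0.00   0  0.00  18  0 81  0  0
   0  728 14.75  31  0.45   0.00   0  0.00   0.00   0  0.00   8  0 91  0  0
   1  669 13.93  11  0.15   0.00   0  0.00   0.00   0  0.00  11  0 89  0  0
   0   15 10.42   4  0.04   0.00   0  0.00   0.00   0  0.00   6  0 93  0  0
   0   15 13.42  31  0.40   0.00   0  0.00   0.00   0  0.00   3  0 97  0  0
   0   15  8.52  25  0.21   0.00   0  0.00   0.00   0  0.00   3  0 97  0  0
   5  686 15.71  64  0.98   0.00   0  0.00   0.00   0  0.00   9  0 91  0  0
   5  463  8.94  10  0.09   0.00   0  0.00   0.00   0  0.00   6  0 93  0  0
   5   24 10.67   3  0.03   0.00   0  0.00   0.00   0  0.00  14  0 85  1  0
   6   41 16.95 128  2.12   0.00   0  0.00   0.00   0  0.00   9  0 90  0  0
   8  274 16.14 114  1.80   0.00   0  0.00   0.00   0  0.00   7  0 93  0  0
   8   33 15.13   8  0.11   0.00   0  0.00   0.00   0  0.00  11  0 89  0  0
   3   20 15.65  33  0.51   0.00   0  0.00   0.00   0  0.00   4  0 96  0  0
"""


def parse_iostat(text):
    """Return one dict per numeric iostat line, keyed by COLUMNS."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # banner line ("tty aacd0 ...") or blank line
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # the column-header line also has 16 fields
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def summarize(rows):
    """Average the interval samples, skipping the since-boot first report."""
    samples = rows[1:]
    keys = ("us", "sy", "id", "aacd0_tps", "aacd0_mbs")
    return {k: mean(r[k] for r in samples) for k in keys}


if __name__ == "__main__":
    for key, value in summarize(parse_iostat(IOSTAT_SAMPLE)).items():
        print(f"{key:10s} {value:7.2f}")
```

For the 19 interval samples in this thread, that gives sy of about 89.5% and us of about 9.9%, while aacd0 averages roughly 52 transfers/s and 0.8 MB/s, so the system-time share can be compared against disk throughput directly.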
On Fri, Apr 11, 2003, Marc G. Fournier wrote:
>
> It's been looking like this pretty much all day ... top shows nothing
> major, and the drive looks reasonably quiet ... there is nothing in
> messages to indicate a problem that I can see (even those enclosure
> messages have been reasonably quiet) ...

You can profile the kernel to find out, although at this hour I probably can't recall all the necessary details. I believe you need to say 'make buildkernel CONFIGARGS=-p', then use kgmon(8) and gprof(1) to extract and analyze the data.

Based on the ps output you gave, it looks like vmdaemon and syncer are taking up most of the time. This suggests that perhaps there is a hardware or driver problem with disk I/O. Some drivers wait out very small delays for the hardware by spinning instead of paying the overhead of a context switch. It could be that one of these drivers has a bug, or that the hardware is taking longer than expected to respond.

> What consumes SYS CPU? Stuff like apache and jakarta-tomcat use up USER
> CPU, correct?

Every process consumes system time when it is running in the kernel to process a trap or system call. It consumes user time when it is running its own (unprivileged) code. Kernel threads such as the pageout daemon and syncer use system time exclusively.

time(1) will show you how much system and user time a process uses. Unfortunately, you can't use it on a process that is already running, and tools like 'ps' don't show both numbers (although they can easily be hacked to do so). For example:

> time du >/dev/null
0.074u 0.568s 0:03.36 18.7% 11+285k 1431+0io 0pf+0w

This shows that the 'du' command took 74 ms of user time and 568 ms of system time (in system calls). If you're not using the C shell, or you want more descriptive output, use '/usr/bin/time -l'.
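As I understand it, the user/system split that time(1) prints comes from the resource usage the kernel accumulates for a child once it has been waited for. A minimal sketch of taking the same measurement yourself, assuming a Unix system with Python's `resource` module; `child_cpu_times` and the two workload snippets are hypothetical and only meant to contrast user-space arithmetic with a loop of stat(2) calls.

```python
import resource
import subprocess
import sys
import time


def child_cpu_times(argv):
    """Run argv to completion and return (user_s, sys_s, wall_s) for it.

    RUSAGE_CHILDREN accumulates over every child this process has reaped,
    so the result is the difference between readings taken before and after.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    # subprocess.run() waits for the child, so its usage is folded into
    # RUSAGE_CHILDREN by the time the call returns.
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime,
            wall)


# Hypothetical workloads: pure arithmetic vs. many stat(2) system calls.
USER_HEAVY = "sum(i * i for i in range(2_000_000))"
SYSCALL_HEAVY = "import os\nfor _ in range(200_000): os.stat('.')"

if __name__ == "__main__":
    for label, code in (("user-heavy", USER_HEAVY), ("syscall-heavy", SYSCALL_HEAVY)):
        user, system, wall = child_cpu_times([sys.executable, "-c", code])
        # Roughly mirrors the first fields of the csh time builtin's output.
        print(f"{label:14s} {user:.3f}u {system:.3f}s {wall:.2f}s wall")
```

Like time(1), this only measures commands you start yourself; it cannot attach to a process that is already running.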