> > [ns8]# vmstat -i
> > interrupt total rate
> > irq4: sio0 57065 0
> > irq17: em1 3989494045 554
> > irq18: arcmsr0 558098657 77
> > cpu0: timer 14381393929 2000
> > irq256: em0 22763077 3
> > cpu1: timer 14381384902 1999
> > Total 33333191675 4635
> > [ns8]#
> >
> > arcmsr0: <Areca SATA Host Adapter RAID Controller
>
>> > > mem 0xe8600000-0xe8600fff,0xe8000000-0xe83fffff irq 18 at
device
>>
> > 14.0 on pci2
> > ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
> > ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
> > arcmsr0: [ITHREAD]
> > .....
> > Waiting 5 seconds for SCSI devices to settle
> > (probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
> > da0 at arcmsr0 bus 0 target 0 lun 0
> > da0: <Areca ARC-1210-VOL#00 R001> Fixed Direct Access SCSI-5
device
> > da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
> > da0: 305175MB (624999424 512 byte sectors: 255H 63S/T 38904C)
> > SMP: AP CPU #1 Launched!
>
Hi and thanks for your reply.
I do not believe the interrupts are the problem at the moment as the
stats. Here is a vmstat when the system usage is spiking and just before
http needs to be killed to get going again most recently.
vmstat -i
interrupt total rate
irq1: atkbd0 2 0
irq4: sio0 22880 0
irq14: ata0 58 0
irq22: uhci1 uhci3 18068 0
irq23: uhci0 uhci+ 1 0
irq26: arcmsr0 496094 14
cpu0: timer 61769334 1791
irq256: em0 1 0
irq258: em2 48505 1
irq259: em3 1 0
cpu1: timer 61762043 1791
cpu3: timer 61299367 1777
cpu2: timer 61299179 1777
cpu4: timer 61326132 1778
cpu7: timer 60845245 1764
cpu5: timer 61326513 1778
cpu6: timer 60845018 1764
Total 491058441 14243
There are no errors en the event console for the areca-cli.
ARC-1130-VOL#00 Main Raid Array Raid1+0 1000.0GB 00/00/00 Normal
Main Raid Array 4 2000.0GB 0.0GB 1234 Normal
Main Processor : 500MHz
CPU ICache Size : 32KB
CPU DCache Size : 32KB
CPU SCache Size : 0KB
System Memory : 1024MB/333MHz/ECC
Firmware Version : V1.44 2008-2-1
BOOT ROM Version : V1.44 2008-1-28
The buildworld taking a really long time was just an example of the
problem I am seeing that is easy to quantify. If I run boxbackup, dump,
clamscan or a few other IO intensive everything gets VERY slow even
when reading files from the server.
When the HTTP locks up (another issue that is seen and is connected to
the same issue in my view) this is what it looks like. It is almost as
if the http gets backed up from what I can tell and I need a plunger to
clean out the blockage :) I have to kill it and then restart it to get
things back to normal for a bit.
last pid: 46013; load averages: 105.30, 67.67,
34.45 up 4+23:59:42 19:08:40
629 processes: 89 running, 540 sleeping
CPU: 21.9% user, 0.0% nice, 74.5% system, 3.1% interrupt, 0.4% idle
Mem: 1538M Active, 11G Inact, 898M Wired, 303M Cache, 214M Buf, 1346M Free
Swap: 8192M Total, 1036K Used, 8191M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU
COMMAND
46000 www 1 65 0 86728K 15008K RUN 1 0:01 12.06% httpd
45994 www 1 56 0 86728K 15008K CPU1 3 0:01 10.16% httpd
46002 www 1 -4 0 150M 20648K RUN 3 0:00 6.98% httpd
45195 www 1 68 0 121M 19748K RUN 1 0:29 6.88% httpd
45991 www 1 53 0 150M 21060K select 3 0:01 6.59% httpd
45997 www 1 -4 0 150M 20992K ufs 5 0:01 6.59% httpd
45950 www 1 57 0 153M 23388K RUN 2 0:01 6.49% httpd
45999 www 1 -4 0 150M 20640K ufs 6 0:00 5.96% httpd
45189 www 1 66 0 161M 29660K RUN 6 0:26 5.76% httpd
45974 www 1 -4 0 151M 21564K ufs 3 0:01 5.76% httpd
45984 www 1 -4 0 151M 21376K ufs 5 0:01 5.66% httpd
45998 www 1 -4 0 150M 20652K ufs 3 0:00 5.57% httpd
45780 www 1 -4 0 153M 23516K ufs 6 0:06 5.37% httpd
45972 www 1 -4 0 151M 21332K RUN 4 0:01 5.37% httpd
46001 www 1 20 0 150M 20568K lockf 4 0:00 5.37% httpd
45425 www 1 60 0 164M 31516K RUN 7 0:15 5.18% httpd
45995 www 1 63 0 150M 20820K RUN 2 0:00 5.18% httpd
45845 www 1 -4 0 151M 21692K ufs 6 0:02 4.98% httpd
45854 www 1 52 0 151M 22080K CPU6 0 0:02 4.88% httpd
45977 root 1 47 0 10160K 3260K CPU2 6 0:01 4.88% top
45509 www 1 56 0 155M 25028K RUN 0 0:14 4.79% httpd
45735 www 1 -4 0 158M 27096K RUN 3 0:07 4.79% httpd
45730 www 1 20 0 151M 21728K lockf 2 0:04 4.79% httpd
45847 www 1 -4 0 150M 21092K RUN 5 0:02 4.69% httpd
85338 root 1 46 0 150M 20560K select 7 5:03 4.59% httpd
45835 www 1 -4 0 150M 21100K ufs 0 0:02 4.59% httpd
45443 www 1 -4 0 151M 22220K ufs 6 0:12 4.49% httpd
45699 www 1 -4 0 157M 26528K RUN 0 0:07 4.39% httpd
45722 www 1 -4 0 152M 22908K RUN 0 0:05 4.39% httpd
45701 www 1 -4 0 152M 22268K RUN 2 0:07 4.30% httpd
45849 www 1 -4 0 151M 21748K ufs 6 0:02 4.30% httpd
46010 nagios 1 -4 0 7684K 1136K ufs 5 0:00 4.30%
check_ping
45515 www 1 -4 0 160M 29048K ufs 5 0:13 4.20% httpd
45606 www 1 -4 0 152M 22420K ufs 0 0:09 4.20% httpd
vfs.numvnodes: 355382
kern.maxvnodes: 400000
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 3239015
vfs.ufs.dirhash_maxmem: 10485760
vfs.ufs.dirhash_minsize: 2560
kern.ipc.nsfbufs: 0
kern.openfiles: 3395
kern.maxfiles: 12328
Results from netstat -m
------------------------
1185/3360/4545 mbufs in use (current/cache/total)
1062/2856/3918/25600 mbuf clusters in use (current/cache/total/max)
1062/1556 mbuf+clusters out of packet secondary zone in use (current/cache)
10/1550/1560/12800 4k (page size) jumbo clusters in use
(current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
2460K/12752K/15212K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
46262 requests for I/O initiated by sendfile
0 calls to protocol drain routines
Results from vmstat -m
------------------------
Type InUse MemUse HighUse Requests Size(s)
cdev 22 6K - 22 256
acd_driver 1 2K - 1 2048
sigio 1 1K - 1626 64
filedesc 684 941K - 1199696 16,32,64,128,256,512,1024,2048,4096
kenv 68 11K - 70 16,32,64
kqueue 368 414K - 1740632 256,2048,4096
proc-args 52 4K - 5389885 16,32,64,128,256
ithread 99 19K - 99 32,128,256
acpisem 13 2K - 13 128
CAM queue 12 1K - 302 16,32,64,128,256
KTRACE 100 13K - 100 128
linker 45 4K - 71 16,32,64,128,512
lockf 314 38K - 16413112 64,128,256,512,1024,2048,4096
scsi_da 0 0K - 65 16
ip6ndp 7 1K - 7 64,128
ip6opt 1 1K - 50503 256
temp 66 222K - 6704801 16,32,64,128,256,512,1024,2048,4096
devbuf 16781 35476K - 108258 16,32,64,128,256,512,1024,2048,4096
CAM SIM 2 1K - 2 256
module 204 26K - 204 128
acpidev 78 5K - 78 64
mtx_pool 1 8K - 1
subproc 1111 1606K - 1045562 512,4096
proc 2 16K - 2
session 34 5K - 20772 128
pgrp 39 5K - 158890 128
cred 24950 6238K - 11839905 256
uidinfo 13 3K - 7337 64,2048
plimit 24 6K - 226179 256
CAM periph 7 2K - 45 16,32,64,128,256
sysctltmp 0 0K - 215050 16,32,64,128,256
sysctloid 4373 216K - 4373 16,32,64,128
sysctl 0 0K - 828292 16,32,64
umtx 1692 212K - 1692 128
p1003.1b 1 1K - 1 16
SWAP 2 1097K - 2 64
CAM XPT 51 24K - 19790153 32,64,128,256,1024
bus-sc 111 101K - 1879 16,32,64,128,256,512,1024,2048,4096
bus 804 77K - 5926 16,32,64,128,256,1024
devstat 10 21K - 10 32,4096
eventhandler 57 5K - 57 64,128
kobj 125 500K - 160 4096
kbdmux 6 9K - 6 16,256,512,2048,4096
rman 168 21K - 576 16,64,128
sbuf 0 0K - 840 16,32,64,128,256,512,1024,2048,4096
pci_link 16 2K - 16 16,128
stack 0 0K - 14 256
taskqueue 19 2K - 19 16,32,128
Unitno 16 1K - 22074 32,64
iov 0 0K - 12126863 16,64,128,256,512
ioctlops 0 0K - 388714 16,32,64,128,256,512,1024,2048
msg 4 30K - 4 2048,4096
sem 4 8K - 4 512,1024,2048,4096
shm 1 16K - 1
ttys 1170 169K - 80824 128,1024
ptys 5 2K - 5 256
accf 3 1K - 301 32,64
mbuf_tag 0 0K - 520852 32,128
pcb 47 158K - 1332310 16,32,128,1024,2048,4096
soname 187 23K - 10680643 16,32,128
biobuf 1 2K - 143707 2048
vfscache 1 1024K - 1
cl_savebuf 0 0K - 154293 64,128
vfs_hash 1 512K - 1
vnodes 2 1K - 3 32,256
vnodemarker 1 1K - 124142 512
mount 111 6K - 495 16,32,64,128,256,2048
acpi_perf 8 1K - 8 64
BPF 6 1K - 6 128
ether_multi 29 2K - 32 16,32,64
ifaddr 136 48K - 136 32,64,128,256,512,4096
ifnet 7 13K - 7 256,2048
clone 6 24K - 6 4096
arpcom 5 1K - 5 16
lo 1 1K - 1 32
acpica 3057 292K - 68659 16,32,64,128,256,512,1024
routetbl 303 86K - 1027 32,64,128
,256,512
in_multi 4 1K - 4 64
IpFw/IpAcct 60 9K - 60 64,128,2048
sctp_iter 0 0K - 65 256
sctp_ifn 4 1K - 4 128
sctp_ifa 66 9K - 66 128
sctp_vrf 1 1K - 1 64
sctp_a_it 0 0K - 65 16
hostcache 1 36K - 1
entropy 1024 64K - 1024 64
syncache 1 100K - 1
in6_multi 16 1K - 16 32,64,128
audit_evclass 150 5K - 187 32
savedino 0 0K - 406078 256
newdirblk 0 0K - 5047 64
dirrem 18 2K - 2259617 64
mkdir 1 1K - 283528 64
diradd 183 12K - 3426340 64
freefile 55 4K - 1081462 64
freeblks 26 7K - 792864 256
freefrag 2 1K - 781740 64
allocindir 5 1K - 2842332 128
indirdep 4 1K - 116594 64
allocdirect 62 16K - 4832896 256
bmsafemap 12 2K - 271759 128
newblk 1 1K - 7675229 64,512
inodedep 270 580K - 2593883 256
pagedep 12 130K - 318828 128
ufs_dirhash 2848 1230K - 42435 16,32,64,128,256,512,1024,2048,4096
ufs_quota 1 512K - 1
ufs_mount 15 241K - 51 128,256,512,2048,4096
UMAHash 5 572K - 33 512,1024,2048,4096
USBHC 0 0K - 660 16
USBdev 22 10K - 682 16,128,512
USB 761 683K - 4079 16,32,64,128,256,1024
vm_pgdata 2 129K - 2 128
DEVFS1 115 58K - 115 512
DEVFS3 250 63K - 251 256
DEVFS2 115 2K - 115 16
DEVFS_RULE 36 17K - 36 64,512
DEVFS 30 1K - 31 16,128
io_apic 2 4K - 2 2048
pfs_nodes 20 5K - 20 256
memdesc 1 4K - 2 4096
msi 4 1K - 4 128
nexusdev 4 1K - 4 16
acpitask 0 0K - 9 64
GEOM 104 20K - 882 16,32,64,128,256,512,1024,2048
atkbddev 2 1K - 2 64
isadev 7 1K - 7 128
CAM dev queue 2 1K - 2 128
ata_generic 1 1K - 1 1024
ata_dma 1 1K - 1 256
Results from systat -v
-----------------------
1 users Load 143 90.86 47.13 Nov 21 19:10
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 1754100 25964 4719924 55728 1551492 count
All 1916252 113912 9413004 269144 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt 179 cow 16002 total
73 133 454 2 32k 816 2520 3 29k 726 504 zfod atkbd0 1
ozfod sio1 irq3
86.8%Sys 3.5%Intr 9.2%User 0.0%Nice 0.6%Idle %ozfod sio0 irq4
| | | | | | | | | | | daefr ata0 irq14
===========================================++>>>>> 16 prcfr
uhci1 uhci
314 dtbuf 90 totfr uhci0 uhci
Namei Name-cache Dir-cache 400000 desvn react 2 arcmsr0 26
Calls hits % hits % 355344 numvn pdwak 2004 cpu0: time
76763 76730 100 24902 frevn pdpgs em0 irq256
intrn 1 em2 irq258
Disks da0 da1 pass0 pass1 pass2 934624 wire em3 irq259
KB/t 9.00 0.00 0.00 0.00 0.00 1697060 act 2000 cpu1: time
tps 1 0 0 0 0 12038912 inact 1996 cpu2: time
MB/s 0.01 0.00 0.00 0.00 0.00 308732 cache 2000 cpu3: time
%busy 0 0 0 0 0 1244784 free 2001 cpu7: time
219632 buf 1999 cpu4: time
1999 cpu6: time
2000 cpu5: time
Here is a "normal" sysstat -v to compare when there are no
"visible" problems:
3 users Load 1.67 1.03 1.02 Nov 25 22:12
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 797576 31388 2318500 57324 4051340 count
All 952256 114916 6781828 226696 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt 556 cow 16001 total
1 6 474 5463 1602 3387 1 31k 1567 853 zfod atkbd0 1
ozfod sio1 irq3
8.4%Sys 4.0%Intr 2.8%User 0.0%Nice 84.8%Idle %ozfod sio0 irq4
| | | | | | | | | | | daefr ata0 irq14
====++>> 602 prcfr
uhci1 uhci
125 dtbuf 1443 totfr uhci0 uhci
Namei Name-cache Dir-cache 400000 desvn react arcmsr0 26
Calls hits % hits % 328748 numvn pdwak 2026 cpu0: time
52734 52660 100 24705 frevn pdpgs em0 irq256
intrn 1 em2 irq258
Disks da0 da1 pass0 pass1 pass2 857028 wire em3 irq259
KB/t 0.00 0.00 0.00 0.00 0.00 750716 act 2026 cpu1: time
tps 0 0 0 0 0 10564316 inact 1975 cpu2: time
MB/s 0.00 0.00 0.00 0.00 0.00 303468 cache 1977 cpu3: time
%busy 0 0 0 0 0 3748056 free 1999 cpu7: time
219632 buf 2000 cpu4: time
1997 cpu6: time
2000 cpu5: time
----------------------------------
Here are the results of vmstat -w 1 during the problem:
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs
us sy id
157 110 13 5544M 1111M 1141 0 0 0 1100 44 0 0 47 115
8744 7 14 79
146 34 98 5546M 1099M 4191 0 0 0 729 0 2 0 17 18583
102586 9 91 0
224 33 15 5548M 1091M 3825 0 0 0 664 0 0 0 7 14115
141707 10 90 0
165 103 11 5633M 1064M 12222 0 0 0 6745 0 2 0 42 41519
403437 14 86 0
214 73 4 5653M 1044M 4539 0 0 0 959 0 0 0 7 15698
94269 11 88 1
260 30 1 5664M 1034M 8457 0 0 0 2171 0 0 0 14 36978
248202 12 87 0
57 182 45 5667M 1029M 4761 0 0 0 2535 0 0 0 6 21004
133617 10 90 0
55 24 16 2152M 2454M 7993 0 0 0 3135 0 0 0 13 20263
173347 13 81 7
20 15 2 1972M 2537M 93820 0 0 0 465955 0 10 0 588 99274
716238 23 76 1
13 11 0 1877M 2581M 7820 0 0 0 31044 0 6 0 38 7859
76120 16 83 1
9 12 1 1816M 2599M 6889 0 0 0 14550 0 20 0 79 359198
21333 14 79 7
11 13 0 1797M 2613M 6542 0 0 0 8416 0 3 0 17 606119
15341 18 61 21
1 9 1 1740M 2636M 1744 0 0 0 6267 0 2 0 14 11617
15322 8 63 29
2 3 0 1694M 2657M 3417 0 0 0 8669 0 15 0 52 50341
12045 6 29 65
Here is another view of top at a later date with the same problem happening
focusing on IO setting
in Top:
--------------------------------------------------------------
last pid: 17984; load averages: 39.26, 37.68,
24.75 up 8+09:25:55
04:34:53
539 processes: 59 running, 480 sleeping
CPU: 9.8% user, 0.5% nice, 87.0% system, 2.3% interrupt, 0.4% idle
Mem: 1146M Active, 9663M Inact, 875M Wired, 582M Cache, 214M Buf, 3577M Free
Swap: 8192M Total, 1036K Used, 8191M Free
PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT
COMMAND
17587 www 446 62 0 0 0 0 0.00% httpd
17763 www 515 37 0 0 0 0 0.00% httpd
17860 www 538 47 0 0 0 0 0.00% httpd
17703 www 457 43 0 0 0 0 0.00% httpd
17701 www 485 34 0 0 0 0 0.00% httpd
17550 www 423 29 0 0 0 0 0.00% httpd
17579 www 0 0 0 0 0 0 0.00% httpd
17864 www 495 39 0 0 0 0 0.00% httpd
17836 www 520 36 0 0 0 0 0.00% httpd
17847 www 451 28 0 0 0 0 0.00% httpd
17756 www 462 29 0 0 0 0 0.00% httpd
17982 www 445 63 0 0 0 0 0.00% httpd
17581 www 451 60 0 0 0 0 0.00% httpd
17761 www 449 37 0 0 0 0 0.00% httpd
17582 www 509 30 0 0 0 0 0.00% httpd
17709 www 447 28 0 0 0 0 0.00% httpd
17705 www 515 30 0 0 0 0 0.00% httpd
17704 www 469 38 0 0 0 0 0.00% httpd
17706 www 508 53 0 0 0 0 0.00% httpd
17833 www 483 34 0 0 0 0 0.00% httpd
17834 www 499 43 0 0 0 0 0.00% httpd
17974 www 489 38 0 0 0 0 0.00% httpd
17978 www 467 45 0 0 0 0 0.00% httpd
17576 www 447 32 0 0 0 0 0.00% httpd
17570 www 443 37 0 0 0 0 0.00% httpd
17762 www 476 31 0 0 0 0 0.00% httpd
17837 www 508 44 0 0 0 0 0.00% httpd
17548 www 443 32 0 0 0 0 0.00% httpd
17783 www 390 22 0 0 0 0 0.00% httpd
17961 www 534 57 0 0 0 0 0.00% httpd
17590 www 498 50 0 0 0 0 0.00% httpd
17700 www 471 35 0 0 0 0 0.00% httpd
17580 www 438 41 0 0 0 0 0.00% httpd
This used to be on a 4.11x system with 1 cpu and only 1gb of ram and ran
flawlessly with much less resources with the same web site code for a long time.
I do not have this problem on the other 7.0 machine. I originally thought it was
just a cpu issue but it is very closely tied to when something is trying to use
the raid arrays and this seems to be the way to reproduce it.
I am having a hard time determining why the system load is so high. Can you
recommend the best way to identify the culprit?
Thanks,
Paul