Alexey Popov
2007-Oct-15 08:47 UTC
amrd disk performance drop after running under high load
Hi.

I have three Dell 2850 servers with a DELL PERC4 SCSI RAID5 (6x300GB), running lighttpd and serving Flash video at around 200 Mbit/s.

%grep amr /var/run/dmesg.boot
amr0: <LSILogic MegaRAID 1.53> mem 0xf80f0000-0xf80fffff,0xfe9c0000-0xfe9fffff irq 46 at device 14.0 on pci2
amr0: Using 64-bit DMA
amr0: delete logical drives supported by controller
amr0: <LSILogic PERC 4e/Di> Firmware 521X, BIOS H430, 256MB RAM
amr0: delete logical drives supported by controller
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 1430400MB (2929459200 sectors) RAID 5 (optimal)
Trying to mount root from ufs:/dev/amrd0s1a
%uname -a
FreeBSD ???.ru 6.2-STABLE FreeBSD 6.2-STABLE #2: Mon Oct 8 16:25:20 MSD 2007 llp@???.ru:/usr/obj/usr/src/sys/SMP-amd64-HWPMC amd64
%

After some time of running under high load, disk performance becomes extremely poor. During these periods 'systat -vm 1' shows something like this:

Disks   amrd0
KB/t    85.39
tps         5
MB/s     0.38
% busy     99

That is, the disk is reported ~100% busy at only 2-10 tps. There's nothing bad in /var/log/messages, 'netstat -m', 'vmstat -z' or anywhere else. This goes on for 15-30 minutes or so, then everything becomes fine again, and after another 10-12 hours it repeats.
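The stall should be visible with the other base-system disk tools as well; a minimal sketch, assuming the 6.x flags and the amrd0 device from above:

% # extended per-device statistics once per second;
% # svc_t is the average transaction time in ms, wait is the queue depth
% iostat -x -w 1 amrd0
% # or the live per-provider view at the GEOM level
% gstat

A stall like the one above should show up as svc_t climbing toward ~200 ms (99% busy / 5 tps) while tps stays in single digits.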
Besides that, I tried mutex profiling, and here are the results (sorted by the total number of acquisitions).

Bad case:

  102   223514  273977   0  14689   1651568  /usr/src/sys/vm/uma_core.c:2349 (512)
  950   263099  273968   0  15004     14427  /usr/src/sys/vm/uma_core.c:2450 (512)
  108   150422  175840   0  10978  22988519  /usr/src/sys/vm/uma_core.c:1888 (mbuf)
  352   160635  173663   0  10896      9678  /usr/src/sys/vm/uma_core.c:2209 (mbuf)
  110   134910  173575   0  10838      9464  /usr/src/sys/vm/uma_core.c:2104 (mbuf)
 1104  1335319  106888  12     27      1259  /usr/src/sys/netinet/tcp_output.c:253 (so_snd)
  171    77754   97685   0    176       207  /usr/src/sys/net/pfil.c:71 (pfil_head_mtx)
  140    77104   97685   0    151       128  /usr/src/sys/netinet/ip_fw2.c:164 (IPFW static rules)
  100    76543   97685   0    146     45450  /usr/src/sys/netinet/ip_fw2.c:156 (IPFW static rules)
   82    77149   97685   0    243    141221  /usr/src/sys/net/pfil.c:63 (pfil_head_mtx)
 1644   914481   97679   9    739    949977  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2320 (ipf filter load/unload mutex)
 1642   556643   97679   5      0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2455 (ipf filter rwlock)
  107    89413   97679   0      0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2142 (ipf cache rwlock)
  907   148940   81439   1      3      7447  /usr/src/sys/kern/kern_lock.c:168 (lockbuilder mtxpool)
 1764   152282   63435   2    438    336763  /usr/src/sys/net/route.c:197 (rtentry)

And in the good case:

 1738   821795  553033   1     41       284  /usr/src/sys/netinet/tcp_output.c:253 (so_snd)
 2770   983643  490815   2      6        54  /usr/src/sys/kern/kern_lock.c:168 (lockbuilder mtxpool)
  106   430941  477500   0   5555      4507  /usr/src/sys/netinet/ip_fw2.c:164 (IPFW static rules)
   95   423926  477500   0   4412      5645  /usr/src/sys/netinet/ip_fw2.c:156 (IPFW static rules)
   94   427239  477500   0   6323      7453  /usr/src/sys/net/pfil.c:63 (pfil_head_mtx)
   82   432359  477500   0   5244      5768  /usr/src/sys/net/pfil.c:71 (pfil_head_mtx)
  296  4751550  477498   9  20837     23019  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2320 (ipf filter load/unload mutex)
   85  2913118  477498   6      0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2455 (ipf filter rwlock)
   55   473891  477498   0      0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2142 (ipf cache rwlock)
   59   291035  309222   0      0         0  /usr/src/sys/contrib/ipfilter/netinet/fil.c:2169 (ipf cache rwlock)
 1627   697811  305094   2   2161      2535  /usr/src/sys/net/route.c:147 (radix node head)
  232   804172  305094   2  12193      6500  /usr/src/sys/net/route.c:197 (rtentry)
  148   892580  303518   2    594       649  /usr/src/sys/net/route.c:1281 (rtentry)
  145   584970  303518   1  13479     11148  /usr/src/sys/net/route.c:1265 (rtentry)
  121   282669  303518   0   3529       886  /usr/src/sys/net/if_ethersubr.c:409 (em0)

Here you can see that the high UMA activity coincides with the periods of low disk performance. But I'm not sure whether this is the root of the problem or just a consequence of it.

I have similar servers around doing the same things, and they work fine. I also had the same problem a year ago with another project; that time nothing helped, and I had to install Linux.

I can provide additional information about this server if needed. What else can I try to solve the problem?

With best regards,
Alexey Popov
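For anyone wanting to collect the same profile: a rough sketch of how numbers like the above are gathered on 6.x, assuming a kernel built with "options MUTEX_PROFILING" (the debug.mutex.prof.* sysctl names below are from that facility and should be checked against the installed sources):

% sysctl debug.mutex.prof.enable=1      # start counting lock acquisitions
% # ... let the machine run through a bad (or good) period ...
% sysctl debug.mutex.prof.stats | sort -rn -k 3 | head -n 15
% sysctl debug.mutex.prof.enable=0      # stop counting

The sort key assumes the third column is the acquisition count, which matches the ordering of the tables above.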
Kris Kennaway
2007-Oct-15 17:42 UTC
amrd disk performance drop after running under high load
Alexey Popov wrote:

> After some time of running under high load, disk performance becomes
> extremely poor. During these periods 'systat -vm 1' shows something
> like this:

What does "high load" mean? You need to explain the system workload in more detail.

> Disks   amrd0
> KB/t    85.39
> tps         5
> MB/s     0.38
> % busy     99

> Besides that, I tried mutex profiling, and here are the results
> (sorted by the total number of acquisitions).
>
> Bad case:
>
>  102  223514  273977  0  14689   1651568  /usr/src/sys/vm/uma_core.c:2349 (512)
>  950  263099  273968  0  15004     14427  /usr/src/sys/vm/uma_core.c:2450 (512)
>  108  150422  175840  0  10978  22988519  /usr/src/sys/vm/uma_core.c:1888 (mbuf)

> Here you can see that the high UMA activity coincides with the periods
> of low disk performance. But I'm not sure whether this is the root of
> the problem or just a consequence of it.

The extremely high contention there does seem to say that you have an mbuf starvation problem rather than a disk problem. Offhand I don't know why that would be happening. Can you also provide more details about the system hardware and configuration?

Kris
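To check the mbuf-starvation theory between occurrences, the commands already mentioned in the thread can be left running in a loop; a small sh sketch, assuming stock 6.x output ("netstat -m" reports denied mbuf requests, and nonzero values in the FAILURES column of "vmstat -z" for the mbuf zones during a slow period would support the theory):

% # append a timestamped mbuf snapshot to a log every 10 seconds
% sh -c 'while :; do date; netstat -m; vmstat -z | grep -i mbuf; echo; sleep 10; done' >> /var/tmp/mbuf.log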