Simon Banton
2007-Sep-13 12:17 UTC
[CentOS] 3Ware 9550SX and latency/system responsiveness
Dear list,

I thought I'd just share my experiences with this 3Ware card, and see if anyone might have any suggestions.

System: Supermicro H8DA8 with 2 x Opteron 250 2.4GHz and 4GB RAM installed. 9550SX-8LP hosting 4x Seagate ST3250820SV 250GB in a RAID 1 plus 2 hot spare config. The array is properly initialized, write cache is on, as is queueing (and supported by the drives). StoreSave set to Protection.

OS is CentOS 4.5 i386, minimal install, default partitioning as suggested by the installer (ext3, small /boot on /dev/sda1, remainder as / on an LVM VolGroup with 2GB swap).

Firmware from 3Ware codeset 9.4.1.2 in use, firmware/driver details:

//serv1> /c0 show all
/c0 Driver Version = 2.26.05.007
/c0 Model = 9550SX-8LP
/c0 Memory Installed = 112MB
/c0 Firmware Version = FE9X 3.08.02.005
/c0 Bios Version = BE9X 3.08.00.002
/c0 Monitor Version = BL9X 3.01.00.006

I initially noticed something odd while installing 4.4: writing the inode tables took longer than I expected (I thought the installer had frozen), and the system overall felt sluggish when doing its first yum update - certainly more sluggish than I'd expect from a comparatively powerful machine with hardware RAID 1.

I tried a few simple benchmarks (bonnie++, iozone, dd) and noticed up to 8 pdflush processes hanging about in uninterruptible sleep when writing to disk, along with kjournald and kswapd from time to time. Load average during writing climbed considerably (up to >12), with 'ls' taking up to 30 seconds to give any output. I've tried CentOS 4.4, 4.5, RHEL AS 4 update 5 (just in case) and openSUSE 10.2, and they all show the same symptoms.

Googling around makes me think that this may be related to queue depth, nr_requests and possibly VM params (the latter from https://bugzilla.redhat.com/show_bug.cgi?id=121434#c275).
These are the default settings:

/sys/block/sda/device/queue_depth = 254
/sys/block/sda/queue/nr_requests = 8192
/proc/sys/vm/dirty_expire_centisecs = 3000
/proc/sys/vm/dirty_ratio = 30

3Ware mentions elevator=deadline and blockdev --setra 16384, along with nr_requests=512, in their performance tuning doc - these alone seem to make no difference to the latency problem.

Setting dirty_expire_centisecs = 1000 and dirty_ratio = 5 does indeed reduce the number of processes in 'b' state as reported by vmstat 1 during an iozone benchmark (./iozone -s 20480m -r 64 -i 0 -i 1 -t 1 -b filename.xls, as per 3Ware's own tuning doc), but the problem is obviously still there, just mitigated somewhat. The comparison graphs are in a PDF here:

http://community.novacaster.com/attach.pl/7411/482/iozone_vm_tweaks_xls.pdf

Incidentally, the vmstat 1 output was directed to an NFS-mounted disk to avoid writing it to the array during the actual testing.

I've tried eliminating LVM from the equation, going to ext2 rather than ext3, and booting single-processor, all to no useful effect. I've also tried benchmarking with different blocksizes from 512B to 1M in powers of 2, and the problem remains - many processes in uninterruptible sleep blocking other IO. I'm about to start downloading CentOS 5 to give it a go, and after that I might have to resort to seeing if WinXP has the same issue.

My only real question is "where do I go from here?" I don't have enough specific tuning knowledge to know what else to look at.

Thanks for any pointers.

Simon
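The tunables tried above can be collected into a boot-time fragment, e.g. in /etc/rc.d/rc.local. This is only a sketch using the values from the thread; /dev/sda is the 3ware unit as in Simon's setup:

```shell
# Sketch: tunables discussed in this thread, applied at boot.
# /dev/sda is the 3ware-backed disk; adjust to your system.
echo 512 > /sys/block/sda/queue/nr_requests      # down from the 8192 default
blockdev --setra 16384 /dev/sda                  # 3ware's suggested read-ahead
echo 1000 > /proc/sys/vm/dirty_expire_centisecs  # expire dirty pages sooner
echo 5 > /proc/sys/vm/dirty_ratio                # cap dirty memory at 5% of RAM
# elevator=deadline is a kernel boot parameter, set on the kernel
# line in grub.conf rather than here.
```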
Simon Banton wrote:
> Dear list,
>
> I thought I'd just share my experiences with this 3Ware card, and see if
> anyone might have any suggestions.
>
> System: Supermicro H8DA8 with 2 x Opteron 250 2.4GHz and 4GB RAM
> installed. 9550SX-8LP hosting 4x Seagate ST3250820SV 250GB in a RAID 1
> plus 2 hot spare config. The array is properly initialized, write cache
> is on, as is queueing (and supported by the drives). StoreSave set to
> Protection.

Well, the first thing I noted was that the H8DA8 was not on the list of compatible motherboards on the 3ware website.

> My only real question is "where do I go from here?" I don't have enough
> specific tuning knowledge to know what else to look at.

Perhaps update to the latest firmware for both the motherboard and the 3ware board. Also check that you actually plugged the thing into a PCI-X 64-bit 100/133 MHz slot and that it is running at those speeds. Next question would be whether you are using a riser board?
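One quick way to check what the card actually negotiated is its PCI-X capability block. A sketch, assuming pciutils is installed (13c1 is 3ware's PCI vendor ID; run as root for the full capability dump):

```shell
# Show the 3ware controller's PCI-X status lines, or a fallback
# message if no such device is visible on this machine.
lspci -vv -d 13c1: 2>/dev/null | grep -iA2 'PCI-X' \
    || echo "no 3ware PCI device visible"
```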
On 9/14/07, Simon Banton <centos at web.org.uk> wrote:
> I see where you're going with the larger journal idea and I'll give that a go.

Have you done any filesystem optimization and tried matching the filesystem to the RAID chunk size? A while back Ross had a very good email thread regarding this. Some of the details he went over in the thread are here:

http://wiki.centos.org/HowTos/Disk_Optimization

It makes a tremendous difference in RAID operation.

--
During times of universal deceit, telling the truth becomes a revolutionary act.
- George Orwell
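For a striped unit, matching ext3 to the chunk size looks roughly like this. The 64 KB chunk and the device name are assumptions for illustration - use the stripe size your unit actually reports (and note a plain RAID 1 like Simon's has no stripe to match):

```shell
# Hypothetical example: ext3 on a striped unit with 64 KB chunks
# and 4 KB filesystem blocks.
# stride = chunk size / block size = 64 KB / 4 KB = 16
mke2fs -j -b 4096 -E stride=16 /dev/sdb1
```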
matthias platzer
2007-Oct-02 10:30 UTC
[CentOS] 3Ware 9550SX and latency/system responsiveness
hello,

I saw this thread a bit late, but I had / am having the exact same issues on a dual 2-core-CPU Opteron box with a 9550SX (CentOS 5 x86_64).

What I did to work around them was basically switching to XFS for everything except / (3ware say their cards are fast, but only on XFS) AND using very low nr_requests for every blockdev on the 3ware card (like 32 or 64). That will limit the iowait times for the CPUs and make the 3ware drives respond faster (see for yourself with iostat -x -m 1 while benchmarking).

If you can, you could also try _not_ putting the system disks on the 3ware card, because additionally the 3ware driver/card gives writes priority. People suggested the unresponsive system behaviour is because the CPU hangs in iowait for the writes, and reading the system binaries won't happen till the writes are done - so the binaries should be on another IO path.

All this seems to be symptoms of a very complex issue consisting of kernel bugs/bad drivers/... and they seem to be worst on an AMD/3ware combination.

here is another link:
http://bugzilla.kernel.org/show_bug.cgi?id=7372

regards,
matthias

Simon Banton schrieb:
> Dear list,
>
> I thought I'd just share my experiences with this 3Ware card, and see if
> anyone might have any suggestions.
> [...]
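matthias's nr_requests workaround can be sketched as a boot-time snippet. The device names are examples, and the SYSFS variable is only there so the loop can be exercised against a scratch directory instead of a live /sys:

```shell
#!/bin/sh
# Cap nr_requests at 64 for each disk on the 3ware card, per the
# workaround described above. Adjust the device list to your system.
SYSFS=${SYSFS:-/sys}
for dev in sda sdb; do
    q="$SYSFS/block/$dev/queue/nr_requests"
    if [ -w "$q" ]; then
        echo 64 > "$q"
    fi
done
```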