I have a small GlusterFS cluster providing a replicated volume. Each server has 2 SAS disks for the OS and logs, and 22 SATA disks for the actual data, striped together as a RAID10 using a MegaRAID SAS 9280-4i4e with this configuration: http://pastebin.com/2xj4401J

Connected to this cluster are a few other servers running nginx, which use the native client to serve files stored on the volume. The files are on the order of 3-10MB each.

Right now a storage server has an outgoing bandwidth of 300Mbit/s, and the busy rate of the RAID array is at 30-40%. There are also strange side effects: sometimes the I/O latency skyrockets and no access to the RAID is possible for >10 seconds. This happens at both 300Mbit/s and 1000Mbit/s of outgoing bandwidth. The file system is XFS, and it has been tuned to match the RAID stripe size.

I've tested all sorts of Gluster settings, but none seem to have any effect. Because of that I've reset the volume configuration, and it is now using the defaults.

Does anyone have an idea what could be the reason for such bad performance? 22 disks in a RAID10 should deliver *way* more throughput.
On Fri, Apr 13, 2012 at 11:25:58AM +0200, Philip wrote:
> Sometimes the io-latency skyrockets and there is no
> access possible on the raid for >10 seconds.

Have you checked http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/ ?

If you have a large amount of RAM and a lot of writes, maybe you're accumulating large amounts of dirty data which is then being synchronously flushed.
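To get a rough feel for how much dirty data a server could accumulate before the kernel forces a flush, the relevant numbers can be read from /proc. The following is only a sketch, assuming a Linux host where vm.dirty_ratio and vm.dirty_background_ratio are in use (not the vm.dirty_bytes variants). The function names are made up for illustration, and the limit is an approximation: the kernel applies the ratios to available memory, not to MemTotal, so the real thresholds are somewhat lower.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def approx_dirty_limit_kb(mem_total_kb, ratio_percent):
    """Upper-bound estimate of a dirty-page threshold in kB.

    Assumption: uses MemTotal as the base; the kernel uses available
    memory instead, so the real threshold is lower than this.
    """
    return mem_total_kb * ratio_percent // 100


def read_int(path):
    """Read a single integer from a /proc/sys file."""
    with open(path) as f:
        return int(f.read().split()[0])


def report():
    """Print current dirty data next to the approximate flush thresholds."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    ratio = read_int("/proc/sys/vm/dirty_ratio")
    bg_ratio = read_int("/proc/sys/vm/dirty_background_ratio")
    total = mem["MemTotal"]
    print("Dirty now:            %d kB" % mem.get("Dirty", 0))
    print("Writeback now:        %d kB" % mem.get("Writeback", 0))
    print("Background flush at: ~%d kB (%d%%)"
          % (approx_dirty_limit_kb(total, bg_ratio), bg_ratio))
    print("Blocking flush at:   ~%d kB (%d%%)"
          % (approx_dirty_limit_kb(total, ratio), ratio))
```

Calling report() on a storage server while it is under load shows whether the "Dirty" figure climbs toward the blocking threshold shortly before one of the stalls. If it does, that would be consistent with the flush explanation above.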
On Fri, 13 Apr 2012, Philip wrote:
> Does anyone have an idea what could be the reason for such a bad
> performance? 22 Disks in a RAID10 should deliver *way* more throughput.

You may already have done so, but you can check the I/O utilization of the devices with the "-x" flag to "iostat", for example "iostat -x 2" for a two-second interval. Check the percentage utilization in the "%util" column on the right. If it is closer to 100 than to 0, then the disk subsystem might actually be busy.

--jerker
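If you want to log the "%util" values over time instead of watching them, the iostat output can be parsed with a few lines of Python. This is a minimal sketch: parse_util and run_iostat are hypothetical helper names, and the parser assumes the device table starts with a header line beginning with "Device" and containing a "%util" column, which is what sysstat prints with "-x". Column order differs between sysstat versions, so the column is looked up by name.

```python
import subprocess


def parse_util(iostat_output):
    """Return {device: %util} from the last device table in `iostat -x` output."""
    util = {}
    col = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            col = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device"):
            # New report: find the %util column and keep only the latest table.
            col = fields.index("%util") if "%util" in fields else None
            util = {}
            continue
        if col is not None and len(fields) > col:
            try:
                # Some locales print a decimal comma.
                util[fields[0]] = float(fields[col].replace(",", "."))
            except ValueError:
                pass
    return util


def run_iostat(interval=2):
    """Run `iostat -x <interval> 2` and return its text output.

    The first report covers the time since boot, the second covers the
    interval, so parse_util() on this output yields the interval values.
    """
    result = subprocess.run(
        ["iostat", "-x", str(interval), "2"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, parse_util(run_iostat(2)) returns a dict keyed by device name, which can be written to a log together with a timestamp to see whether the array is near 100% when the latency spikes occur.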