Ben England
2012-May-21  13:22 UTC
[Gluster-users] Gluster-users Digest, Vol 49, Issue 25 -- Disk utilization
Peter, see comments marked with ben> below; I hope this helps.

Date: Tue, 15 May 2012 22:12:10 +0200
From: Peter Frey <pfrey09 at googlemail.com>
Subject: [Gluster-users] Disk utilisation
To: gluster-users at gluster.org

Hi,

we are using Gluster to make HTTP file downloads available. We currently have 2 Gluster servers serving a replicated volume. Each Gluster server has 22 disks in a hardware RAID, and the underlying file system is XFS. The average file size is around 3-4 MB, and around 16 TB of data is stored on the volume.

ben> The Linux distro version and Gluster version would be helpful. What is the RAID stripe element size? If you have a 64-KB stripe element size, then EVERY disk will be made busy by reading a single 4-MB file (such a file spans 64 stripe elements, more than the 22 disks in the array), so striping will not help you much at that file size. ~130 Mbit/s is ~15 MB/s, and most disks can read at > 50 MB/s, so your total system throughput is far less than the throughput of a single disk drive. So why use striping? Wouldn't it be better to be able to serve many files in parallel from your disks?

ben> If the application tends to read the entire file sequentially, you may want to increase readahead; try increasing it way up, since the Linux default of 128 KB is not good for Gluster. Lastly, try the deadline I/O scheduler on your data disks; CFQ can't help with a Gluster server.

Once we start sending live HTTP traffic towards the infrastructure, we see horrible performance. For instance, if the outgoing bandwidth on each of the Gluster servers is at ~130 Mbit/s, our hardware RAID has a busy rate of ~30%. Once we increase the traffic towards 250 Mbit/s, the busy rate doubles to 60%, and the iowait values also increase. We started to play with the read buffers on the HTTP servers. There is no difference between loading the whole file into memory at once and loading the file in 64k chunks.
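The back-of-envelope numbers in this exchange can be checked with a small Python sketch. The stripe math treats all 22 disks as data disks and assumes a file starts on a stripe-element boundary (RAID parity disks would lower the data-disk count). The device name `sdb`, the 4096-KB readahead value and the 1-MB stripe element used for comparison are hypothetical placeholders, not values from the thread. `tuning_commands` only builds the sysfs writes instead of performing them, since writing under `/sys/block/<dev>/queue/` needs root and the settings do not survive a reboot.

```python
import math


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert Mbit/s to MB/s (8 bits per byte; protocol overhead ignored)."""
    return mbit_per_s / 8


def disks_touched(file_kb: int, stripe_element_kb: int, data_disks: int) -> int:
    """Data disks touched by one sequential read of a whole file.

    Assumes the file starts on a stripe-element boundary: the read walks
    ceil(file_kb / stripe_element_kb) elements, and consecutive elements land
    on different disks, so once that count reaches data_disks every disk is hit.
    """
    return min(math.ceil(file_kb / stripe_element_kb), data_disks)


def tuning_commands(devices, read_ahead_kb, scheduler="deadline"):
    """Dry run: build the sysfs writes for readahead and the I/O scheduler.

    read_ahead_kb is a placeholder; choose it by measurement. On newer blk-mq
    kernels the equivalent scheduler is named "mq-deadline".
    """
    cmds = []
    for dev in devices:
        queue = f"/sys/block/{dev}/queue"
        cmds.append(f"echo {read_ahead_kb} > {queue}/read_ahead_kb")
        cmds.append(f"echo {scheduler} > {queue}/scheduler")
    return cmds


if __name__ == "__main__":
    print(mbit_to_mbyte(130))                 # 16.25
    print(mbit_to_mbyte(250))                 # 31.25
    print(disks_touched(4 * 1024, 64, 22))    # 22: every disk serves part of one file
    print(disks_touched(4 * 1024, 1024, 22))  # 4: hypothetical 1-MB stripe element
    # Hypothetical device name and readahead value.
    for line in tuning_commands(["sdb"], read_ahead_kb=4096):
        print(line)
```

With a 64-KB element, every disk serves part of each 4-MB file, while a 1-MB element would confine the same read to 4 disks and leave the rest free for concurrent requests, which is the kind of parallelism Ben is asking about.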
This makes me believe that the Gluster server loads the file with its own buffers and that the client's buffer has no influence. We have also enabled profiling on the Gluster volume: there are roughly 18 read() calls for each open() call, which should be an indication of buffers that are too small.

ben> Gluster avoids read caching on the client side. You can give the Gluster servers more memory so that XFS can cache more files, if this leads to more cache hits. If you really need aggressive client-side caching, you can NFS-mount the Gluster server. If your app is HTTP-based and RESTful, there are web caching servers that can intercept requests before they reach your application.

ben> 18 read calls per open is not a terrible ratio (if each open reads a whole 3-4 MB file, that is roughly 170-230 KB per read). In my experience, if network tuning is correct and the files being read are cached (or prefetched) on the server, Gluster reads at network speed (which is why disk readahead is important). How much traffic can your network transmit? Have you tested the network by itself (i.e. without using Gluster to test it)?

We have also made the mistake of storing all files in a single directory, but XFS advertises that it can handle millions of files in a single directory, so it shouldn't be a problem, or should it?

ben> Never put millions of files in a single directory if you can help it. Many file systems do not handle that many files per directory well. But even if the file system is perfect at it, applications that attempt to display directory contents (other than "find") tend to lock up, because they read the entire directory, read all the inodes in it, sort them, and then display them. Classic example: the "ls" command.

ben> Recent XFS versions (such as the version in RHEL 6.2) handle metadata far better than earlier ones (e.g. the version in RHEL 6.1), so you may want to make sure you're running a recent one.
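Ben doesn't prescribe a layout, but one common way to avoid a single huge directory is to fan files out into subdirectories derived from a hash of the file name. The Python sketch below is a hypothetical scheme, not something Gluster or the thread provides: the function `fanout_path`, the two-level, two-hex-digit layout, the choice of SHA-256 and the mount point `/mnt/gluster/downloads` are all assumptions. With two levels of two hex digits there are 256 * 256 = 65,536 leaf directories, so the roughly 4.5-4.8 million files implied by 16 TB of ~3.5-MB files would average around 70 files per directory.

```python
import hashlib
from pathlib import PurePosixPath


def fanout_path(root: str, filename: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a flat file name to root/<h1>/<h2>/.../filename (hypothetical layout).

    Each <hN> is `width` hex digits taken from the SHA-256 of the name, so the
    same name always maps to the same directory and no lookup table is needed.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    if levels * width > len(digest):
        raise ValueError("levels * width exceeds the 64 hex digits of SHA-256")
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, filename)


if __name__ == "__main__":
    # Hypothetical mount point of the Gluster volume on the HTTP server.
    print(fanout_path("/mnt/gluster/downloads", "example-file.bin"))
    # Number of leaf directories with the default layout: 16 ** (levels * width).
    print(16 ** (2 * 2))  # 65536
```

Because the mapping is deterministic, the HTTP front end can compute the path from the requested name on every request; existing files would need a one-time move into the new layout.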