Hi Guys,
Currently we have 6 (high-powered) front-end servers, each running 8 x 3TB SATA3
drives in RAID10, and we are doing an INSANE amount of random IO. We are now
adding servers with 15 x 3TB SATA3 drives each, where we are forced to use
RAID6 to maximize storage capacity.
Each front-end server has approximately 400 to 600 TCP clients constantly
reading and writing files. The files live in a directory structure derived
from the md5 checksum of the file name, and that directory structure is shared
across all the servers via NFS. So every front-end server reads files from all
the other front-end servers, but each one writes only to its own directory
structure. The new back-end servers take over specific parts of the directory
structure from the front-end servers; in that case the front-end servers read
and write to the new back-end nodes via NFS. This is an attempt to offload IO
from the front-end servers, and it is giving mixed results.
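
To make the layout concrete, the mapping from file name to directory looks
roughly like the sketch below (the two-level fan-out and the /data base path
are just illustrative, not our exact values):

    import hashlib
    import os

    def path_for(filename, base="/data"):
        # Derive the storage path from the md5 of the file name.
        # Illustrative two-level fan-out, e.g. /data/ab/cd/<md5>.
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
        return os.path.join(base, digest[:2], digest[2:4], digest)

    # Every server computes the same path for the same file name, so a
    # reader knows which part of the shared tree (and hence which NFS
    # export) to go to.
    print(path_for("example.bin"))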
The biggest issue we are having is that we are talking about -billions- of
small (max 5MB) files. From what we can make out, seek times are killing us
completely (the OS and HW/RAID have been tweaked to kingdom come and back).
Would there be any significant benefit to using GlusterFS in such an
environment? We're not too concerned about the reliability of files (i.e.
possible data corruption), but we do have reasonable expectations in terms of
speed (especially highly concurrent random IO) and capacity.
I'm not yet too clued up on all the GlusterFS terminology, but essentially, if
we do go the GlusterFS route, we would like to use non-replicated storage
bricks on all the front-end as well as back-end servers in order to maximize
storage.
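
For what it's worth, my understanding is that a purely distributed
(non-replicated) volume across all the bricks would be created roughly along
these lines (hostnames and brick paths are placeholders, not our real ones),
so please correct me if this is not the right approach:

    gluster volume create bigvol transport tcp \
        fe1:/export/brick1 fe2:/export/brick1 \
        be1:/export/brick1 be2:/export/brick1
    gluster volume start bigvol
    mount -t glusterfs fe1:/bigvol /mnt/bigvol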