On Wed, 26 Sep 2007, Jason P. Warr wrote:
> Hi all,
>
> I have an interesting project that I am working on. It is a large
> volume file download service that is in need of a new box. Their
> current systems are not able to handle the load because for various
> reasons they have become very I/O limited. We currently run on
> Debian Linux with 3ware hardware RAID5. I am not sure of the exact
> disk config as it is a leased system that I have never seen.
>
> The usage patterns are usually pretty consistent in that out of
> 200-300 concurrent downloads you will find between 15 and 20 different
> files being fetched. You can guess that when the machine is trying
> to push a total of 200-300 Mbit/s the disks are going crazy and, due
> to file sizes and only 4GB of RAM on the system, caching is of little
> use. The systems will regularly get into an 80-90% I/O wait
> mode. The disk write speed is of almost no concern as there are
> only a few files added each day.
>
> The system we have come up with is pretty robust. 2 dual core
> Opterons, 32GB of RAM, 8 750GB SATA disks. The disks are going to
> be paired off into 2-disk RAID0 sets, each with a complete copy of
> the data - essentially a manually replicated 4-way mirror set. The
> download manager would then use each set in a round robin fashion.
> This should substantially reduce the amount of frantic disk head
> dancing. The second item is to dedicate 50-75% of the RAM to a
> ramdisk that would be the fetch path for the top 10 and new, hot
> downloads. This should again reduce the seeking on files that are
> being downloaded many times concurrently.
>
> My question for this list is with ZFS is there a way to access the
> individual mirror sets in a pool if I were to create a 4 way
> mirrored stripe set with the 8 disks? Even better would be if zfs
> would manage the mirror set "load balancing" by intelligently
> splitting up the reads amongst the 4 sets. Either way would make
> for a more elegant solution than replicating the sets with
> cron/rsync.
>
Your basic requirement is for good/excellent random read performance
with many concurrent IOPS (I/O Operations/Sec). The figures I use
for disk IOPS are:
  disk drive        IOPS
  ----------        ----
  7,200 RPM SATA     300
  15k RPM            700
These numbers can be debated/argued - but that is what I use. So, to
solve your problem, my first recommendation would be to use 15k RPM
SAS drives, rather than SATA drives, for random I/O.
Next, ZFS will automatically load balance read requests among the
members of a multi-way mirror set. So if you were to form a pool
like:
zpool create fastrandom mirror disk1 disk2 disk3 disk4
you'd have a 4-way mirror that would sustain approx 1,200 (4 * 300)
reads/sec with SATA disks, or 2,800 (4 * 700) reads/sec with 15k SAS
disks.
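To verify that reads really are being spread across the mirror
members, you can watch per-device activity while a test load runs.
A quick check, assuming the pool name from the example above:

  # per-vdev read/write operations, sampled every 5 seconds
  zpool iostat -v fastrandom 5

Each disk in the mirror should show a roughly equal share of the
read operations.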
In addition, ZFS will make intelligent use of available RAM to cache
data.
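If you want to see how much memory the ZFS cache (the ARC) is
actually using under load, the kstat counters will tell you - a
quick check, assuming a Solaris/OpenSolaris system:

  # report current ARC size plus hit/miss counters
  kstat -m zfs -n arcstats | egrep 'size|hits|misses'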
I would suggest that you use the above config as a starting point and
measure the resulting performance under the anticipated workload -
without dedicating any system memory to manual buffering or a RAM
disk; let ZFS's cache do that work.
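One way to quantify that, assuming Solaris-style iostat, is to watch
per-device service times and %busy while the downloads run:

  # extended per-device statistics, sampled every 5 seconds
  iostat -xn 5

If %b stays pegged near 100 on the pool disks, you're still seek
bound and need more (or faster) spindles in the mirror.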
Since it's very convenient/fast to configure/reconfigure storage
pools using the zfs interface, you can also experiment with 5-way,
6-way etc. mirrors - or form one pool using 4 devices for fast random
access and assign the remaining disks to a raidz pool to maximize
storage space.
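A sketch of that split layout, with hypothetical device names
(substitute the ones format(1M) reports on your box):

  # 4-way mirror for the hot, randomly-read files
  zpool create fastrandom mirror c1t0d0 c1t1d0 c1t2d0 c1t3d0
  # raidz across the remaining four disks for bulk capacity
  zpool create bulk raidz c1t4d0 c1t5d0 c1t6d0 c1t7d0

Tearing a layout down (zpool destroy) and rebuilding it takes
seconds, so it's cheap to benchmark several variants.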
PS: If you take a look at genunix.org, you'll see I have some
experience with this type of workload. We'll be deploying a ZFS-based
storage system there next month - we had to resolve your type of
issue when Belenix 0.6 was released, except for one twist: we had to
use the existing infrastructure and could not accept any downtime.
Feel free to email me offlist if I can help.
Regards,
Al Hopper Logical Approach Inc, Plano, TX. al at logical-approach.com
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/