On Tue, Apr 10, 2012 at 03:24:54PM +0000, Sallee, Stephen (Jake) wrote:
> We would like to provide our users with the best experience possible
> but we do not have unlimited funds. My initial thoughts are to build 5
> - 7 servers with the following specs:
>
>
> 1 x AMD Magny-Cours (12 core) processor
>
> 8 x 1TB 7.2K SATA III (RAID 5) ***See notes***
>
> 1 x 10Gb Ethernet (for data)
>
> 1 x 1Gb Ethernet (for management)
>
>
I don't think you'll need that many cores. The bottleneck will be disk, and
a quad-core should perform just fine.
> ***notes***
>
> I know RAID 5 is terrible, however it has been deemed that RAID 10 is
> too expensive, hence the RAID 5.
>
> ***/notes***
Is it too expensive? How does the cost of 8 x 2TB disks compare with 8 x 1TB
disks? I would expect not very much difference. If you use RAID10 with the
"far" layout, then all data is stored in the first 1TB of each disk, with
the second 1TB holding the mirror copy of a different disk's data. This
gives you faster seeks on read, since reads can be served from the first
half of each disk, and the higher storage density of the larger disks gives
you faster transfer rates.
Having said that, if your usage pattern is mostly streaming reads, then
RAID5 should perform adequately - except for when a disk fails, in which
case you'll get terrible read performance, and even worse during the time
the replacement disk rebuilds. And disks *will* fail.
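
As a rough illustration of the capacity side of that comparison, here is a minimal Python sketch. The helper name usable_tb is hypothetical, and the figures ignore filesystem overhead and hot spares; it only shows that 8 x 2TB in RAID10 ends up with slightly more usable space than 8 x 1TB in RAID5.

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB of one array (hypothetical helper, overhead ignored)."""
    if level == "raid5":
        # One disk's worth of space goes to parity.
        return (disks - 1) * disk_tb
    if level == "raid10":
        # Every block is stored twice.
        return disks * disk_tb / 2
    raise ValueError(f"unsupported RAID level: {level}")


options = [
    ("raid5", 8, 1.0),   # the proposed build
    ("raid10", 8, 1.0),  # same disks, mirrored
    ("raid10", 8, 2.0),  # larger disks, mirrored
]

for level, disks, disk_tb in options:
    print(f"{level:6s} {disks} x {disk_tb:g}TB -> {usable_tb(level, disks, disk_tb):g}TB usable")
```

Whether the price gap between the 1TB and 2TB disks is small enough is something only current quotes can answer; the sketch covers capacity only.
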
You'll need to take some care choosing your disk layout. Suppose your
streaming clients are reading in 4MB chunks. With a 512K or smaller stripe
size, a single read will seek on all the disks; with a 4MB stripe size you
may only seek on one or two disks, leaving spare bandwidth on other spindles
for other clients. However RAID5 with a 4MB stripe size will have terrible
write performance.
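
The seek argument above can be sketched in a few lines of Python. This is a simplified model, not how any particular RAID implementation behaves: it reads "stripe size" as the per-disk chunk size, assumes plain round-robin striping over 7 data disks, and ignores parity rotation. The function name disks_touched is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def disks_touched(offset: int, length: int, chunk: int, data_disks: int) -> int:
    """Distinct disks hit by one read, assuming round-robin chunk placement."""
    first_chunk = offset // chunk
    last_chunk = (offset + length - 1) // chunk
    chunks = last_chunk - first_chunk + 1
    # Consecutive chunks land on consecutive disks, so the count is capped
    # by the number of data disks.
    return min(chunks, data_disks)


# A 4MB read with 512K chunks spreads over every data disk.
print(disks_touched(0, 4 * MIB, 512 * KIB, data_disks=7))

# The same read with 4MB chunks stays on one disk when aligned...
print(disks_touched(0, 4 * MIB, 4 * MIB, data_disks=7))

# ...and on two disks when it straddles a chunk boundary.
print(disks_touched(1 * MIB, 4 * MIB, 4 * MIB, data_disks=7))
```
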
Note that 30 reads per second x 4MB is about 120MB/s, which only just fills
1Gbps. My suspicion is that you may not get as much benefit from 10G
connections into each server as you expect; it may be cheaper to start with
1G (or a pair of bonded 1G) and upgrade later if required. Remember that if
the material is distributed evenly then you'll get 1G x number of servers;
if your cluster's uplink into the core is 10G then you'll nearly fill that
anyway.
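
The arithmetic behind those figures, as a small Python sketch. It uses decimal units (1MB = 8Mbit) and ignores protocol overhead; the function names are hypothetical.

```python
def stream_mbit_per_s(reads_per_s: float, read_mb: float) -> float:
    """Network load in Mbit/s generated by fixed-size reads (decimal units)."""
    return reads_per_s * read_mb * 8


def aggregate_gbit(servers: int, nic_gbit: float) -> float:
    """Best-case aggregate bandwidth if load is spread evenly over all servers."""
    return servers * nic_gbit


load = stream_mbit_per_s(30, 4)
print(f"30 reads/s x 4MB = {load:g} Mbit/s per server")

for servers in (5, 6, 7):
    single = aggregate_gbit(servers, 1)
    bonded = aggregate_gbit(servers, 2)
    print(f"{servers} servers: {single:g}G with single 1G, {bonded:g}G with bonded pairs")
```

With 5 to 7 servers, bonded pairs already reach or exceed a 10G uplink on paper, so the uplink becomes the limit before the per-server links do.
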
Finally, have you decided whether to use gluster replication or not? That
gives you a "high availability" solution, where every file appears on two
servers. You get a performance penalty, since every time you open a file
the client talks to both servers to check they are in sync; but streaming
large files is the best case (as I understand it), so the penalty should
be low. It doubles your storage cost, of course.
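
To make the storage cost concrete, a minimal Python sketch. The numbers are illustrative assumptions, not a recommendation: 6 servers, one brick per server, and 7TB per brick (8 x 1TB in RAID5). The helper name is hypothetical; the check reflects that a replicated gluster volume needs a brick count that is a multiple of the replica count.

```python
def cluster_usable_tb(bricks: int, brick_tb: float, replica: int = 1) -> float:
    """Usable capacity in TB of a volume with `replica` copies of every file."""
    if bricks % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return bricks * brick_tb / replica


print(cluster_usable_tb(6, 7.0))             # no replication
print(cluster_usable_tb(6, 7.0, replica=2))  # every file on two servers
```

Under these assumptions, 5 or 7 single-brick servers would not pair up evenly for two-way replication, which is worth bearing in mind when picking the server count.
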
Regards,
Brian.