I've been asked for ideas on building a rather large archival storage system for in-house use, on the order of 100-400TB, probably using CentOS 6. The existing system this would replace is using Solaris 10 and ZFS, but I want to explore using Linux instead.

We have our own Tomcat-based archiving software that would run on this storage server, along with NFS client and server. It's a write-once, read-almost-never kind of application, storing compressed batches of archive files for a year or two. 400TB written over 2 years translates to about 200TB/year, or about 7MB/second average write speed. The very rare and occasional read accesses are done in batches: a client makes a web service call to get a specific set of files, which are then pushed as a batch to staging storage where the user can browse them; this can take minutes without any problems.

My general idea is a 2U server with 1-4 SAS cards connected to strings of about 48 SATA disks (4 x 12 or 3 x 16), all configured as JBOD, so there would potentially be 48 or 96 or 192 drives on this one server. I'm thinking they should be laid out as 4 or 8 or 16 separate RAID6 sets of 10 disks each, then use LVM to put those into a larger volume (a rough sketch of that layering follows below). About 10% of the disks would be reserved as global hot spares.

So, my questions...

A) Can CentOS 6 handle that many JBOD disks in one system? Is my upper size too big, so that I should plan for 2 or more servers? What happens with the device names when you've gone past /dev/sdz?

B) What is the status of large file system support in CentOS 6? I know XFS is frequently mentioned with such systems, but I/we have zero experience with it; it has never been natively supported in EL up to 5, anyway.

C) Is GFS suitable for this, or is it strictly for clustered storage systems?

D) Anything important I've neglected?

--
john r pierce                            N 37, W 122
santa cruz ca                             mid-left coast
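To make that layering concrete, here is a minimal sketch of one way the RAID6-sets-plus-LVM idea could be wired up, assuming Linux md software RAID rather than hardware RAID controllers; every device and volume name below is illustrative, and only one of the RAID6 sets is shown.

    # Sketch only: md RAID6 over 10 disks, pooled with LVM, formatted XFS.
    # /dev/sd[b-k] and the md/VG names are placeholders, not a recommendation.

    # One 10-disk RAID6 set (8 data + 2 parity); repeat per set of 10 disks.
    mdadm --create /dev/md0 --level=6 --raid-devices=10 /dev/sd[b-k]

    # Pool the RAID6 arrays into one volume group and carve a single big LV.
    pvcreate /dev/md0 /dev/md1 /dev/md2 /dev/md3
    vgcreate vg_archive /dev/md0 /dev/md1 /dev/md2 /dev/md3
    lvcreate -l 100%FREE -n archive vg_archive

    # XFS on the logical volume; inode64 lets inodes land anywhere on a
    # multi-TB filesystem, noatime avoids pointless metadata writes.
    mkfs.xfs /dev/vg_archive/archive
    mount -o inode64,noatime /dev/vg_archive/archive /srv/archive

Whether md or a hardware RAID controller builds the RAID6 sets, the LVM-on-top part looks the same; with hardware RAID the pvcreate step would simply point at the controller's exported volumes instead of /dev/mdN.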
> A) Can CentOS 6 handle that many JBOD disks in one system? Is my upper
> size too big, so that I should plan for 2 or more servers? What happens
> with the device names when you've gone past /dev/sdz?

Dev names double: sdaa, etc.

> B) What is the status of large file system support in CentOS 6? I know XFS
> is frequently mentioned with such systems, but I/we have zero experience
> with it; it has never been natively supported in EL up to 5, anyway.

My use of XFS has been with great success in 5.x. I have never scaled that large, though.

> C) Is GFS suitable for this, or is it strictly for clustered storage systems?

Unless you plan on mounting the fs from more than one server at once, I most certainly would not add that layer of complexity; it's also slower.

> D) Anything important I've neglected?

If I understand correctly, it's not your only backup system, so I don't think it's that critical, but the rebuild time on each array versus the degraded IO capacity, and its impact on serving content, would be something interesting to look at. Do you plan on making hot spares available (see the mdadm.conf sketch below)? That many disks will likely have a higher rate of failure... What kind of disks and controllers do you intend on using?

jlc
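On the hot-spare question: if the RAID6 sets end up as md arrays, mdadm's spare-group mechanism gives roughly the "global hot spare" behaviour John described, where a small pool of spares can serve any array. A sketch, with placeholder UUIDs and device names:

    # In /etc/mdadm.conf, put the arrays into one spare-group (UUIDs below
    # are placeholders); arrays sharing a spare-group can borrow each
    # other's spares when one of them loses a disk:
    #
    #   ARRAY /dev/md0 UUID=<uuid-of-md0> spare-group=archive
    #   ARRAY /dev/md1 UUID=<uuid-of-md1> spare-group=archive

    # Attach the spares to any one array in the group:
    mdadm /dev/md0 --add /dev/sdy /dev/sdz

    # Spare migration and failure mail need the monitor running
    # (the mdmonitor service, i.e. mdadm --monitor --scan):
    service mdmonitor start

With that in place a failed disk anywhere in the group starts rebuilding onto a pooled spare automatically, which shortens the window of degraded RAID6 operation that Joseph is worried about.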
On Wed, Jul 13, 2011 at 11:32:14PM -0700, John R Pierce wrote:
> I've been asked for ideas on building a rather large archival storage
> system for in-house use, on the order of 100-400TB, probably using CentOS 6.
> The existing system this would replace is using Solaris 10 and ZFS, but I
> want to explore using Linux instead.
>
> [details of the proposed 48-192 disk JBOD / RAID6 / LVM layout trimmed]
>
> D) Anything important I've neglected?

Remember that Solaris ZFS does checksumming for all data, so with weekly/monthly ZFS scrubbing it can detect silent data/disk corruption automatically and fix it. With a lot of data, that might get pretty important..

-- Pasi
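md on Linux has no per-block data checksums, so it cannot match ZFS's end-to-end integrity, but it can at least scrub: a periodic "check" pass reads every sector of every member disk and verifies parity, surfacing latent sector errors before a rebuild trips over them. A sketch of a manual pass follows (the md device name is illustrative; as I recall, the mdadm package on CentOS also ships a raid-check cron script that does this on a schedule):

    # Kick off a verify-only scrub of one array; repeat (or stagger) per /dev/mdN.
    echo check > /sys/block/md0/md/sync_action

    # Watch progress, then inspect the mismatch counter when it finishes.
    cat /proc/mdstat
    cat /sys/block/md0/md/mismatch_cnt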
On 7/14/2011 1:32 AM, John R Pierce wrote:
> I've been asked for ideas on building a rather large archival storage
> system for in-house use, on the order of 100-400TB, probably using CentOS 6.
>
> [write-once / read-almost-never workload description trimmed]

If it doesn't have to look exactly like a file system, you might like luwak, which is a layer over the riak nosql distributed database for handling large files (http://wiki.basho.com/Luwak.html). The underlying storage is distributed across any number of nodes, with a scheme that lets you add more as needed and keeps redundant copies to handle node failures. A downside of luwak for most purposes is that because it chunks the data and re-uses duplicates, you can't remove anything, but for archive purposes it might work well.

For something that looks more like a filesystem, but is also distributed and redundant: http://www.moosefs.org/.

--
Les Mikesell
lesmikesell at gmail.com
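For a feel of what "doesn't look exactly like a file system" means in practice, here is a hypothetical sketch of pushing and pulling an archive batch through Luwak's HTTP interface. The /luwak path and port 8098 are the Riak defaults as I remember them, and the host and file names are invented; treat this as an assumption to check against the Luwak docs rather than a tested recipe.

    # Store a compressed batch as a single large object (sketch; names invented).
    curl -X PUT http://riak-node:8098/luwak/batch-2011-07-14.tar.gz \
         -H "Content-Type: application/x-gzip" \
         --data-binary @batch-2011-07-14.tar.gz

    # Fetch it back later when a restore batch is requested.
    curl -o restored.tar.gz http://riak-node:8098/luwak/batch-2011-07-14.tar.gz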
Two thoughts:

1. Others have already inquired as to your motivation to move away from ZFS/Solaris. If it is just the hardware & licensing aspect, you might want to consider ZFS on FreeBSD. (I understand that unlike the Linux ZFS implementation, the FreeBSD one is in-kernel.)

2. If you really want to move away from ZFS, one possibility is to use glusterfs, which is a lockless distributed filesystem. With the glusterfs architecture you scale out horizontally over time; instead of buying a single server with massive capacity, you buy smaller servers and add more as your space requirements exceed your current capacity. You also decide over how many nodes you want your data to be mirrored. Think of it as a RAID0/RAID1/RAID10 solution spread over machines rather than just disks. It uses FUSE over native filesystems, so if you decide to back it out, you turn off glusterfs and you still have your data on the native filesystem. From the client perspective the server cluster looks like a single logical entity, either over NFS or via the native client software. (The native client is configured with info on all the server nodes; the NFS client depends on round-robin DNS to connect to *some* node of the cluster.) <http://www.gluster.org>

Caveat: I've only used glusterfs in one small deployment, in a mirrored-between-two-nodes configuration. Glusterfs doesn't have as many miles on it as ZFS or the other more common filesystems. I've not run into any serious hiccoughs, but put in a test cluster first and try it out. Commodity hardware is just fine for such a test cluster.

Devin
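For reference, a two-node mirrored (replica 2) volume like the deployment Devin describes is only a few commands with the gluster CLI. The host names and brick paths below are made up, and each brick is just a directory on an ordinary local filesystem (XFS, ext4, etc.):

    # Run once glusterd is up on both servers (sketch; names are placeholders).
    gluster peer probe server2
    gluster volume create archive replica 2 \
            server1:/export/brick1 server2:/export/brick1
    gluster volume start archive

    # Native FUSE client mount; the same volume can also be reached over NFS.
    mount -t glusterfs server1:/archive /mnt/archive

Growing later means adding bricks in replica-sized pairs rather than forklifting one huge server, which fits the "add capacity per year of retention" pattern of this archive.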
On Wed, Jul 13, 2011 at 11:32:14PM -0700, John R Pierce spake thusly:
> I've been asked for ideas on building a rather large archival storage
> system for in-house use, on the order of 100-400TB, probably using CentOS 6.
> The existing system this would replace is using Solaris 10 and ZFS, but I
> want to explore using Linux instead.

If you don't need a POSIX filesystem interface, check out MogileFS. It could greatly simplify a lot of these scalability issues.

--
Tracy Reed
http://tracyreed.org
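As a rough illustration of that non-POSIX workflow, the sketch below uses the MogileFS admin and transfer utilities to create a domain and store/retrieve a batch by key. The tracker host, domain, class, and flag spellings are assumptions from memory of the MogileFS-Utils tools, not verified against a current release:

    # Sketch only: names and flags are assumptions; check against MogileFS-Utils.
    mogadm --trackers=tracker1:7001 domain add archive
    mogadm --trackers=tracker1:7001 class add archive batches --mindevcount=2

    # Store a batch under a key, then fetch it back for staging later.
    mogupload --trackers=tracker1:7001 --domain=archive --class=batches \
              --key=batch-2011-07-14.tar.gz --file=batch-2011-07-14.tar.gz
    mogfetch  --trackers=tracker1:7001 --domain=archive \
              --key=batch-2011-07-14.tar.gz --file=restored.tar.gz

Replication count is handled per class (mindevcount), so the "10% hot spares plus RAID6" planning turns into "how many copies of each batch, spread across how many cheap nodes" instead.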