Hi All,

I am about to embark on a project that deals with archiving information over time and seeing how it changes over time. I can explain it a lot better, but I would certainly talk your ear off. I don't have a lot of money to throw at the initial concept, but I have some. This device will host all of the operations for the first few months, until I can afford to build a duplicate device. I already have a few parts of the idea done and ready to go live.

I am contemplating building a BackBlaze-style pod. The goal of the device is to act as a place for the crawls to store information, massage it, get it into databases, and then notify the user the task is done so they can start looking at the results.

For reference here are a few links:

http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

and

http://cleanenergy.harvard.edu/index.php?ira=Jabba&tipoContenido=sidebar&sidebar=science

There is room for 45 drives in the case (technically a few more).

45 x 1TB 7200rpm drives are really cheap, about $60 each.
45 x 1.5TB 7200rpm drives are about $70 each.
45 x 2TB 7200rpm drives are about $120 each.
45 x 3TB 7200rpm drives are about $180-$230 each (or more; some are almost $400).

I have a few questions before I commit to building one, and I was hoping to get advice.

1. Can anyone recommend a mobo/processor setup that can hold lots of RAM? Like 24GB or 64GB or more?

2. Hardware RAID or software RAID for this?

3. Would CentOS be a good choice? I have never used CentOS on a device so massive, just ordinary servers, so to speak. I assume that it could handle so many drives and a large, expanding file system.

4. Someone recommended ZFS, but I don't recall that being available on CentOS. It is on FreeBSD, which I have little experience with.

5. How would someone realistically back something like this up?

Ultimately I know that over time I need to distribute my architecture out and have a number of web servers, load balancing, etc., but to get started I think this device with good backups might fit the bill.

I can be way more detailed if it helps; I just didn't want to clutter the post with information that might not be relevant.

--
Jason
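(For what it's worth, using the prices quoted above, raw cost per terabyte works out roughly as follows; street prices will of course vary:

    45 x 1TB   @ $60  = $2,700 for  45TB   raw  (~$60/TB)
    45 x 1.5TB @ $70  = $3,150 for  67.5TB raw  (~$47/TB)
    45 x 2TB   @ $120 = $5,400 for  90TB   raw  (~$60/TB)
    45 x 3TB   @ $180 = $8,100 for 135TB   raw  (~$60/TB)

At those numbers the 1.5TB drives are the per-terabyte sweet spot; the bigger drives only buy density.)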
> -----Original Message-----
> From: Jason
> Sent: Sunday, May 08, 2011 14:04
> To: CentOS mailing list
> Subject: [CentOS] Building a Back Blaze style POD
>
> For reference here are a few links:
>
> http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
>
> and
>
> http://cleanenergy.harvard.edu/index.php?ira=Jabba&tipoContenido=sidebar&sidebar=science

Disturbing -- I was on the same pages a few hours ago.

> 2. Hardware RAID or Software RAID for this?

Hardware is too costly in $; software is too costly in CPU. Aim for redundancy either way.

> 3. Would CentOS be a good choice? I have never used CentOS on
> a device so massive. Just ordinary servers, so to speak. I
> assume that it could handle so many drives, a large,
> expanding file system.

Multiple file systems, or GFS?

> 5. How would someone realistically back something like this up?

You don't. You replicate it. We are looking at using one as an online cache of our backup media.

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
- Jason Pyeron                  PD Inc. http://www.pdinc.us      -
- Principal Consultant          10 West 24th Street #100         -
- +1 (443) 269-1555 x333        Baltimore, Maryland 21218        -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
This message is copyright PD Inc, subject to license 20080407P00.
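If software RAID wins the argument, here is a minimal sketch of what the mdadm setup might look like on CentOS. The device names, the split into 15-drive arrays, and the chunk size are illustrative assumptions, not something from this thread:

    # Create one 15-drive RAID 6 array (one of three smaller arrays
    # rather than a single 45-drive monster); device names are examples.
    mdadm --create /dev/md0 --level=6 --raid-devices=15 \
          --chunk=256 /dev/sd[b-p]

    # Watch the initial sync / any rebuild progress.
    cat /proc/mdstat

    # Persist the array definition so it assembles at boot.
    mdadm --detail --scan >> /etc/mdadm.conf

Splitting into a few arrays keeps each rebuild window (and the CPU cost of parity) bounded.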
On Sun, May 8, 2011 at 8:03 PM, Jason <slackmoehrle.lists at gmail.com> wrote:

> 1. Can anyone recommend a mobo/processor setup that can hold lots of RAM?
> Like 24gb or 64gb or more?

Any brand of server motherboard will do. I prefer Supermicro, but you can use Dell, HP, Intel, etc.

> 2. Hardware RAID or Software RAID for this?

Hardware RAID will be expensive on 45 drives. If you can, split the 45 drives into a few smaller RAID arrays. Rebuilding one large 45-drive RAID array, with either hardware or software, would probably take a week or more, depending on which RAID level you use - i.e. RAID 5, 6, or 10. I prefer RAID 10 since it's best for speed and the rebuilds are the quickest, but you lose half the space: 45x 1TB drives will give you about 22TB of usable space. 45x 2TB drives would give you about 44TB, though.

> 3. Would CentOS be a good choice? I have never used CentOS on a device so
> massive. Just ordinary servers, so to speak. I assume that it could handle
> so many drives, a large, expanding file system.

Yes, it would be fine.

> 4. Someone recommended ZFS but I dont recall that being available on
> CentOS, but it is on FreeBSD which I have little experience with.

I would also prefer ZFS for this type of setup. Use one 128GB SLC-type SSD as a cache (L2ARC) drive to speed things up, and 2x log drives to help with recovery. With ZFS you could use one large pool if you have the log drives, since it recovers from drive failure much better than other file systems. You can install ZFS on Linux as user-land tools, which will be slower than running it in the kernel. It would be better to use Solaris or FreeBSD for this - look at Nexenta / FreeNAS / OpenIndiana.

> 5. How would someone realistically back something like this up?

To another one just as large :)

Or, more realistically, if you already have some backup servers and the full 45TB isn't full of data yet, then simply back up what you have. By the sounds of it your project is still new, so your data won't be that much.
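As a concrete sketch of the kind of pool layout described above, on FreeBSD or Solaris: the split into three raidz2 vdevs and all device names are my assumptions for illustration, not a tested recipe:

    # Three 15-disk raidz2 vdevs in one pool, plus a mirrored log (ZIL)
    # and an SSD cache (L2ARC) device. da*/ssd0 names are placeholders.
    zpool create tank \
          raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 da13 da14 \
          raidz2 da15 da16 da17 da18 da19 da20 da21 da22 da23 da24 da25 da26 da27 da28 da29 \
          raidz2 da30 da31 da32 da33 da34 da35 da36 da37 da38 da39 da40 da41 da42 da43 da44 \
          log mirror da45 da46 \
          cache ssd0

    # Verify layout and health.
    zpool status tank

With raidz2 each vdev survives two drive failures, and a rebuild (resilver) only touches one 15-disk vdev rather than the whole 45-drive set.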
I would simply build a Gluster / CLVM cluster of smaller, cheaper servers, which basically allows you to add, say, 4TB / 8TB at a time (depending on what chassis you use and how many drives it can take) to the backup cluster. That will be cheaper than buying another machine identical to this right now. (See the sketch below.)

> Ultimately I know over time I need to distribute my architecture out and
> have a number of web-servers, balancing, etc but to get started I think this
> device with good backups might fit the bill.

If this device will be used for web + mail + SQL, then you should probably look at using 4 quad-core CPUs + 128GB RAM. With this many drives (or rather, this much data) you'll probably run out of RAM / CPU / network resources before you run out of HDD space.

With a device this big (in terms of storage) I would rather have 2 separate "processing" servers which just mount LUNs from this pod (exported as NFS / iSCSI / FCoE / etc.) and then have a few faster SAS / SSD drives for SQL / log processing.

--
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532
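A minimal sketch of that grow-as-you-go Gluster approach; hostnames and brick paths here are hypothetical:

    # Start with two small servers, mirrored.
    gluster volume create backups replica 2 \
            node1:/export/brick1 node2:/export/brick1
    gluster volume start backups

    # Later, grow capacity a pair of bricks at a time as budget allows,
    # then spread existing data across the new bricks.
    gluster volume add-brick backups node3:/export/brick1 node4:/export/brick1
    gluster volume rebalance backups start

The point of the design is that the backup tier scales in small, cheap increments instead of requiring a second 45-drive pod up front.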
Rudi,

Do you have a recommendation for a motherboard? I am still reading the rest of your post.

Thanks!

--
Jason

On Sunday, May 8, 2011 at 11:26 AM, Rudi Ahlers wrote:

> Any brand of server motherboard will do. I prefer Supermicro, but you
> can use Dell, HP, Intel, etc.
On Sunday, May 08, 2011 04:23:23 PM John R Pierce wrote:

> note that SAS supports N:M multiplexing where any one of the N
> controller channels can address any of the M devices.... plain SATA
> only supports 1:M simple expanders

Hmm, that explains how SAS can effectively replace fibre channel at the DAE (EMC has gone SAS on their newest midrange storage). For true fibre channel replacement you need dual-attach at the drive; N:M multiplex looks like dual-attach++, at least to my eye.

> And, a significant problem in large drive arrays is mechanical
> resonance.... you get an array of 24 or whatever disks all being
> hammered at once in a RAID environment, and the mechanical vibrations
> can cause interactions which can increase the error rate, this is
> greatly compounded by a flimsy chassis.

I like the EMC DAE design; it is most definitely not flimsy. However, I had often wondered about some of the design features of the DAE chassis, and thinking about mechanical resonance makes some things 'click' in my mind that didn't before - things like the thick cast rack ears instead of extending the chassis sheet metal and folding an ear.

And those EMC DAEs are 15-drive enclosures. I wonder how much the custom EMC drive firmware impacts mechanical resonance, especially on large RAID groups.
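One practical follow-on: vibration-driven error-rate creep tends to show up first in per-drive SMART counters, long before a rebuild exposes it. A cheap way to watch a big array on CentOS (smartmontools package) is a loop like the following; the drive glob is sized for ~45 drives and the attribute names vary by vendor, so treat both as illustrative:

    # Spot-check error-related SMART counters across all drives.
    # Raw_Read_Error_Rate (attr 1) and Reallocated_Sector_Ct (attr 5)
    # are the usual early-warning fields.
    for d in /dev/sd[a-z] /dev/sda[a-s]; do
        [ -e "$d" ] || continue
        echo "== $d =="
        smartctl -A "$d" | egrep 'Raw_Read_Error_Rate|Reallocated_Sector_Ct'
    done

Run from cron, a diff against yesterday's output makes a slowly climbing error rate on a few adjacent bays (the classic resonance signature) easy to spot.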