Hi all,

I'm putting together an OpenSolaris ZFS-based system and need help
picking hardware.

I'm thinking about using this 26-disk case: [FYI: 2-disk RAID1 for the
OS & 4*(4+2) RAIDZ2 for SAN]

http://rackmountpro.com/productpage.php?prodid=2418

Regarding the mobo, CPUs, and memory - I searched Google and the ZFS
site, and all I came up with so far is that, for a dedicated iSCSI-based
SAN, I'll need about 1 GB of memory and a low-end processor - can anyone
clarify exactly how much memory/CPU I'd need to be in the safe zone?
Also, are there any mobos/chipsets that are particularly well suited for
a dedicated iSCSI-based SAN?

This is for my home network, which includes internet/intranet services
(mail, web, LDAP, Samba, Netatalk, code repository), build/test
environments (for my cross-platform projects), and a video server
(MythTV backend).

Right now, the aforementioned run on two separate machines, but I'm
planning to consolidate them into a single Xen-based server. One idea I
have is to host a Xen server on this same machine - that is, an
OpenSolaris-based Dom0 serving ZFS-based volumes to the DomU guest
machines. But if I go this way, then I'd be looking at a 4-socket
Opteron mobo to use with AMD's just-released quad-core CPUs and tons of
memory. My biggest concern with this approach is getting PSUs large
enough to power it all - if anyone has experience on this front, I'd
love to hear about it too.

Thanks!
Kent
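
For scale, here is a quick back-of-the-envelope sketch (Python) of what
that layout gives you; the 500 GB drive size is an assumption, borrowed
from the drive sizes discussed later in the thread:

    # Proposed layout: 2-disk mirror for the OS, plus 4 x (4 data + 2 parity)
    # RAID-Z2 groups for the data pool.  Drive size is assumed (500 GB).
    DRIVE_GB = 500
    os_disks = 2
    vdevs, data_per_vdev, parity_per_vdev = 4, 4, 2

    pool_disks = vdevs * (data_per_vdev + parity_per_vdev)   # 24
    usable_gb = vdevs * data_per_vdev * DRIVE_GB             # 8000 GB
    raw_gb = pool_disks * DRIVE_GB                           # 12000 GB

    print("bays used  :", os_disks + pool_disks)             # 26
    print("raw pool   :", raw_gb, "GB")
    print("usable     :", usable_gb, "GB",
          "(%.0f%% of raw, the rest is parity)" % (100.0 * usable_gb / raw_gb))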
Jonathan Loran
2007-Sep-14 04:39 UTC
[zfs-discuss] hardware sizing for a zfs-based system?
I will only comment on the chassis, as this is made by AIC (short for
American Industrial Computer), and I have three of these in service at
my work. These chassis are quite well made, but I have experienced the
following two problems:

1) The rails really are not up to the task of supporting such a heavy
box when fully extended. If you rack this guy, you are at serious risk
of having a rail failure, and dropping the whole party on the floor.
Ouch. If you do use this chassis in a rack, I highly recommend you
either install a very strong rail-mounted shelf below it, or you support
it with a lift when the rails are fully extended.

2) The power distribution board in these is a little flaky. I haven't
ever had one outright fail on me, but I have had some interesting
power-on scenarios. For example, after a planned power outage, the
chassis would power on, but then turn itself off again after about 4-5
seconds. I couldn't get it to power on and stay on. What was happening
was that the power distribution card was confused, thought it didn't
have the necessary 3 (of 4) power supplies online, and safed itself off.
To fix this, I had to pull all the power supplies out, wait a few
minutes to fully discharge the power distribution card, then plug the
supplies back in. Then it was able to power on and stay on. A real odd
pain in the posterior.

For all new systems, I've gone with this chassis instead (I just noticed
Rackmount Pro sells 'em also):

http://rackmountpro.com/productpage.php?prodid=2043

Functional rails, and a better power system.

One other thing, that you may know already: Rackmount Pro will try to
sell you 3ware cards, which work great in the Linux/Windows environment,
but aren't supported in OpenSolaris, even in JBOD mode. You will need
alternate SATA host adapters for this application.

Good luck,

Jon

Kent Watsen wrote:
> Hi all,
>
> I'm putting together an OpenSolaris ZFS-based system and need help picking hardware.
>
> I'm thinking about using this 26-disk case: [FYI: 2-disk RAID1 for the OS & 4*(4+2) RAIDZ2 for SAN]
>
> http://rackmountpro.com/productpage.php?prodid=2418
>
> Regarding the mobo, CPUs, and memory - I searched Google and the ZFS site, and all I came up with so far is that, for a dedicated iSCSI-based SAN, I'll need about 1 GB of memory and a low-end processor - can anyone clarify exactly how much memory/CPU I'd need to be in the safe zone? Also, are there any mobos/chipsets that are particularly well suited for a dedicated iSCSI-based SAN?
>
> This is for my home network, which includes internet/intranet services (mail, web, LDAP, Samba, Netatalk, code repository), build/test environments (for my cross-platform projects), and a video server (MythTV backend).
>
> Right now, the aforementioned run on two separate machines, but I'm planning to consolidate them into a single Xen-based server. One idea I have is to host a Xen server on this same machine - that is, an OpenSolaris-based Dom0 serving ZFS-based volumes to the DomU guest machines. But if I go this way, then I'd be looking at a 4-socket Opteron mobo to use with AMD's just-released quad-core CPUs and tons of memory. My biggest concern with this approach is getting PSUs large enough to power it all - if anyone has experience on this front, I'd love to hear about it too.
>
> Thanks!
> Kent

--
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146   jloran at ssl.berkeley.edu
AST:7731^29u18e3
> I will only comment on the chassis, as this is made by AIC (short for
> American Industrial Computer), and I have three of these in service at
> my work. These chassis are quite well made, but I have experienced
> the following two problems:
>
> <snip>

Oh my, thanks for the heads-up! Charlie at RMP said that they were the
most popular - so I assumed that they were solid...

> For all new systems, I've gone with this chassis instead (I just
> noticed Rackmount Pro sells 'em also):
>
> http://rackmountpro.com/productpage.php?prodid=2043

But I was hoping for resiliency and easy replacement for the OS drive -
hot-swap RAID1 seemed like a no-brainer... This case has one internal
and one external 3.5" drive bay. I could use a CF reader for resiliency
and reduce the need for replacement - assuming I spool logs to the
internal drive so as to not burn out the CF. Alternatively, I could put
a couple of 2.5" drives into a single 3.5" bay for RAID1 resiliency, but
I'd have to shut down to replace a drive... What do you recommend?

> One other thing, that you may know already. Rackmount Pro will try to
> sell you 3ware cards, which work great in the Linux/Windows
> environment, but aren't supported in OpenSolaris, even in JBOD mode.
> You will need alternate SATA host adapters for this application.

Indeed, but why pay for a RAID controller when you only need SATA ports?
That's why I was thinking of picking up three of these bad boys
(http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm)
for about $100 each.

> Good luck,

Getting there - can anybody clue me into how much CPU/memory ZFS needs?
I have an old 1.2GHz box with 1GB of memory lying around - would it be
sufficient?

Thanks!
Kent
Kent Watsen wrote:
> I'm putting together an OpenSolaris ZFS-based system and need help
> picking hardware.

Fun exercise! :)

> I'm thinking about using this 26-disk case: [FYI: 2-disk RAID1 for the
> OS & 4*(4+2) RAIDZ2 for SAN]

What are you *most* interested in for this server? Reliability?
Capacity? High performance? Reading or writing? Large contiguous reads
or small seeks?

One thing that I did that got good feedback from this list was picking
apart the requirements of the most demanding workflow I imagined for the
machine I was speccing out.

> Regarding the mobo, CPUs, and memory - I searched Google and the ZFS
> site, and all I came up with so far is that, for a dedicated
> iSCSI-based SAN, I'll need about 1 GB of memory and a low-end
> processor - can anyone clarify exactly how much memory/CPU I'd need to
> be in the safe zone? Also, are there any mobos/chipsets that are
> particularly well suited for a dedicated iSCSI-based SAN?

I'm learning more and more about this subject as I test the server (not
all that dissimilar to what you've described, except with only 18 disks)
I now have. I'm frustrated at the relative unavailability of PCIe SATA
controller cards that are ZFS-friendly (i.e., JBOD), and the relative
unavailability of motherboards that support both the latest CPUs as well
as have a good PCI-X architecture.

If you come across some potential solutions, I think a lot of people
here will thank you for sharing...

adam
> Fun exercise! :)

Indeed! - though my wife and kids don't seem to appreciate it so much ;)

>> I'm thinking about using this 26-disk case: [FYI: 2-disk RAID1 for
>> the OS & 4*(4+2) RAIDZ2 for SAN]
>
> What are you *most* interested in for this server? Reliability?
> Capacity? High performance? Reading or writing? Large contiguous reads
> or small seeks?
>
> One thing that I did that got good feedback from this list was
> picking apart the requirements of the most demanding workflow I
> imagined for the machine I was speccing out.

My first posting contained my use cases, but I'd say that video
recording/serving will dominate the disk utilization - that's why I'm
pushing for 4 striped sets of RAIDZ2 - I think that it would be
all-around goodness.

> I'm learning more and more about this subject as I test the server
> (not all that dissimilar to what you've described, except with only 18
> disks) I now have. I'm frustrated at the relative unavailability of
> PCIe SATA controller cards that are ZFS-friendly (i.e., JBOD), and the
> relative unavailability of motherboards that support both the latest
> CPUs as well as have a good PCI-X architecture.

Good point - another reply I just sent noted a PCI-X SATA controller
card, but I'd prefer a PCIe card - do you have a recommendation on a
PCIe card?

As far as a mobo with "good PCI-X architecture" - check out the latest
from Tyan (http://tyan.com/product_board_detail.aspx?pid=523) - it has
three 133/100MHz PCI-X slots.

> If you come across some potential solutions, I think a lot of people
> here will thank you for sharing...

Will keep the list posted!

Thanks,
Kent
Kent Watsen wrote:
>> What are you *most* interested in for this server? Reliability?
>> Capacity? High performance? Reading or writing? Large contiguous reads
>> or small seeks?
>>
>> One thing that I did that got good feedback from this list was
>> picking apart the requirements of the most demanding workflow I
>> imagined for the machine I was speccing out.
>
> My first posting contained my use cases, but I'd say that video
> recording/serving will dominate the disk utilization - that's why I'm
> pushing for 4 striped sets of RAIDZ2 - I think that it would be
> all-around goodness.

It sounds good that way, but (in theory) you'll see random I/O suffer a
bit when using RAID-Z2: the extra parity will drag performance down a
bit. The RAS guys will flinch at this, but have you considered 8*(2+1)
RAID-Z1?

I don't want to over-pimp my links, but I do think my blogged
experiences with my server (also linked in another thread) might give
you something to think about:
http://lindsay.at/blog/archive/tag/zfs-performance/

>> I'm learning more and more about this subject as I test the server
>> (not all that dissimilar to what you've described, except with only 18
>> disks) I now have. I'm frustrated at the relative unavailability of
>> PCIe SATA controller cards that are ZFS-friendly (i.e., JBOD), and the
>> relative unavailability of motherboards that support both the latest
>> CPUs as well as have a good PCI-X architecture.
>
> Good point - another reply I just sent noted a PCI-X SATA controller
> card, but I'd prefer a PCIe card - do you have a recommendation on a
> PCIe card?

Nope, but I can endorse the Supermicro card you mentioned. That's one
component in my server I have few doubts about.

When I was kicking around possibilities on the list, I started out
thinking about Areca's PCIe RAID cards, used in JBOD mode. The on-list
consensus was that they would be overkill. (Plus, there's the reliance
on Solaris drivers from Areca.) It's true, for my configuration: disk
I/O far exceeds the network I/O I'll be dealing with. Testing 16 disks
locally, however, I do run into noticeable I/O bottlenecks, and I
believe it's down to the top limits of the PCI-X bus.

> As far as a mobo with "good PCI-X architecture" - check out the latest
> from Tyan (http://tyan.com/product_board_detail.aspx?pid=523) - it has
> three 133/100MHz PCI-X slots.

I use a Tyan in my server, and have looked at a lot of variations, but I
hadn't noticed that one. It has some potential.

Still, though, take a look at the block diagram on the datasheet: that
actually looks like 1x PCI-X 133MHz slot and a bridge sharing 2x 100MHz
slots. My benchmarks so far show that putting a controller on a 100MHz
slot is measurably slower than 133MHz, but contention over a single
bridge can be even worse.

hth,
adam
Richard Elling
2007-Sep-14 17:00 UTC
[zfs-discuss] hardware sizing for a zfs-based system?
Comments from a RAS guy below...

Adam Lindsay wrote:
> Kent Watsen wrote:
>>> What are you *most* interested in for this server? Reliability?
>>> Capacity? High performance? Reading or writing? Large contiguous
>>> reads or small seeks?
>>>
>>> One thing that I did that got good feedback from this list was
>>> picking apart the requirements of the most demanding workflow I
>>> imagined for the machine I was speccing out.
>>
>> My first posting contained my use cases, but I'd say that video
>> recording/serving will dominate the disk utilization - that's why I'm
>> pushing for 4 striped sets of RAIDZ2 - I think that it would be
>> all-around goodness.
>
> It sounds good that way, but (in theory) you'll see random I/O suffer
> a bit when using RAID-Z2: the extra parity will drag performance down
> a bit. The RAS guys will flinch at this, but have you considered
> 8*(2+1) RAID-Z1?

Nit: small, random read I/O may suffer. Large random read or any random
write workloads should be OK.

For 24 data disks there are enough combinations that it is not easy to
pick from. The attached RAIDoptimizer output may help you decide on the
trade-offs. For a description of the theory behind it, see my blog:
http://blogs.sun.com/relling

I recommend loading it into StarOffice and using graphs or sorts to
reorder the data, based on your priorities. Also, this uses a generic
model; knowing the drive model will allow bandwidth analysis (with the
caveats shown in Adam's blog below).

> I don't want to over-pimp my links, but I do think my blogged
> experiences with my server (also linked in another thread) might give
> you something to think about:
> http://lindsay.at/blog/archive/tag/zfs-performance/
>
>>> I'm learning more and more about this subject as I test the server
>>> (not all that dissimilar to what you've described, except with only
>>> 18 disks) I now have. I'm frustrated at the relative unavailability
>>> of PCIe SATA controller cards that are ZFS-friendly (i.e., JBOD),
>>> and the relative unavailability of motherboards that support both
>>> the latest CPUs as well as have a good PCI-X architecture.
>>
>> Good point - another reply I just sent noted a PCI-X SATA controller
>> card, but I'd prefer a PCIe card - do you have a recommendation on a
>> PCIe card?

Yes, I (obviously :-) recommend
http://www.sun.com/storagetek/storage_networking/hba/sas/specs.xml

Note: marketing still seems to have SATA-phobia, so if you search for
SATA you'll be less successful than searching for SAS. But many SAS HBAs
support SATA devices, too.
 -- richard

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: RAIDoptimizer-24-disks.out
URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20070914/16aa8723/attachment.ksh>
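
For anyone who wants to poke at the trade-offs without the spreadsheet,
here is a rough sketch (Python) of the kind of simple MTTDL model the
blog describes. The 24-hour repair window and the per-pool accounting
are assumptions made here for illustration, so expect the numbers to
differ from RAIDoptimizer's actual output:

    HOURS_PER_YEAR = 24 * 365

    def mttdl_years(n, parity, mtbf_h, mttr_h, sets=1):
        """Rough MTTDL in years for `sets` RAID-Z groups of `n` disks each."""
        if parity == 1:    # raidz1: data lost if a 2nd disk dies during repair
            per_set = mtbf_h ** 2 / (n * (n - 1) * mttr_h)
        elif parity == 2:  # raidz2: data lost if a 3rd disk dies during repair
            per_set = mtbf_h ** 3 / (n * (n - 1) * (n - 2) * mttr_h ** 2)
        else:
            raise ValueError("only raidz1/raidz2 modelled here")
        return per_set / sets / HOURS_PER_YEAR  # any group failing loses the pool

    mtbf = 5 * HOURS_PER_YEAR   # the pessimistic 5-year MTBF used in the thread
    mttr = 24                   # assumed replace-and-resilver window, in hours

    print("8 x (2+1):", round(mttdl_years(3, 1, mtbf, mttr, sets=8)), "years")
    print("4 x (4+2):", round(mttdl_years(6, 2, mtbf, mttr, sets=4)), "years")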
Jonathan Loran
2007-Sep-14 18:53 UTC
[zfs-discuss] hardware sizing for a zfs-based system?
Sorry, but looking again at the RMP page, I see that the chassis I
recommended is actually different than the one we have. I can't find
this chassis by itself online, but here's what we bought:

http://www.siliconmechanics.com/i10561/intel-storage-server.php?cat=625

From their picture gallery you can't see the back, but it has space for
3.5" drives in the back. You can put hot-swap trays back there for your
OS drives. The guys at Silicon Mechanics are great, so you could
probably call them to ask who makes this chassis. They may also be able
to build you a partial system, if you like.

Good luck,

Jon

Kent Watsen wrote:
>> I will only comment on the chassis, as this is made by AIC (short for American Industrial Computer), and I have three of these in service at my work. These chassis are quite well made, but I have experienced the following two problems:
>>
>> <snip>
>
> Oh my, thanks for the heads-up! Charlie at RMP said that they were the most popular - so I assumed that they were solid...
>
>> For all new systems, I've gone with this chassis instead (I just noticed Rackmount Pro sells 'em also):
>>
>> http://rackmountpro.com/productpage.php?prodid=2043
>
> But I was hoping for resiliency and easy replacement for the OS drive - hot-swap RAID1 seemed like a no-brainer... This case has one internal and one external 3.5" drive bay. I could use a CF reader for resiliency and reduce the need for replacement - assuming I spool logs to the internal drive so as to not burn out the CF. Alternatively, I could put a couple of 2.5" drives into a single 3.5" bay for RAID1 resiliency, but I'd have to shut down to replace a drive... What do you recommend?
>
>> One other thing, that you may know already. Rackmount Pro will try to sell you 3ware cards, which work great in the Linux/Windows environment, but aren't supported in OpenSolaris, even in JBOD mode. You will need alternate SATA host adapters for this application.
>
> Indeed, but why pay for a RAID controller when you only need SATA ports? That's why I was thinking of picking up three of these bad boys (http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm) for about $100 each.
>
>> Good luck,
>
> Getting there - can anybody clue me into how much CPU/memory ZFS needs? I have an old 1.2GHz box with 1GB of memory lying around - would it be sufficient?
>
> Thanks!
> Kent

--
Jonathan Loran
IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146   jloran at ssl.berkeley.edu
AST:7731^29u18e3
Won't come cheap, but this mobo comes with 6x PCI-X slots... should get
the job done :)

http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm
Go look at Intel - they have a pretty decent mobo with 6 SATA ports.

Tim Cook wrote:
> Won't come cheap, but this mobo comes with 6x PCI-X slots... should
> get the job done :)
>
> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm

--
--will snow
will.snow at sun.com
Director, Web Engineering
Sun Microsystems, Inc.
http://www.sun.com
Kent Watsen wrote:
> Getting there - can anybody clue me into how much CPU/memory ZFS
> needs? I have an old 1.2GHz box with 1GB of memory lying around -
> would it be sufficient?

It'll use as much memory as you can spare, and it has a strong
preference for 64-bit systems. Considering how much you are spending on
the case and drives, it would be foolish to skimp on the motherboard/CPU
combination.

Probably a 64-bit dual core with 4GB of (ECC) RAM would be a good
starting point.

Ian
On Sat, 15 Sep 2007, Ian Collins wrote:
> Kent Watsen wrote:
>> Getting there - can anybody clue me into how much CPU/memory ZFS
>> needs? I have an old 1.2GHz box with 1GB of memory lying around -
>> would it be sufficient?
>
> It'll use as much memory as you can spare, and it has a strong
> preference for 64-bit systems. Considering how much you are spending
> on the case and drives, it would be foolish to skimp on the
> motherboard/CPU combination.
>
> Probably a 64-bit dual core with 4GB of (ECC) RAM would be a good
> starting point.

Agreed. Or save a few $s on CPUs and install 8GB+ of RAM.

Unfortunately, if you go with Intel CPUs in a server-type platform,
you'll end up with FB-DIMMs (Fully Buffered). These parts add about 5W
per DIMM, and it would be wise to blow air over the installed DIMMs. I
notice that some of the Asus server boards come with FB-DIMM fans:
http://usa.asus.com/products.aspx?l1=9&l2=39&l3=299&l4=0&model=1829&modelmenu=1

Happy Friday!

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Tim Cook wrote:
> Won't come cheap, but this mobo comes with 6x PCI-X slots... should
> get the job done :)
>
> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm

Yes, but where do you buy SuperMicro toys? SuperMicro doesn't sell
online, anything "neat" that I've found is not in stock at CDW, other
"preferred vendors" on their list all have nasty online catalogs (if
any), yadda yadda.

Rob++
--
Internet: windsor at warthog.com
Life: Rob at Carrollton.Texas.USA.Earth

"They couldn't hit an elephant at this distance." -- Major General John Sedgwick
Harold Ancell
2007-Sep-15 14:18 UTC
[zfs-discuss] good source for semi-obscure mobos and other components
At 11:23 PM 9/14/2007, Rob Windsor wrote:
> Tim Cook wrote:
>> Won't come cheap, but this mobo comes with 6x PCI-X slots... should
>> get the job done :)
>>
>> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm
>
> Yes, but where do you buy SuperMicro toys?

Newegg is the first place to try, specifically:

http://www.newegg.com/ProductSort/Brand.asp?Brand=1655&name=The-SUPERMICRO-Store-at-Newegg

or

http://tinyurl.com/36kza5

A specific search turned up two listings for the X7DBE-X at 500 +- 20
US$.

I get my disks from ZipZoomfly since they know how to repackage them,
which Newegg has been inconsistent about, but pretty much everything
else but LCD displays (for which they have outrageous no-return and very
limited replacement policies) I get from them. You might find cheaper
prices elsewhere, but as far as I know not at their level of
reliability, speed of shipping, etc., and their selection is probably
impossible to beat.

I've been doing business with them since 2001, and this summer I built
three machines from components bought almost entirely from them, making
a total of seven over the years (this includes ones for friends and my
father; all are still in service to my knowledge).

FrozenCPU is a good source for some obscure things that Newegg doesn't
stock that you might want for cooling your server. They sell IR
no-contact hand-held thermometers for not too much, in case you're
wondering just how hot your chipset, on-board display adaptor, etc. are
running.

- Harold
Hey Adam,

>> My first posting contained my use cases, but I'd say that video
>> recording/serving will dominate the disk utilization - that's why I'm
>> pushing for 4 striped sets of RAIDZ2 - I think that it would be
>> all-around goodness.
>
> It sounds good that way, but (in theory) you'll see random I/O suffer
> a bit when using RAID-Z2: the extra parity will drag performance down
> a bit.

I know what you are saying, but I wonder if it would be noticeable? I
think my worst-case scenario would be 3 Myth frontends watching 1080p
content while 4 tuners are recording 1080p content - with each 1080p
stream being 27Mb/s, that would be 108Mb/s of writes and 81Mb/s of reads
(all sequential I/O) - does that sound like it would even come close to
pushing a 4(4+2) array?

> The RAS guys will flinch at this, but have you considered 8*(2+1)
> RAID-Z1?

That configuration showed up in the output of the program I posted back
in July
(http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041778.html):

  24 bays w/ 500 GB drives having MTBF=5 years
   - can have 8 (2+1) w/ 0 spares providing 8000 GB with MTTDL of 95.05 years
   - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of 28911.68 years
   - can have 4 (4+1) w/ 4 spares providing 8000 GB with MTTDL of 684.38 years
   - can have 4 (4+2) w/ 0 spares providing 8000 GB with MTTDL of 8673.50 years
   - can have 2 (8+1) w/ 6 spares providing 8000 GB with MTTDL of 380.21 years
   - can have 2 (8+2) w/ 4 spares providing 8000 GB with MTTDL of 416328.12 years

But 8(2+1) is 91 times more likely to fail, and this system will contain
data that I don't want to risk losing.

> I don't want to over-pimp my links, but I do think my blogged
> experiences with my server (also linked in another thread) might give
> you something to think about:
> http://lindsay.at/blog/archive/tag/zfs-performance/

I see that you also set up a video server (Myth?). From your blog, I
think you are doing 5(2+1) (plus a hot spare?) - this is what my program
says about a 16-bay system:

  16 bays w/ 500 GB drives having MTBF=5 years
   - can have 5 (2+1) w/ 1 spare providing 5000 GB with MTTDL of 1825.00 years
   - can have 4 (2+2) w/ 0 spares providing 4000 GB with MTTDL of 43367.51 years
   - can have 3 (4+1) w/ 1 spare providing 6000 GB with MTTDL of 912.50 years
   - can have 2 (4+2) w/ 4 spares providing 4000 GB with MTTDL of 2497968.75 years
   - can have 1 (8+1) w/ 7 spares providing 4000 GB with MTTDL of 760.42 years
   - can have 1 (8+2) w/ 6 spares providing 4000 GB with MTTDL of 832656.25 years

Note that your MTTDL isn't quite as bad as 8(2+1), since you have three
fewer stripes. Also, it's interesting for me to note that you have 5
stripes and my 4(4+2) setup would have just one less - so the question
to answer is whether your extra stripe is better than my 2 extra disks
in each RAID set?

> Testing 16 disks locally, however, I do run into noticeable I/O
> bottlenecks, and I believe it's down to the top limits of the PCI-X
> bus.

Yes, too bad Supermicro doesn't make a PCIe-based version... But still,
the limit of a 64-bit, 133.3MHz PCI-X bus is 1067 MB/s, whereas a
64-bit, 100MHz PCI-X bus is 800 MB/s - either way, it's much faster than
my worst-case scenario from above, where 7 1080p streams would be
189Mb/s...

>> As far as a mobo with "good PCI-X architecture" - check out the
>> latest from Tyan (http://tyan.com/product_board_detail.aspx?pid=523)
>> - it has three 133/100MHz PCI-X slots.
>
> I use a Tyan in my server, and have looked at a lot of variations, but
> I hadn't noticed that one. It has some potential.
>
> Still, though, take a look at the block diagram on the datasheet: that
> actually looks like 1x PCI-X 133MHz slot and a bridge sharing 2x
> 100MHz slots. My benchmarks so far show that putting a controller on a
> 100MHz slot is measurably slower than 133MHz, but contention over a
> single bridge can be even worse.

Hmmm, I hadn't thought about that... Here is another new mobo from Tyan
(http://tyan.com/product_board_detail.aspx?pid=517) - its datasheet
shows the PCI-X buses configured the same way as your S3892.

Thanks!
Kent
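
To put that worst case in perspective, a small sketch (Python); the
60 MB/s per-disk sequential figure is an assumed ballpark for drives of
that era, not a measured number:

    STREAM_MBIT = 27                      # per 1080p stream, as above
    load_mbit = 4 * STREAM_MBIT + 3 * STREAM_MBIT  # 4 recordings + 3 playbacks
    load_mb_s = load_mbit / 8.0                    # ~24 MB/s aggregate

    DISK_MB_S = 60                        # assumed per-disk sequential rate
    data_disks = 4 * 4                    # 4 x (4+2) RAID-Z2 -> 16 data disks

    print("worst-case streaming load : %.1f MB/s" % load_mb_s)
    print("rough sequential ceiling  : %d MB/s" % (data_disks * DISK_MB_S))
    # The load is only a few percent of what the spindles (let alone the
    # PCI-X bus) can stream sequentially.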
> Nit: small, random read I/O may suffer. Large random read or any
> random write workloads should be OK.

Given that video serving is all sequential reads, is it correct that
RAID-Z2, specifically 4(4+2), would be just fine?

> For 24 data disks there are enough combinations that it is not easy to
> pick from. The attached RAIDoptimizer output may help you decide on
> the trade-offs.

Wow! - thanks for running it with 24 disks!

> For a description of the theory behind it, see my blog:
> http://blogs.sun.com/relling

I used your theory to write my own program (posted in July), but yours
is way more complete.

> I recommend loading it into StarOffice

Nice little plug ;-)

> and using graphs or sorts to reorder the data, based on your
> priorities.

Interesting: my 4(4+2) has 282 IOPS, whereas 8(2+1) has 565 IOPS -
exactly double, which is kind of expected given that it has twice as
many stripes... Also, it helps to see that the IOPS extremes are
12(raid1) with 1694 IOPS and 2(10+2) with 141 IOPS - so 4(4+2) is not a
great 24-disk performer, but isn't 282 IOPS probably overkill for my
home network anyway?

> Yes, I (obviously :-) recommend
> http://www.sun.com/storagetek/storage_networking/hba/sas/specs.xml

Very nice - think I'll be getting 3 of these!

Thanks,
Kent
> Sorry, but looking again at the RMP page, I see that the chassis I
> recommended is actually different than the one we have. I can't find
> this chassis by itself online, but here's what we bought:
>
> http://www.siliconmechanics.com/i10561/intel-storage-server.php?cat=625

That is such a cool-looking case!

> From their picture gallery you can't see the back, but it has space
> for 3.5" drives in the back. You can put hot-swap trays back there
> for your OS drives. The guys at Silicon Mechanics are great, so you
> could probably call them to ask who makes this chassis. They may also
> be able to build you a partial system, if you like.

An excellent suggestion, but after configuring the nServ K501 (because I
want quad-core AMD) the way I want it, their price is almost exactly the
same as my thrifty-shopper price, unlike RackMountPro, which seems to
add about 20% overhead - so I'll probably order the whole system from
them, sans the host bus adapter, as I'll use the Sun card Richard
suggested.

Thanks!
Kent
Kent Watsen
2007-Sep-16 03:04 UTC
Re: [zfs-discuss] hardware sizing for a zfs-based system?
[CC-ing xen-discuss regarding the question below]

>> Probably a 64-bit dual core with 4GB of (ECC) RAM would be a good
>> starting point.
>
> Agreed.

So I was completely out of the ball-park - I hope the ZFS Wiki can be
updated to contain some sensible hardware-sizing information...

One option I'm still holding on to is to also use the ZFS system as a
Xen server - that is, OpenSolaris would be running in Dom0... Given that
the Xen hypervisor has a pretty small CPU/memory footprint, do you think
it could share 2 cores + 4GB with ZFS, or should I allocate 3 cores to
Dom0 and bump the memory up by 512MB?

Thanks,
Kent
David Edmondson
2007-Sep-16 09:34 UTC
Re: [zfs-discuss] hardware sizing for a zfs-based system?
> One option I'm still holding on to is to also use the ZFS system as a
> Xen server - that is, OpenSolaris would be running in Dom0... Given
> that the Xen hypervisor has a pretty small CPU/memory footprint, do
> you think it could share 2 cores + 4GB with ZFS, or should I allocate
> 3 cores to Dom0 and bump the memory up by 512MB?

A dom0 with 4G and 2 cores should be plenty to run ZFS and the support
necessary for a reasonable number (<16) of paravirtualised domains. If
the guest domains end up using HVM then the dom0 load is higher, but we
haven't done the work to quantify this properly yet.

dme.
--
David Edmondson, Solaris Engineering, http://dme.org
Heya Kent,

Kent Watsen wrote:
>> It sounds good that way, but (in theory) you'll see random I/O suffer
>> a bit when using RAID-Z2: the extra parity will drag performance down
>> a bit.
>
> I know what you are saying, but I wonder if it would be noticeable?

Well, "noticeable" again comes back to your workflow. As you point out
to Richard, it's (theoretically) a 2x IOPS difference, which can be very
significant for some people.

> I think my worst-case scenario would be 3 Myth frontends watching
> 1080p content while 4 tuners are recording 1080p content - with each
> 1080p stream being 27Mb/s, that would be 108Mb/s of writes and 81Mb/s
> of reads (all sequential I/O) - does that sound like it would even
> come close to pushing a 4(4+2) array?

I would say no, not even close to pushing it. Remember, we're measuring
performance in MBytes/s, and video throughput is measured in Mbit/s (and
even then, I imagine that a 27 Mbit/s stream over the air is going to be
pretty rare). So I'm figuring you're just scratching the surface of even
a minimal array.

Put it this way: can a single, modern hard drive keep up with an ADSL2+
(24 Mbit/s) connection? Throw 24 spindles at the problem, and I'd say
you have headroom for a *lot* of streams.

>> The RAS guys will flinch at this, but have you considered 8*(2+1)
>> RAID-Z1?
>
> That configuration showed up in the output of the program I posted
> back in July
> (http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041778.html):
>
>   24 bays w/ 500 GB drives having MTBF=5 years
>    - can have 8 (2+1) w/ 0 spares providing 8000 GB with MTTDL of 95.05 years
>    - can have 4 (4+2) w/ 0 spares providing 8000 GB with MTTDL of 8673.50 years
>
> But 8(2+1) is 91 times more likely to fail, and this system will
> contain data that I don't want to risk losing.

I wasn't sure, with your workload. I know with mine, I'm seeing the data
store as being mostly temporary. With that much data streaming in and
out, are you planning on archiving *everything*? Cos that's "only" one
month's worth of HD video.

I'd consider tuning a portion of the array for high throughput, and
another for high redundancy as an archive for whatever you don't want to
lose. Whether that's by setting copies=2, or by having a mirrored zpool
(smart for an archive, because you'll be less sensitive to the write
performance that suffers there), it's up to you... ZFS gives us a *lot*
of choices. (But then you knew that, and it's what brought you to the
list :)

>> I don't want to over-pimp my links, but I do think my blogged
>> experiences with my server (also linked in another thread) might give
>> you something to think about:
>> http://lindsay.at/blog/archive/tag/zfs-performance/
>
> I see that you also set up a video server (Myth?).

For the uncompressed HD test case, no. It'd be for storage/playout of
Ultra-Grid-like streams, and really, that's there so our network guys
can give their 10Gb links a little bit of a workout.

> From your blog, I think you are doing 5(2+1) (plus a hot spare?) -
> this is what my program says about a 16-bay system:
>
>   16 bays w/ 500 GB drives having MTBF=5 years
>    - can have 5 (2+1) w/ 1 spare providing 5000 GB with MTTDL of 1825.00 years
>   [snipped some interesting numbers]
>
> Note that your MTTDL isn't quite as bad as 8(2+1), since you have
> three fewer stripes.

I also committed to having at least one hot spare, which, after staring
at relling's graphs for days on end, seems to be the cheapest, easiest
way of upping the MTTDL for any array. I'd recommend it.

> Also, it's interesting for me to note that you have 5 stripes and my
> 4(4+2) setup would have just one less - so the question to answer is
> whether your extra stripe is better than my 2 extra disks in each RAID
> set?

As I understand it, 5(2+1) would scale to better IOPS performance than
4(4+2), and IOPS represents the performance baseline; as you ask the
array to do more and more at once, it'll look more like random seeks.

What you get from those bigger RAID-Z groups of 4+2 is higher
performance per group. That said, my few data points on 4+1 RAID-Z
groups (running on 2 controllers) suggest that that configuration runs
into a bottleneck somewhere, and underperforms what's expected.

>> Testing 16 disks locally, however, I do run into noticeable I/O
>> bottlenecks, and I believe it's down to the top limits of the PCI-X
>> bus.
>
> Yes, too bad Supermicro doesn't make a PCIe-based version... But
> still, the limit of a 64-bit, 133.3MHz PCI-X bus is 1067 MB/s, whereas
> a 64-bit, 100MHz PCI-X bus is 800 MB/s - either way, it's much faster
> than my worst-case scenario from above, where 7 1080p streams would be
> 189Mb/s...

Oh, the bus will far exceed your needs, I think. The exercise is to
specify something that handles what you need without breaking the bank,
no?

BTW, where are these HDTV streams coming from/going to? Ethernet? A
capture card? (And which ones will work with Solaris?)

>> Still, though, take a look at the block diagram on the datasheet:
>> that actually looks like 1x PCI-X 133MHz slot and a bridge sharing 2x
>> 100MHz slots. My benchmarks so far show that putting a controller on
>> a 100MHz slot is measurably slower than 133MHz, but contention over a
>> single bridge can be even worse.
>
> Hmmm, I hadn't thought about that... Here is another new mobo from
> Tyan (http://tyan.com/product_board_detail.aspx?pid=517) - its
> datasheet shows the PCI-X buses configured the same way as your S3892.

Yeah, perhaps I've been a bit too circumspect about it, but I haven't
been all that impressed with my PCI-X bus configuration. Knowing what I
know now, I might've spec'd something different. Of all the suggestions
that've gone out on the list, I was most impressed with Tim Cook's:

> Won't come cheap, but this mobo comes with 6x PCI-X slots... should
> get the job done :)
>
> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm

That has 3x 133MHz PCI-X slots, each connected to the southbridge via a
different PCIe bus, which sounds worthy of being the core of the
demi-Thumper you propose.

...But... it all depends what you intend to spend. (This is what I was
going to say in my next blog entry on the system:) We're talking about
benchmarks that are really far past what you say is your most taxing
workload. I say I'm "disappointed" with the contention on my bus putting
limits on maximum throughput, but really, what I have far outstrips my
ability to get data into or out of the system.

So all of my "disappointment" is in theory.

adam
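
The rule of thumb behind those RAIDoptimizer IOPS numbers, sketched in
Python; the ~70 random-read IOPS per disk is an assumed figure for a
7200 rpm drive, chosen because it lines up with the 282/565/1694 values
quoted earlier in the thread:

    DISK_IOPS = 70.6          # assumed random-read IOPS of one drive

    # For small random reads, a RAID-Z group behaves roughly like a single
    # disk, so the pool scales with the number of groups, not disks.
    def raidz_pool_iops(groups):
        return groups * DISK_IOPS

    for layout, groups in [("8 x (2+1)", 8), ("5 x (2+1)", 5), ("4 x (4+2)", 4)]:
        print("%-11s ~ %4.0f random-read IOPS" % (layout, raidz_pool_iops(groups)))

    # Mirrors can serve reads from every spindle, hence the much larger
    # figure for 12 x 2-way mirror across 24 disks:
    print("12 x mirror ~ %4.0f random-read IOPS" % (24 * DISK_IOPS))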
> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of
> 28911.68 years

This should, of course, set off one's common-sense alert.

> it is 91 times more likely to fail and this system will contain data
> that I don't want to risk losing

If you don't want to risk losing data, you need multiple -- off-site --
copies.

(Incidentally, I rarely see these discussions touch upon what sort of
UPS is being used. Power fluctuations are a great source of correlated
disk failures.)

Anton
Kent Watsen
2007-Sep-17 03:05 UTC
Re: [zfs-discuss] hardware sizing for a zfs-based system?
David Edmondson wrote:
>> One option I'm still holding on to is to also use the ZFS system as a
>> Xen server - that is, OpenSolaris would be running in Dom0... Given
>> that the Xen hypervisor has a pretty small CPU/memory footprint, do
>> you think it could share 2 cores + 4GB with ZFS, or should I allocate
>> 3 cores to Dom0 and bump the memory up by 512MB?
>
> A dom0 with 4G and 2 cores should be plenty to run ZFS and the support
> necessary for a reasonable number (<16) of paravirtualised domains. If
> the guest domains end up using HVM then the dom0 load is higher, but
> we haven't done the work to quantify this properly yet.

A tasty insight - a million thanks! I think if I get 2 quad-cores and
16GB of memory, I'd be able to stomach the overhead of 25% CPU and 25%
memory going to the host - as having a dedicated SAN plus another
totally-redundant Xen box would be more expensive.

Cheers!
Kent
>> I know what you are saying, but I wonder if it would be noticeable?
>
> Well, "noticeable" again comes back to your workflow. As you point out
> to Richard, it's (theoretically) a 2x IOPS difference, which can be
> very significant for some people.

Yeah, but my point is whether it would be noticeable to *me* (yes, I am
a bit self-centered).

> I would say no, not even close to pushing it. Remember, we're
> measuring performance in MBytes/s, and video throughput is measured in
> Mbit/s (and even then, I imagine that a 27 Mbit/s stream over the air
> is going to be pretty rare). So I'm figuring you're just scratching
> the surface of even a minimal array.
>
> Put it this way: can a single, modern hard drive keep up with an
> ADSL2+ (24 Mbit/s) connection? Throw 24 spindles at the problem, and
> I'd say you have headroom for a *lot* of streams.

Sweet! I should probably hang up this thread now, but there are too many
other juicy bits to respond to...

> I wasn't sure, with your workload. I know with mine, I'm seeing the
> data store as being mostly temporary. With that much data streaming in
> and out, are you planning on archiving *everything*? Cos that's "only"
> one month's worth of HD video.

Well, not to down-play the importance of my TV recordings - which is
really a laugh because I'm not really a big TV watcher - I simply don't
want to ever have to think about this again after getting it set up.

> I'd consider tuning a portion of the array for high throughput, and
> another for high redundancy as an archive for whatever you don't want
> to lose. Whether that's by setting copies=2, or by having a mirrored
> zpool (smart for an archive, because you'll be less sensitive to the
> write performance that suffers there), it's up to you... ZFS gives us
> a *lot* of choices. (But then you knew that, and it's what brought you
> to the list :)

All true, but if 4(4+2) serves all my needs, I think that it's simpler
to administer, as I can arbitrarily allocate space as needed without
needing to worry about what kind of space it is - all the space is "good
and fast" space...

> I also committed to having at least one hot spare, which, after
> staring at relling's graphs for days on end, seems to be the cheapest,
> easiest way of upping the MTTDL for any array. I'd recommend it.

No doubt that a hot spare gives you a bump in MTTDL, but double parity
trumps it big time - check out Richard's blog...

> As I understand it, 5(2+1) would scale to better IOPS performance than
> 4(4+2), and IOPS represents the performance baseline; as you ask the
> array to do more and more at once, it'll look more like random seeks.
>
> What you get from those bigger RAID-Z groups of 4+2 is higher
> performance per group. That said, my few data points on 4+1 RAID-Z
> groups (running on 2 controllers) suggest that that configuration runs
> into a bottleneck somewhere, and underperforms what's expected.

Er? Can anyone fill in the missing blank here?

> Oh, the bus will far exceed your needs, I think. The exercise is to
> specify something that handles what you need without breaking the
> bank, no?

Bank, smank - I build a system every 5+ years and I want it to kick ass
all the way until I build the next one - cheers!

> BTW, where are these HDTV streams coming from/going to? Ethernet? A
> capture card? (And which ones will work with Solaris?)

Glad you asked. For the list's sake, I'm using two HDHomeRun tuners
(http://www.silicondust.com/wiki/products/hdhomerun) - actually, I
bought 3 of them because I felt like I needed a spare :-D

> Yeah, perhaps I've been a bit too circumspect about it, but I haven't
> been all that impressed with my PCI-X bus configuration. Knowing what
> I know now, I might've spec'd something different. Of all the
> suggestions that've gone out on the list, I was most impressed with
> Tim Cook's:
>
>> Won't come cheap, but this mobo comes with 6x PCI-X slots... should
>> get the job done :)
>>
>> http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm
>
> That has 3x 133MHz PCI-X slots, each connected to the southbridge via
> a different PCIe bus, which sounds worthy of being the core of the
> demi-Thumper you propose.

Yeah, but getting back to PCIe, I see these tasty SAS/SATA HBAs from LSI:
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html
(note, LSI also sells matching PCI-X HBA controllers, in case you need
to balance your mobo's architecture)

> ...But... it all depends what you intend to spend. (This is what I was
> going to say in my next blog entry on the system:) We're talking about
> benchmarks that are really far past what you say is your most taxing
> workload. I say I'm "disappointed" with the contention on my bus
> putting limits on maximum throughput, but really, what I have far
> outstrips my ability to get data into or out of the system.

So moving to the PCIe-based cards should fix that - no?

> So all of my "disappointment" is in theory.

Seems like this should be a classic quote, but a Google search on
"disappointment is in theory" only turns up this list - seriously, only
one result...

Best,
Kent
>> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of
>> 28911.68 years
>
> This should, of course, set off one's common-sense alert.

So true. I pointed the same thing out on this list a while back [sorry,
can't find the link], where it was beyond my lifetime, and folks
responded that the "years" unit should not be taken literally - there
are way too many variables that can cause wild mischief with these
theoretical numbers.

> If you don't want to risk losing data, you need multiple -- off-site
> -- copies.

Har, har - like I'm going to do that for my personal family archive ;)

> (Incidentally, I rarely see these discussions touch upon what sort of
> UPS is being used. Power fluctuations are a great source of correlated
> disk failures.)

Glad you brought that up - I currently have an APC 2200XL
(http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET)
- it's rated for 1600 watts, but my current case selections are saying
they have a 1500W 3+1 supply. Should I be worried?

Thanks!
Kent
Kent Watsen wrote:
> Glad you brought that up - I currently have an APC 2200XL
> (http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET)
> - it's rated for 1600 watts, but my current case selections are saying
> they have a 1500W 3+1 supply. Should I be worried?

Probably not; my box has 10 drives and two very thirsty FX74 processors,
and it draws 450W max. At 1500W, I'd be more concerned about power bills
and cooling than the UPS!

Ian
> and it draws 450W max. At 1500W, I'd be more concerned about power
> bills and cooling than the UPS!

Yeah - good point, but I need my TV! - or so I tell my wife so I can
play with all this gear :-X

Cheers,
Kent
Al Hopper
2007-Sep-17 14:42 UTC
[zfs-discuss] OT zfs system UPS sizing was Re: hardware sizing for a zfs-based system?
On Mon, 17 Sep 2007, Kent Watsen wrote:
... snip ...
>> (Incidentally, I rarely see these discussions touch upon what sort of
>> UPS is being used. Power fluctuations are a great source of
>> correlated disk failures.)
>
> Glad you brought that up - I currently have an APC 2200XL
> (http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET)
> - it's rated for 1600 watts, but my current case selections are saying
> they have a 1500W 3+1 supply. Should I be worried?

Bear in mind that you must not exceed *either* the VA or the wattage
rating. So, for example, if your UPS is 2200VA/1600W and your combined
systems consume 2000VA and 1700W - it's a no go (exceeds the wattage
rating).

This is usually not an issue with newer power supplies with power factor
correction (PFC). If the PFC = 1.0 (ideal), then the VA rating equals
the wattage rating.

Recommendation: measure it with a Seasonic Power Angel (froogle for
seasonic ssm-1508ra), which works well, or try the Kill A Watt (I have
no experience with it).

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
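
Al's rule in code form (Python); the measured-load numbers here are
made-up placeholders - substitute readings from a Power Angel or
Kill A Watt:

    def ups_ok(load_va, load_w, ups_va, ups_w):
        """A load is safe only if it fits under BOTH the VA and W ratings."""
        return load_va <= ups_va and load_w <= ups_w

    # APC SU2200XLNET rating from the thread (2200VA/1600W), checked
    # against two hypothetical measured loads:
    print(ups_ok(load_va=2000, load_w=1700, ups_va=2200, ups_w=1600))  # False - wattage exceeded
    print(ups_ok(load_va=1400, load_w=1250, ups_va=2200, ups_w=1600))  # True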