Phillip Fiedler
2007-May-17 16:30 UTC
[zfs-discuss] ZFS - Use h/w raid or not? Thoughts. Considerations.
[b]Given[/b]: A Solaris 10 u3 server with an externally attached disk array with RAID controller(s).

[b]Question[/b]: Is it better to create a zpool from a [u]single[/u] external LUN on an external disk array, or is it better to use no RAID on the disk array, just present individual disks to the server, and let ZFS take care of the RAID?

[b]Example[/b]: I have a Solaris 10 server with an attached 3320 disk array and want to use ZFS instead of UFS. Should I create a single RAID-protected LUN on the 3320 and build a single-disk ZFS pool on it, or should I set up each disk in the 3320 as "NRAID" (aka an individual disk) to be used by the server in a multi-disk ZFS pool?

[b]Thoughts[/b]: Performance, reliability, mobility?

[b]Considerations[/b]: With hardware RAID, ZFS is dependent upon the RAID controllers. If using individual disks, the disks may be put into another disk array and imported..? If I create one or more RAID groups on the 3320 and make a single-disk zpool on each RAID LUN, I could still use the 3320 "hot spare" feature to help protect from a disk failure..?
Robert Milkowski
2007-May-17 22:24 UTC
[zfs-discuss] ZFS - Use h/w raid or not? Thoughts. Considerations.
Hello Phillip,

Thursday, May 17, 2007, 6:30:38 PM, you wrote:

PF> [b]Given[/b]: A Solaris 10 u3 server with an externally attached
PF> disk array with RAID controller(s)

PF> [b]Question[/b]: Is it better to create a zpool from a
PF> [u]single[/u] external LUN on an external disk array, or is it
PF> better to use no RAID on the disk array and just present
PF> individual disks to the server and let ZFS take care of the RAID?

There's no simple answer :))) It depends on things like what kind of RAID you want to use and what your IO characteristics are. For example, if you want to use RAID-5 and you are issuing lots of small asynchronous writes, then RAID-Z should give you more performance than HW RAID-5. But if you want RAID-5 and lots of concurrent small random reads, then HW RAID-5 will generally be better than RAID-Z. If your workload is mixed then... well. However, putting ZFS on top of HW RAID-5 in many cases mitigates the write problem, since ZFS "converts" most random writes into sequential aggregated writes.

Another thing - do you use SATA disks? How much of an issue is data loss or corruption for you? Doing software RAID in ZFS can detect AND correct such problems. HW RAID can too, but to a much lesser extent.

On the other hand, current hot spare support in ZFS is lacking, and when it comes to data availability HW RAID generally offers better hot spare support. It's being worked on.

You may also want to buy a JBOD and not spend money on RAID controllers, which can save you some money. And once your environment grows and you end up with many arrays from different vendors, going with a SW solution simplifies and unifies storage management.

See my blog for some performance comparisons of HW RAID vs. SW RAID in ZFS.

PF> [b]Example[/b]: I have a Solaris 10 server with an attached 3320
PF> disk array and want to use ZFS instead of UFS. Should I create a
PF> single RAID protected LUN on the 3320 and build a single disk ZFS
PF> pool on it, or should I setup each disk in the 3320 as an "NRAID"
PF> (aka individual disk) to be used by the server in a multi-disk zfs pool?

see above

Personally I would go with ZFS entirely in most cases.

--
Best regards,
 Robert Milkowski                       mailto:rmilkowski at task.gda.pl
                                        http://milek.blogspot.com
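For illustration only, here is roughly what the two layouts being compared look like from the command line. The device names (c2t0d0 and so on) are hypothetical and will differ on a real system:

  # JBOD approach: present six individual disks and let ZFS do the RAID
  zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

  # HW RAID approach: the array exposes one RAID-5 LUN; ZFS sees a single
  # device and can detect, but not repair, checksum errors
  zpool create tank c2t0d0

  # ZFS mirroring across individual disks is just as simple
  zpool create tank mirror c2t0d0 c2t1d0 mirror c2t2d0 c2t3d0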
Chad Mynhier
2007-May-18 22:54 UTC
[zfs-discuss] ZFS - Use h/w raid or not? Thoughts. Considerations.
On 5/17/07, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> Hello Phillip,
>
> Thursday, May 17, 2007, 6:30:38 PM, you wrote:
>
> PF> [b]Given[/b]: A Solaris 10 u3 server with an externally attached
> PF> disk array with RAID controller(s)
>
> PF> [b]Question[/b]: Is it better to create a zpool from a
> PF> [u]single[/u] external LUN on an external disk array, or is it
> PF> better to use no RAID on the disk array and just present
> PF> individual disks to the server and let ZFS take care of the RAID?
>
> Then other thing - do you use SATA disks? How much data loss or
> corruption is an issue for you? Doing software RAID in ZFS can detect
> AND correct such problems. HW RAID also can but to much less extent.
>

I think this point needs to be emphasized. If reliability is a prime concern, you absolutely want to let ZFS handle redundancy in one way or another, either as mirroring or as raidz.

You can think of redundancy in ZFS as much the same thing as packet retransmission in TCP. If the data comes through bad the first time, checksum verification will catch it, and you get a second chance to get the correct data. A single-LUN zpool is the moral equivalent of disabling retransmission in TCP.

Chad Mynhier
Phillip Fiedler
2007-May-21 20:50 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Thanks for the input. So, I'm trying to meld the two replies and come up with a direction for my case, and maybe a "rule of thumb" that I can use in the future (i.e., the near future, until new features come out in ZFS) when I have external storage arrays that have built-in RAID.

At the moment, I'm hearing that using h/w RAID under my ZFS may be better for some workloads, and the h/w hot spare would be nice to have across multiple RAID groups, but the checksum capabilities in ZFS are basically nullified with single/multiple h/w LUNs, resulting in "reduced protection." Therefore, it sounds like I should be strongly leaning towards not using the hardware RAID in external disk arrays and using them like a JBOD.

When will Sun have "global hot spare" capability?
Paul Armstrong
2007-May-22 03:24 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
There isn't a global hot spare as such, but you can add the same hot spare to multiple pools.

Paul
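A quick sketch of what that looks like (pool and device names are hypothetical):

  # attach the same physical disk as a hot spare to two different pools
  zpool add pool1 spare c3t8d0
  zpool add pool2 spare c3t8d0

Whichever pool hits a failure first gets to use the spare.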
MC
2007-May-22 03:40 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
> Personally I would go with ZFS entirely in most cases.

That's the rule of thumb :) If you have a fast enough CPU and enough RAM, do everything with ZFS. This sounds koolaid-induced, but you'll need nothing else because ZFS does it all.

My second personal rule of thumb concerns RAIDZ performance. Benchmarks that were posted here in the past showed that RAIDZ worked best with no more than 4 or 5 disks per array. After that, certain types of performance dropped off pretty hard. So if top performance matters and you can handle doing 4-5 disk arrays, that is a smart path to take.
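As a rough sketch of that rule (hypothetical device names), ten disks could be split into two 5-disk raidz vdevs in one pool rather than one wide 10-disk vdev:

  # two 5-wide raidz vdevs instead of a single 10-wide one
  zpool create tank \
      raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
      raidz c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0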
Torrey McMahon
2007-May-22 04:13 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Phillip Fiedler wrote:
> Thanks for the input. So, I'm trying to meld the two replies and come up with a direction
> for my case and maybe a "rule of thumb" that I can use in the future (i.e., near future
> until new features come out in zfs) when I have external storage arrays that have built
> in RAID.
>
> At the moment, I'm hearing that using h/w raid under my zfs may be better for some
> workloads and the h/w hot spare would be nice to have across multiple raid groups, but
> the checksum capabilities in zfs are basically nullified with single/multiple h/w lun's
> resulting in "reduced protection." Therefore, it sounds like I should be strongly leaning
> towards not using the hardware raid in external disk arrays and use them like a JBOD.

The bit ...

    the checksum capabilities in zfs are basically nullified with
    single/multiple h/w lun's resulting in "reduced protection."

... is not accurate. With one large LUN, then yes, you can only detect errors. With multiple LUNs in a mirror or RAIDZ{2} you can correct errors.

The big reasons for continuing to use hw raid are speed, in some cases, and heterogeneous environments where you can't farm out non-raid-protected LUNs and raid-protected LUNs from the same storage array. In some cases the array will require a raid protection setting, like the 99x0, before you can even start farming out storage.
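If the array insists on handing out RAID-protected LUNs, one middle-ground sketch (LUN device names hypothetical) is to let ZFS mirror across two of them, so checksum errors found on one side can still be repaired from the other:

  # ZFS mirror across two hardware RAID-5 LUNs from different controllers
  zpool create tank mirror c4t0d0 c5t0d0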
Richard Elling
2007-May-22 04:24 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
More redundancy below...

Torrey McMahon wrote:
> Phillip Fiedler wrote:
>> Thanks for the input. So, I'm trying to meld the two replies and come
>> up with a direction for my case and maybe a "rule of thumb" that I can
>> use in the future (i.e., near future until new features come out in
>> zfs) when I have external storage arrays that have built in RAID.
>>
>> At the moment, I'm hearing that using h/w raid under my zfs may be
>> better for some workloads and the h/w hot spare would be nice to have
>> across multiple raid groups, but the checksum capabilities in zfs are
>> basically nullified with single/multiple h/w lun's resulting in
>> "reduced protection." Therefore, it sounds like I should be strongly
>> leaning towards not using the hardware raid in external disk arrays
>> and use them like a JBOD.
>
> The bit ...
>
>     the checksum capabilities in zfs are basically nullified with
>     single/multiple h/w lun's resulting in "reduced protection."
>
> is not accurate. With one large LUN, then yes, you can only detect
> errors. With multiple LUNs in a mirror or RAIDZ{2} then you can correct
> errors.

You can also add redundancy with the ZFS filesystem copies parameter. This is similar to, but not the same as, mirroring.

> The big reasons for continuing to use hw raid is speed, in some cases,
> and heterogeneous environments where you can't farm out non-raid
> protected LUNs and raid protected LUNs from the same storage array. In
> some cases the array will require a raid protection setting, like the
> 99x0, before you can even start farming out storage.

Yes. ZFS data protection builds on top of this. You always gain a benefit when the data protection is done as close to the application as possible -- as opposed to implementing the data protection as close to the storage as possible.
 -- richard
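A minimal sketch of the copies property mentioned above (dataset name hypothetical). It keeps extra copies of each block within the same pool, which helps against localized corruption but not against losing the whole LUN:

  # store two copies of every block written to this filesystem
  zfs set copies=2 tank/important
  zfs get copies tank/important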
Pål Baltzersen
2007-May-22 12:04 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
What if your HW-RAID controller dies? In, say, 2 years or more... What will read your disks as a configured RAID? Do you know how to (re)configure the controller or restore the config without destroying your data? Do you know for sure that a spare part and firmware will be identical, or at least compatible? How good is your service subscription? Maybe only scrapyards and museums will have what you had. =o

With ZFS/JBOD you will be safe; just get a new controller (or server) -- any kind that is protocol-compatible (and OS-compatible) you may have floating around (SATA2 | SCSI | FC..) -- and zpool import :) And you can safely buy the latest & greatest and come out with something better than you had.

With ZFS I prefer JBOD. For performance you may want external HW-RAIDs (>=2) and let ZFS mirror them as a virtual JBOD -- it depends on where the bottleneck is: I/O or spindle. I disable any RAID features on internal RAID chipsets (nForce etc.).
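The move described above is just an export on the old box and an import on the new one (pool name hypothetical); ZFS identifies its disks by their on-disk labels rather than by controller or slot:

  # on the old server, if it is still alive
  zpool export tank

  # after cabling the disks to the replacement controller/server
  zpool import            # lists pools found on the attached disks
  zpool import tank       # or: zpool import -f tank if the old host never exported it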
Moore, Joe
2007-May-22 13:00 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
> Therefore, it sounds like I should be strongly leaning
> towards not using the hardware raid in external disk arrays
> and use them like a JBOD.

Another thing to consider is the transparency that Solaris or a general-purpose operating system gives you for troubleshooting.

For example, there's no way to run DTrace on the ASIC that's doing your hardware RAID to show you exactly where your bottleneck is (even per-disk iostat isn't available in most cases). How would you determine that your application's read stride size is causing one of the component disks to be a hot spot?

Also, the more information the OS has about the layout of the disks, the better the I/O scheduler can reorder operations to optimize seeks.

These were crucial points when we were thinking about our next "big disk array" purchase. It's for a disk-to-disk backup server (very large sequential reads and writes issued by a mostly-idle CPU), so we're mostly limited by actual spindle throughput and (since it's on cheaper/slower disks) seek time.

--Joe
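As a sketch of the visibility described above (pool name hypothetical), with ZFS on plain disks you can watch every spindle directly from the host:

  # per-device service times and utilization, 5-second samples
  iostat -xn 5

  # per-vdev and per-disk breakdown of the pool's own I/O
  zpool iostat -v tank 5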
Louwtjie Burger
2007-May-22 14:01 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
On 5/22/07, Pål Baltzersen <pal at baltzersen.name> wrote:
> What if your HW-RAID-controller dies? in say 2 years or more..
> What will read your disks as a configured RAID? Do you know how to (re)configure the
> controller or restore the config without destroying your data? Do you know for sure that a
> spare-part and firmware will be identical, or at least compatible? How good is your service
> subscription? Maybe only scrapyards and museums will have what you had. =o

Be careful when talking about RAID controllers in general. They are not created equal!

I can remove all of my disks from my RAID controller, reshuffle them, and put them back in a new random order, and my controller will continue functioning correctly. I can remove my RAID controller and replace it with one running similar firmware (or higher), and my volumes will continue to live, with the correct initiators blacklisted or not.

Hardware RAID controllers have done the job for many years ... I'm a little bit concerned about the new message (from some) out there that they are "no good" anymore. Granted, the code on those controllers is probably not as elegant as ZFS, and I have a personal preference for "being in control", but I cannot dismiss the fact that some of these storage units are fast as hell, especially when you start piling on the pressure!

I'm also interested to see how Sun handles this phenomenon, and how they position ZFS so that it doesn't eat into their high-margin (be it low turnover) StorageTek block storage. I'm also interested to see whether they will release a product dedicated to a Solaris/ZFS environment.

Interesting times...

PS: I've also noticed some perspiration on the heads of some Symantec account managers. :)
Toby Thain
2007-May-22 14:40 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
On 22-May-07, at 11:01 AM, Louwtjie Burger wrote:

> On 5/22/07, Pål Baltzersen <pal at baltzersen.name> wrote:
>> What if your HW-RAID-controller dies? in say 2 years or more..
>> What will read your disks as a configured RAID? Do you know how to
>> (re)configure the controller or restore the config without
>> destroying your data? Do you know for sure that a spare-part and
>> firmware will be identical, or at least compatible? How good is
>> your service subscription? Maybe only scrapyards and museums will
>> have what you had. =o
>
> Be careful when talking about RAID controllers in general. They are
> not created equal! ...
> Hardware raid controllers have done the job for many years ...

Not quite the same job as ZFS, which offers integrity guarantees that RAID subsystems cannot.

> I'm a
> little bit concerned about the new message (from some) out there that
> they are "no good" anymore. Given the code on those controllers are
> probably not as elegant as zfs ... and given my personal preference of
> "being in control", I cannot dismiss the fact that some of these
> storage units are fast as hell, ...

"Being in control" may mean *avoiding* black-box RAID hardware in favour of inspectable & maintainable open source software, which was the earlier poster's point.

--Toby
Robert Milkowski
2007-May-23 07:07 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Hello Phillip,

Monday, May 21, 2007, 10:50:01 PM, you wrote:

PF> When will Sun have "global hot spare" capability?

It's there already - I mean, you can add the same hot spares to different zpools.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Nicolas Williams
2007-May-23 07:09 UTC
[zfs-discuss] ZFS - Use h/w raid or not? Thoughts. Considerations.
If you've got the internal system bandwidth to drive all drives, then RAID-Z is definitely superior to HW RAID-5. Same with mirroring.

HW RAID can offload some I/O bandwidth from the system, but new systems, like Thumper, should have more than enough bandwidth, so why bother with HW RAID?

Nico
--
Louwtjie Burger
2007-May-23 10:26 UTC
[zfs-discuss] ZFS - Use h/w raid or not? Thoughts. Considerations.
> HW RAID can offload some I/O bandwidth from the system, but new systems,
> like Thumper, should have more than enough bandwidth, so why bother with
> HW RAID?

*devil's advocate mode = on*

Why bother, you say... I'll ask the StorageTek division this next time they come round asking (begging?) me to sell more kit.

Guys, everything is HW RAID! .. why bother.. I'll rather sell ZFS! What is the margin on ZFS? Nothing!!! OMG.

Let's sell services around ZFS .. err, it's as easy to operate as switching on a light ... no takers?

Hang on ... I can sell a Thumper ... but... 99% of my clients have heterogeneous environments, and NFS is not exactly Oracle DB material. Don't worry .. I'll target the Solaris-only players ... Anybody there? ... (insert echo here)

*sigh* I wish I was a programmer working on ZFS ... then all the realities of being a Sun partner would just breeze past.

*devil's advocate mode off*

PS: Please don't flame .. bit of a tongue-in-cheek moment. *hehe*
Brad Plecs
2007-May-23 10:38 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
>> At the moment, I'm hearing that using h/w raid under my zfs may be
>> better for some workloads and the h/w hot spare would be nice to
>> have across multiple raid groups, but the checksum capabilities in
>> zfs are basically nullified with single/multiple h/w lun's
>> resulting in "reduced protection." Therefore, it sounds like I
>> should be strongly leaning towards not using the hardware raid in
>> external disk arrays and use them like a JBOD.

> The big reasons for continuing to use hw raid is speed, in some cases,
> and heterogeneous environments where you can't farm out non-raid
> protected LUNs and raid protected LUNs from the same storage array. In
> some cases the array will require a raid protection setting, like the
> 99x0, before you can even start farming out storage.

Just a data point -- I've had miserable luck with ZFS JBOD drives failing. They consistently wedge my machines (Ultra-45, E450, V880, using SATA and SCSI drives) when one of the drives fails. The system recovers okay and without data loss after a reboot, but a total drive failure (when a drive stops talking to the system) is not handled well. Therefore I would recommend a hardware RAID for high-availability applications.

Note, it's not clear that this is a ZFS problem. I suspect it's a Solaris, hardware controller, or driver problem, so this may not be an issue if you find a controller that doesn't freak out on a drive failure.

BP

--
bplecs at cs.umd.edu
Anton B. Rang
2007-May-24 02:32 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
> If you've got the internal system bandwidth to drive all drives then RAID-Z is definitely
> superior to HW RAID-5. Same with mirroring.

You'll need twice as much I/O bandwidth as with a hardware controller, plus the redundancy, since the reconstruction is done by the host. For instance, to be equivalent to the performance of a mirrored array on a single 4 Gb FC channel, you need to use four 4 Gb FC channels, at least if you can't tolerate a 50% degradation during reconstruction; or two 4 Gb FC channels if you don't mind the performance loss during reconstruction.

RAID-Z also uses system CPU and memory bandwidth, which is fine for file servers since they're normally overprovisioned there anyway, but may be less appropriate for some other uses.

> HW RAID can offload some I/O bandwidth from the system, but new systems,
> like Thumper, should have more than enough bandwidth, so why bother with
> HW RAID?

Thumper seems to be designed as a file server (but curiously, not for high availability). It's got plenty of I/O bandwidth. Mid-range and high-end servers, though, are starved of I/O bandwidth relative to their CPU & memory. This is particularly true for Sun's hardware.

Anton
Richard Elling
2007-May-24 16:10 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Anton B. Rang wrote:
> Thumper seems to be designed as a file server (but curiously, not for high availability).

hmmm... Often people think that because a system is not clustered, then it is not designed to be highly available. Any system which provides a single view of data (eg. a persistent storage device) must have at least one single point of failure. The 4 components in a system which break most often are: fans, power supplies, disks, and DIMMs. You will find that most servers, including thumper, have redundancy to cover these failure modes. We've done extensive modelling and measuring of these systems and think that we have hit a pretty good balance of availability and cost. A thumper is not a STK9990V, nor does it cost nearly as much.

Incidentally, thumper field reliability is better than we expected. This is causing me to do extra work, because I have to explain why.

> It's got plenty of I/O bandwidth. Mid-range and high-end servers, though, are starved of
> I/O bandwidth relative to their CPU & memory. This is particularly true for Sun's hardware.

Please tell us how many storage arrays are required to meet a theoretical I/O bandwidth of 244 GBytes/s? Note: I have to say theoretical bandwidth here because no such system has ever been built for testing, and such a system would be very, very expensive.
 -- richard
Dave Fisk
2007-May-24 16:57 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
> Please tell us how many storage arrays are required to meet a
> theoretical I/O bandwidth of 244 GBytes/s?

Just considering disks, you need approximately 6,663, all streaming 50 MB/sec, with RAID-5 3+1 (for example). That is assuming sustained large-block sequential I/O. If you have 8 KB random I/O you need somewhere between 284,281 and 426,421 disks, each delivering between 100 and 150 IOPS.

Dave

Richard Elling wrote:
> Anton B. Rang wrote:
>> Thumper seems to be designed as a file server (but curiously, not for
>> high availability).
>
> hmmm... Often people think that because a system is not clustered,
> then it is not designed to be highly available. Any system which
> provides a single view of data (eg. a persistent storage device) must
> have at least one single point of failure. The 4 components in a
> system which break most often are: fans, power supplies, disks, and
> DIMMs. You will find that most servers, including thumper, has
> redundancy to cover these failure modes. We've done extensive
> modelling and measuring of these systems and think that we have hit a
> pretty good balance of availability and cost. A thumper is not a
> STK9990V, nor does it cost nearly as much.
>
> Incidentally, thumper field reliability is better than we expected.
> This is causing me to do extra work, because I have to explain why.
>
>> It's got plenty of I/O bandwidth. Mid-range and high-end servers,
>> though, are starved of I/O bandwidth relative to their CPU & memory.
>> This is particularly true for Sun's hardware.
>
> Please tell us how many storage arrays are required to meet a
> theoretical I/O bandwidth of 244 GBytes/s? Note: I have to say
> theoretical bandwidth here because no such system has ever been built
> for testing, and such a system would be very, very expensive.
> -- richard

--
Dave Fisk, ORtera Inc.
http://www.ORtera.com
Anton B. Rang
2007-May-24 17:48 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Richard wrote:
> Any system which provides a single view of data (eg. a persistent storage
> device) must have at least one single point of failure.

Why?

Consider this simple case: A two-drive mirrored array.

Use two dual-ported drives, two controllers, two power supplies, arranged roughly as follows:

   -- controller-A <=> disk A <=> controller-B --
         \                             /
          \                           /
           \-------- disk B ---------/

Remind us where the single point of failure is in this arrangement?

Seriously, I think it's pretty clear that high-end storage hardware is built to eliminate single points of failure. I don't think that NetApp, LSI Logic, IBM, etc. would agree with your contention. But maybe I'm missing something; is there some more fundamental issue? Do you mean that the entire system is a single point of failure, if it's the only copy of said data? That would be a tautology....

I had written:

> Mid-range and high-end servers, though, are starved of I/O bandwidth
> relative to their CPU & memory. This is particularly true for Sun's hardware.

and Richard had asked (rhetorically?)

> Please tell us how many storage arrays are required to meet a
> theoretical I/O bandwidth of 244 GBytes/s?

My point is simply that, on most non-file-server hardware, the I/O bandwidth available is not sufficient to keep all CPUs busy. Host-based RAID can make things worse since it takes away from the bandwidth available for user jobs. Consider a Sun Fire 25K; the theoretical I/O bandwidth is 35 GB/sec (IIRC that's the full-duplex number) while its 144 processors could do upwards of 259 GFlops. That's 0.14 bytes/flop.

To answer your rhetorical question, the DSC9550 does 3 GB/second for reads and writes (doing RAID 6 and with hardware parity checks on reads -- nice!), so you'd need 82 arrays. In real life (an actual file system), ASC Purple with GPFS got 102 GB/sec using 416 arrays.

-- Anton
Darren J Moffat
2007-May-24 17:52 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Anton B. Rang wrote:
> Richard wrote:
>> Any system which provides a single view of data (eg. a persistent storage
>> device) must have at least one single point of failure.
>
> Why?
>
> Consider this simple case: A two-drive mirrored array.
>
> Use two dual-ported drives, two controllers, two power supplies,
> arranged roughly as follows:
>
>    -- controller-A <=> disk A <=> controller-B --
>          \                             /
>           \                           /
>            \-------- disk B ---------/
>
> Remind us where the single point of failure is in this arrangement?

The single instance of the operating system you are running, if you aren't running in a cluster.

--
Darren J Moffat
Frank Fitch
2007-May-24 18:11 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Anton B. Rang wrote:
> Richard wrote:
>> Any system which provides a single view of data (eg. a persistent storage
>> device) must have at least one single point of failure.
>
> Why?
>
> Consider this simple case: A two-drive mirrored array.
>
> Use two dual-ported drives, two controllers, two power supplies,
> arranged roughly as follows:
>
>    -- controller-A <=> disk A <=> controller-B --
>          \                             /
>           \                           /
>            \-------- disk B ---------/
>
> Remind us where the single point of failure is in this arrangement?

The disk backplane?

Regards,
-Frank
Richard Elling
2007-May-24 19:51 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Anton B. Rang wrote:
> Richard wrote:
>> Any system which provides a single view of data (eg. a persistent storage
>> device) must have at least one single point of failure.
>
> Why?
>
> Consider this simple case: A two-drive mirrored array.
>
> Use two dual-ported drives, two controllers, two power supplies,
> arranged roughly as follows:
>
>    -- controller-A <=> disk A <=> controller-B --
>          \                             /
>           \                           /
>            \-------- disk B ---------/
>
> Remind us where the single point of failure is in this arrangement?

The software which provides the single view of the data.

> Seriously, I think it's pretty clear that high-end storage hardware is built to
> eliminate single points of failure. I don't think that NetApp, LSI Logic, IBM,
> etc. would agree with your contention. But maybe I'm missing something; is
> there some more fundamental issue? Do you mean that the entire system is a
> single point of failure, if it's the only copy of said data? That would be a
> tautology....

Does anyone believe that the software or firmware in these systems is infallible? For all possible failure modes in the system?

> I had written:
>
>> Mid-range and high-end servers, though, are starved of I/O bandwidth
>> relative to their CPU & memory. This is particularly true for Sun's hardware.
>
> and Richard had asked (rhetorically?)
>
>> Please tell us how many storage arrays are required to meet a
>> theoretical I/O bandwidth of 244 GBytes/s?
>
> My point is simply that, on most non-file-server hardware, the I/O bandwidth
> available is not sufficient to keep all CPUs busy. Host-based RAID can make
> things worse since it takes away from the bandwidth available for user jobs.
> Consider a Sun Fire 25K; the theoretical I/O bandwidth is 35 GB/sec (IIRC
> that's the full-duplex number) while its 144 processors could do upwards of
> 259 GFlops. That's 0.14 bytes/flop.

Consider something more current. The M9000 has 244 GBytes/s of theoretical I/O bandwidth. It's been measured at 1.228 TFlops (peak). So we see a ratio of 0.19 bytes/flop. But this ratio doesn't mean much, since there doesn't seem to be a storage system that big connected to a single OS instance -- yet :-)

When people make this claim of bandwidth limitation, we often find that the inherent latency limitation is more problematic. For example, we can get good memory bandwidth from DDR2 DIMMs, which we collect into 8-wide banks, but we can't get past the latency of DRAM access. Similarly, we can get upwards of 100 MBytes/s media bandwidth from a fast, large disk, but can't get past the 4.5 ms seek or 4.1 ms rotational delay time. It is this latency issue which effectively killed software RAID-5 (read-modify-write). Fortunately, ZFS's raidz is designed to avoid the need to do a read-modify-write.

> To answer your rhetorical question, the DSC9550 does 3 GB/second for reads
> and writes (doing RAID 6 and with hardware parity checks on reads -- nice!),
> so you'd need 82 arrays. In real life (an actual file system), ASC Purple with
> GPFS got 102 GB/sec using 416 arrays.

Yeah, this is impressive, but it's parallel (multi-system/multi-storage), so it is really apples and oranges.
 -- richard
Torrey McMahon
2007-May-25 04:22 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Toby Thain wrote:
>
> On 22-May-07, at 11:01 AM, Louwtjie Burger wrote:
>
>> Be careful when talking about RAID controllers in general. They are
>> not created equal! ...
>> Hardware raid controllers have done the job for many years ...
>
> Not quite the same job as ZFS, which offers integrity guarantees that
> RAID subsystems cannot.

Depends on the guarantees. Some RAID systems have built-in block checksumming.
Nathan Kroenert
2007-May-25 04:32 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Which has little benefit if the HBA or the array internals change the meaning of the message...

That's the whole point of ZFS's checksumming - it's end to end...

Nathan.

Torrey McMahon wrote:
> Toby Thain wrote:
>>
>>> Be careful when talking about RAID controllers in general. They are
>>> not created equal! ...
>>> Hardware raid controllers have done the job for many years ...
>>
>> Not quite the same job as ZFS, which offers integrity guarantees that
>> RAID subsystems cannot.
>
> Depend on the guarantees. Some RAID systems have built in block
> checksumming.
Torrey McMahon
2007-May-25 04:42 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
I did say depends on the guarantees, right? :-)

My point is that all hw raid systems are not created equally.

Nathan Kroenert wrote:
> Which has little benefit if it's the HBA or the Array internals change
> the meaning of the message...
>
> That's the whole point of ZFS's checksumming - It's end to end...
>
> Nathan.
Casper.Dik at Sun.COM
2007-May-25 08:27 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
> Depend on the guarantees. Some RAID systems have built in block
> checksumming.

But we all know that block checksums stored with the blocks do not catch a number of common errors (ghost writes, misdirected writes, misdirected reads).

Casper
Toby Thain
2007-May-25 12:45 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
On 25-May-07, at 1:22 AM, Torrey McMahon wrote:

> Toby Thain wrote:
>>
>>> Hardware raid controllers have done the job for many years ...
>>
>> Not quite the same job as ZFS, which offers integrity guarantees
>> that RAID subsystems cannot.
>
> Depend on the guarantees. Some RAID systems have built in block
> checksumming.

Which still isn't the same. Sigh.

--T
Torrey McMahon
2007-May-25 13:00 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Toby Thain wrote:
>
> On 25-May-07, at 1:22 AM, Torrey McMahon wrote:
>
>> Depend on the guarantees. Some RAID systems have built in block
>> checksumming.
>
> Which still isn't the same. Sigh.

Yep..... you get what you pay for. Funny how ZFS is free to purchase, isn't it?
Toby Thain
2007-May-25 13:55 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
On 25-May-07, at 10:00 AM, Torrey McMahon wrote:

> Toby Thain wrote:
>>
>>> Depend on the guarantees. Some RAID systems have built in block
>>> checksumming.
>>
>> Which still isn't the same. Sigh.
>
> Yep.....you get what you pay for. Funny how ZFS is free to purchase
> isn't it?

As Nathan and others have pointed out, ZFS offers more integrity *because* it's software.

And does anyone really believe, these days, that "you get what you pay for"? ISTR Vista "Ultimate" is considerably more than a Solaris license. Rock on.

--T (ending participation in thread)
Robert Milkowski
2007-May-27 13:18 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Hello Richard,

Thursday, May 24, 2007, 6:10:34 PM, you wrote:

RE> Incidentally, thumper field reliability is better than we expected. This is causing
RE> me to do extra work, because I have to explain why.

I've got some thumpers and they're very reliable. Even the disks aren't failing that much - even less than I expected from observation of other arrays in the same environment.

The main problems with x4500+zfs are:

1. hot spare support in zfs - right now it is far from ideal

2. raidz2 - resilver with lots of small files takes too long

3. SVM root disk mirror over jumpstart doesn't work with the x4500 (bug opened)

4. I would like a future version of the x4500 to have 2x CF cards (or something similar) to boot the system from - so two disks won't be wasted just for the OS (2x1TB in a few months).

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Richard Elling
2007-May-29 23:20 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Robert Milkowski wrote:
> Hello Richard,
>
> Thursday, May 24, 2007, 6:10:34 PM, you wrote:
> RE> Incidentally, thumper field reliability is better than we expected. This is causing
> RE> me to do extra work, because I have to explain why.
>
> I've got some thumpers and they're very reliable.
> Even disks aren't failing that much - even less than I expected from
> observation on other arrays in the same environment.

Yes, our data is consistent with your observation.

> The main problems with x4500+zfs are:
>
> 1. hot spare support in zfs - right now it is far from ideal

Agree. The team is working on this, but I'm not sure of the current status.

> 2. raidz2 - resilver with lot of small files takes too long
>
> 3. SVM root disk mirror over jumpstart doesn't work with x4500 (bug
>    opened)
>
> 4. I would consider future version of x4500 to have a 2xCF card (or
>    something similar) to boot system from - so two disk won't be
>    wasted just for OS (2x1TB in a few months).

Current version has a CF card slot, but AFAIK, it is "not supported." We have a number of servers which do support CF for boot, and more in the pipeline (very popular with some deployment scenarios :-).

But I am curious as to why you believe 2x CF are necessary? I presume this is so that you can mirror. But the remaining memory in such systems is not mirrored. Comments and experiences are welcome.
 -- richard
Carson Gaspar
2007-May-30 00:05 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Richard Elling wrote:
> But I am curious as to why you believe 2x CF are necessary?
> I presume this is so that you can mirror. But the remaining memory
> in such systems is not mirrored. Comments and experiences are welcome.

CF == bit-rot-prone disk, not RAM. You need to mirror it for all the same reasons you need to mirror hard disks, and then some.

--
Carson
Roch - PAE
2007-May-30 15:33 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Torrey McMahon writes:
> Toby Thain wrote:
>>
>>> Depend on the guarantees. Some RAID systems have built in block
>>> checksumming.
>>
>> Which still isn't the same. Sigh.
>
> Yep.....you get what you pay for. Funny how ZFS is free to purchase
> isn't it?

With RAID-level block checksumming, if the data gets corrupted on its way _to_ the array, that data is lost.

With ZFS and RAID-Z or mirroring, you will recover the data.

-r
Toby Thain
2007-May-30 16:44 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
On 30-May-07, at 12:33 PM, Roch - PAE wrote:

> Torrey McMahon writes:
>> Toby Thain wrote:
>>>> Depend on the guarantees. Some RAID systems have built in block
>>>> checksumming.
>>>
>>> Which still isn't the same. Sigh.
>>
>> Yep.....you get what you pay for. Funny how ZFS is free to purchase
>> isn't it?
>
> With RAID level block checksumming, if the data gets
> corrupted on it's way _to_ the array, that data is lost.

Or _from_. "There's many a slip 'twixt cup and lip."

--T

> With ZFS and RAID-Z or Mirroring, you will recover the
> data.
>
> -r
Robert Milkowski
2007-Jun-01 10:15 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Hello Richard,

RE> But I am curious as to why you believe 2x CF are necessary?
RE> I presume this is so that you can mirror. But the remaining memory
RE> in such systems is not mirrored. Comments and experiences are welcome.

I was thinking about mirroring - it's not clear from the comment above why it would not be needed?

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com