Howdy. My plan: I'm planning an iSCSI-target/NFS server box for ESX.

I'm planning on using an Areca RAID card, as I've heard mixed things about hot-swapping with Solaris/ZFS, and I'd like the stability of hardware RAID.

My question is this: I'll be using 8 750GB SATA drives, and I'm trying to figure out the best method to balance:
1) Performance
2) Hot-swap-ability
3) Tolerance of disk loss

My current plan is to build two RAID-5 arrays, 4 drives each, mirror them in ZFS, and add them to the pool. This gives me about 3 x 750GB of usable space.

Now, here is the important question: Does mirroring provide a performance boost, or is it simply a way to provide redundancy? That is, if I go ahead and force-add the RAID-5 arrays without mirroring them, I'll have 6 usable drives' worth of space (double the storage), but ZFS won't see any redundancy. If a drive fails, ZFS won't know or care; I'll simply go into the Areca control panel and eject the drive. Voila!

But is there a performance boost from mirroring the drives? That is what I'm unsure of.

Thanks for any information!
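For reference, the layout described above would look roughly like the following on the ZFS side, assuming the Areca exports each 4-drive RAID-5 set to Solaris as a single LUN (the c2t0d0/c2t1d0 device names are placeholders, not actual devices):

    # Two hardware RAID-5 volumes, mirrored by ZFS:
    zpool create tank mirror c2t0d0 c2t1d0

    # The alternative mentioned above: both volumes added as plain
    # (non-redundant) top-level vdevs; twice the space, but ZFS sees
    # no redundancy and cannot self-heal checksum errors:
    zpool create tank c2t0d0 c2t1d0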
On Fri, Jul 25, 2008 at 11:02 AM, Matt Wreede <mwreede at ci.mansfield.oh.us> wrote:
> But is there a performance boost from mirroring the drives? That is what I'm unsure of.

Mirroring will provide a boost on reads, since the system can read from both sides of the mirror. It will not provide an increase on writes, since the system needs to wait for both halves of the mirror to finish. It could be slightly slower than a single RAID-5.

-B

--
Brandon High  bhigh at freaks.com
"The good is the enemy of the best." - Nietzsche
On Fri, Jul 25, 2008 at 11:25 AM, Wreede, Matt - PC/Network Technician <mwreede at ci.mansfield.oh.us> wrote:
> I don't think read speeds are going to be a huge issue, and depending on the boost it'll bring, it may well be worth it for me to simply add the RAID-5 arrays as a single drive, and not try to mirror them. The thought of losing a whopping 5 drives of storage makes me sad :(

You could do RAID 10, which would only cost you 4 drives of capacity, and have better read and write performance (no parity calculations). You could use RAID 1 on the Areca and add the volumes to a zpool for best performance, at the cost of ZFS's error recovery; or RAID 0 on the Areca and mirroring in ZFS, at the cost of longer resilver times.

On second thought, a zpool made out of 4 ZFS mirrors would be the best of both worlds, but then you lose the hot-swap advantages of using the Areca.

If the Areca cards have a BBU or NVRAM, they should give you good performance in any mode.

-B

--
Brandon High  bhigh at freaks.com
"The good is the enemy of the best." - Nietzsche
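For comparison, the "zpool made out of 4 ZFS mirrors" suggested above would hand all eight drives to ZFS directly. A minimal sketch, again with placeholder device names:

    # Eight drives as four 2-way mirrors; ZFS stripes reads and writes
    # across the four mirror vdevs:
    zpool create tank \
        mirror c2t0d0 c2t1d0 \
        mirror c2t2d0 c2t3d0 \
        mirror c2t4d0 c2t5d0 \
        mirror c2t6d0 c2t7d0

This keeps ZFS's checksum-based self-healing on both halves of every mirror, at the cost of relying on Solaris rather than the Areca firmware for hot-swap handling.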
>> But is there a performance boost from mirroring the drives? That is what
>> I'm unsure of.
>
> Mirroring will provide a boost on reads, since the system can read from
> both sides of the mirror. It will not provide an increase on writes,
> since the system needs to wait for both halves of the mirror to
> finish. It could be slightly slower than a single raid5.

That's not strictly correct. Mirroring will, in fact, deliver better IOPS for both reads and writes. For reads, as Brandon stated, mirroring delivers better performance because it can distribute the reads between both devices. RAID-Z with an N+1-wide stripe, however, divides each block into N data chunks plus one parity chunk, and a read needs to access all N data chunks. This reduces the total IOPS by a factor of N+1 for reads and writes, whereas mirroring reduces the IOPS by a factor of 2 for writes and not at all for reads.

Adam

--
Adam Leventhal, Fishworks  http://blogs.sun.com/ahl
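As a rough back-of-the-envelope illustration of that IOPS difference (assuming, purely for the sake of argument, about 100 random IOPS per 7200 RPM SATA drive; a ballpark figure, not a measurement):

    8 drives as one 7+1 RAID-Z vdev:   ~100 read IOPS,  ~100 write IOPS
                                       (every small I/O touches the whole stripe)
    8 drives as four 2-way mirrors:    ~800 read IOPS (all 8 spindles serve reads)
                                       ~400 write IOPS (each write lands on 2 spindles)

The exact numbers depend on the drives and the workload; the point is the ratio, not the absolute values.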
On Fri, 25 Jul 2008, Matt Wreede wrote:

> Now, here is the important question: Does mirroring provide a
> performance boost, or is it simply a way to provide redundancy?

Mirroring provides a performance boost for reads since a read can be done from either side of the mirror. Theoretically you could get twice the read performance, but the actual boost depends on many factors, so it is likely to be somewhat less. Since mirrors are very simple, ZFS is able to schedule reads across disks quite effectively. I see a quite considerable boost here. Besides the boost due to mirrors, ZFS load-shares across VDEVs, so there is also a boost from more VDEVs.

> That is, if I go ahead and force-add the RAID-5 arrays without mirroring
> them, I'll have 6 usable drives' worth of space (double the storage),
> but ZFS won't see any redundancy.

Always keep in mind that the pool can be no stronger than its weakest VDEV. If any VDEV in the pool fails, then the whole pool fails.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, 25 Jul 2008, Brandon High wrote:

> Mirroring will provide a boost on reads, since the system can read from
> both sides of the mirror. It will not provide an increase on writes,
> since the system needs to wait for both halves of the mirror to
> finish. It could be slightly slower than a single raid5.

Is anything really slower than a RAID5?

For absolute fastest performance with the least IOPS tax on the disks, the ideal configuration (for performance) is load-shared mirror pairs. As Adam Leventhal points out, there is a rather severe IOPS tax from using raidz or raidz2. Raidz saves disk space, and raidz2 saves disk space plus offers much more VDEV reliability.

In today's market, IOPS are much more expensive than raw I/O throughput or raw storage space. Disk drives have hit the wall in terms of seek times and rotational latency but still have quite a ways to go in terms of data rates. Don't consume your precious IOPS unless you have to.

I am not sure if ZFS really has to wait for both sides of a mirror to finish, but even if it does, if there are enough VDEVs then ZFS can still proceed with writing.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, Jul 25, 2008 at 11:44 AM, Wreede, Matt - PC/Network Technician <mwreede at ci.mansfield.oh.us> wrote:
> I've had hideous luck with software RAID and hot swapping, and from what I've heard, Solaris is sort of iffy on support for hot swapping, so I'd like to stick with the Areca.

In theory, a system built around the Marvell or LSI chipsets should be reliable and well supported, since that's what Sun is using for the X4500 and X4540. There are enough posts to this list that it seems like there are some problems with device failure on those chipsets, but the actual rate of incidence is probably lower than it appears, since people are more likely to report an error than a success. One of the Sun guys could probably set the record straight.

-B

--
Brandon High  bhigh at freaks.com
"The good is the enemy of the best." - Nietzsche
On Fri, Jul 25, 2008 at 12:09 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> Is anything really slower than a RAID5?

RAID6 might be. RAID3 can be, too. Hey, you asked.

> I am not sure if ZFS really has to wait for both sides of a mirror to
> finish, but even if it does, if there are enough VDEVs then ZFS can still
> proceed with writing.

It would have to wait on an fsync() call, since that won't return until both halves of the mirror have completed. If the cards you're using have NVRAM, then they could return faster.

-B

--
Brandon High  bhigh at freaks.com
"The good is the enemy of the best." - Nietzsche
Bob Friesenhahn wrote:
> On Fri, 25 Jul 2008, Brandon High wrote:
>
>> Mirroring will provide a boost on reads, since the system can read from
>> both sides of the mirror. It will not provide an increase on writes,
>> since the system needs to wait for both halves of the mirror to
>> finish. It could be slightly slower than a single raid5.
>
> Is anything really slower than a RAID5?

Touché :-)

> I am not sure if ZFS really has to wait for both sides of a mirror to
> finish, but even if it does, if there are enough VDEVs then ZFS can
> still proceed with writing.

This is actually an interesting question. Let's put it in context.

In the bad old days, logical volume managers (e.g. SVM) sat between a file system and physical disks. When the file system wrote a block to the pseudo-device, SVM would wait until the data protection mechanism had committed before returning to the file system. For mirrors, this meant that both sides of the mirror had committed to disk. For RAID-5, it meant that the data and its parity had been updated, which may have required read-modify-write cycles.

ZFS is slightly different. ZFS will queue I/Os to devices, with the redundant data being just another entry in the queue. When the transaction group commits, ZFS flushes everything to persistent storage. In the mirror case, each side of the mirror will be processing a queue and will occasionally be asked to flush.

For normal writes, these won't seem very different. For synchronous (blocking) writes, they will be different. In the SVM case, the blocking write will wait until both sides of the mirror have been written. For ZFS, the blocking write will wait until the write has been committed in the ZIL. If you do not use a separate ZIL log, then ZFS will seem similar to the SVM case. But if you do use a separate ZIL log, then you can take advantage of the various write-latency-optimized storage devices. You should be able to see a big win for ZFS over the SVM case, especially for JBODs.

-- richard
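To make the separate-log case concrete: once the pool exists and you have a suitable low-latency device, attaching a slog is a single command (the device name below is a placeholder, and the pool must be at a ZFS version that supports separate log devices):

    # Dedicate a write-latency-optimized device to the ZFS intent log:
    zpool add tank log c3t0d0

Synchronous writes (NFS COMMITs, fsync(), iSCSI sync writes) then only have to wait for the slog; the data still reaches the main pool disks with the next transaction group commit.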
On Fri, 25 Jul 2008, Brandon High wrote:

>> I am not sure if ZFS really has to wait for both sides of a mirror to
>> finish, but even if it does, if there are enough VDEVs then ZFS can still
>> proceed with writing.
>
> It would have to wait on an fsync() call, since that won't return
> until both halves of the mirror have completed. If the cards you're
> using have NVRAM, then they could return faster.

While it is possible that the ZFS implementation does actually wait for both drives to report that the data is written, it only has to know that the data is committed to one drive in order to satisfy the synchronous write expectation. This is not the case for legacy mirrored pairs, where the disks are absolutely required to contain the same content at the same logical locations. ZFS does not require that disks in a mirrored pair contain identical content at all times.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>>>>> "bh" == Brandon High <bhigh at freaks.com> writes:

    bh> a system built around the Marvell or LSI chipsets

According to The Blogosphere, source of all reliable information, there's some issue with LSI, too. The driver is not available in stable Solaris nor OpenSolaris, or there are two drivers, or something. The guy is so upset, I can't figure out wtf he's trying to say:

http://www.osnews.com/thread?317113
On Fri, Jul 25, 2008 at 1:47 PM, Miles Nordin <carton at ivy.net> wrote:
> according to The Blogosphere, source of all reliable information,
> there's some issue with LSI, too. The driver is not available in
> stable Solaris nor OpenSolaris, or there are two drivers, or
> something. the guy is so upset, I can't figure out wtf he's trying to
> say:

Well, if you saw it on the internet ...

All I know is that the X4540 uses the LSI 1068E chipset, and that the X4500 used the Marvell 88SX chipset. Since buyers of both of these systems probably expect that they'll, well, work, I assume that the drivers in Solaris should be relatively stable.

If that's not the case, then I'd think Sun would want to address it.

-B

--
Brandon High  bhigh at freaks.com
"The good is the enemy of the best." - Nietzsche
Brandon High wrote:
> All I know is that the X4540 uses the LSI 1068E chipset, and that the
> X4500 used the Marvell 88SX chipset. Since buyers of both of these
> systems probably expect that they'll, well, work, I assume
> that the drivers in Solaris should be relatively stable.
>
> If that's not the case, then I'd think Sun would want to address it.

*grumpy customer*

Sadly, it is _not_ the case. And Sun is addressing it. They just aren't sharing with the rest of the class...

If you're running Solaris 10, as far as I can tell you _still_ need an IDR to have NCQ enabled on the Marvell chipset in an X4500 without device resets, as Sun still hasn't released a patch that fixes the bug(s). I could be wrong, but Sun service keeps not answering our requests for clarification / a new U5 IDR.

On the up side, it's fixed in OpenSolaris, and you can probably get an IDR if you have a support contract and scream loudly enough.

-- Carson
Miles Nordin wrote:
>>>>>> "bh" == Brandon High <bhigh at freaks.com> writes:
>
>     bh> a system built around the Marvell or LSI chipsets
>
> according to The Blogosphere, source of all reliable information,
> there's some issue with LSI, too. The driver is not available in
> stable Solaris nor OpenSolaris, or there are two drivers, or
> something. the guy is so upset, I can't figure out wtf he's trying to
> say:
>
> http://www.osnews.com/thread?317113

The driver for LSI's MegaRAID SAS card is "mega_sas", which was integrated into snv_88. It's planned for backporting to a Solaris 10 update.

And I can't figure out what his beef is either.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
On Fri, Jul 25, 2008 at 1:02 PM, Matt Wreede <mwreede at ci.mansfield.oh.us> wrote:
> Howdy. My plan: I'm planning an iSCSI-target/NFS server box for ESX.
>
> I'm planning on using an Areca RAID card, as I've heard mixed things about hot-swapping with Solaris/ZFS, and I'd like the stability of hardware RAID.
>
> My question is this: I'll be using 8 750GB SATA drives, and I'm trying to figure out the best method to balance:
> 1) Performance
> 2) Hot-swap-ability
> 3) Tolerance of disk loss
>
> My current plan is to build two RAID-5 arrays, 4 drives each, mirror them in ZFS, and add them to the pool. This gives me about 3 x 750GB of usable space.
>
> Now, here is the important question: Does mirroring provide a performance boost, or is it simply a way to provide redundancy? That is, if I go ahead and force-add the RAID-5 arrays without mirroring them, I'll have 6 usable drives' worth of space (double the storage), but ZFS won't see any redundancy. If a drive fails, ZFS won't know or care; I'll simply go into the Areca control panel and eject the drive. Voila!
>
> But is there a performance boost from mirroring the drives? That is what I'm unsure of.
>
> Thanks for any information!

I know that if your mind is made up, in terms of using the Areca, then this post is probably not going to change it, but I'd still like to give you some food for thought.

If it were me, I would not add the Areca (which, BTW, is a fine piece of hardware) because:

a) Cost; or, put another way, those $s can be applied elsewhere with more payback in terms of performance etc. (more below).

b) You're mixing "software" (in the case of the Areca it's more correctly called firmware) from 2 vendors to provide a storage solution, where both vendors have fundamentally different approaches to solving the same (storage) problem.

c) Now you've got to maintain and "patch" both vendors' "software" stacks.

d) Fundamentally, ZFS is designed to talk *directly* to disk drives.

e) The current issues/deficiencies you point out with today's ZFS implementation *will* vanish over time, as ZFS is still under very active development. So you're "solving" a problem that will solve itself in a relatively short timeframe.

f) The disk drives are tied to the hardware RAID controller; you can't migrate the disks to another box without buying another (compatible) RAID controller. If your RAID controller dies, you're SOL.

g) Your performance will be limited to the performance (today) of the RAID hardware, rather than to the massive performance advantage you'd gain by upgrading the system to a new motherboard/processor in a year's time (Nehalem, for example).

I'll assume that you're going to spend $500 on the hardware RAID controller (because I don't know which model/config you're thinking of). So, the question that I propose here (and attempt to answer) is: "Can those $500s be spent on a ZFS-only solution to provide better value?"

Proposal 1) Buy an LSI-based SAS controller board and a couple of 15k RPM SAS drives (you get to pick the size) and configure them as ZFS log (slog) and cache devices. Benefit: improved NFS performance. Overall improved system performance.

Proposal 2) Buy as much RAM as possible. ZFS loves RAM. How about 16GB or more? Yep - that'll work! :)

Proposal 3) Put the $500 in the stock market and wait for Sun to release their "enterprise" RAM/Flash (or whatever it'll be) SSD. This will provide a *huge* performance gain, especially for NFS.
And this will be a simple "push in" type upgrade. [0]

Proposal 4) A SAS solution similar to Proposal 1, but use the 15k SAS disks to provide a ZFS mirrored pool with lots of IOPS. Remember there is *no* RAID storage configuration that is "right" for every workload, and my advice is always to configure multiple RAID configs to support different workloads. [1] Also, your workload scenarios may change over time, in ways that you didn't foresee.

Proposal 5) Since you'll be providing iSCSI, please do yourself a big favor and install an enterprise-level (multiple ports??) Ethernet card (Sun has one). Otherwise the tens of thousands of interrupts/sec caused by iSCSI ops will *kill* your overall system performance. The reason an enterprise card helps is that it'll coalesce those interrupts and leave the system CPU cores free to do useful work.

[0] and you'll probably need to be a really good investor to be able to afford it! :)
[1] on a 10-disk system here, there's a 5-disk raidz1 pool, a 2-disk mirror and a 3-disk mirror. If I were to do it again, I'd push for a 6-disk raidz2 pool in place of the raidz1 pool.

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
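A minimal sketch of what Proposal 1 could look like in practice, assuming an existing pool named "tank", two spare 15k RPM SAS drives, and an OpenSolaris build recent enough to support log and cache devices (device names are placeholders):

    # One fast SAS drive as a dedicated intent log (helps synchronous
    # NFS and iSCSI writes):
    zpool add tank log c4t0d0

    # The other as an L2ARC cache device (helps random reads that
    # spill out of RAM):
    zpool add tank cache c4t1d0

    # Watch the log and cache devices doing their work:
    zpool iostat -v tank 5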
James C. McPherson wrote:
> Miles Nordin wrote:
>>>>>>> "bh" == Brandon High <bhigh at freaks.com> writes:
>>     bh> a system built around the Marvell or LSI chipsets
>>
>> according to The Blogosphere, source of all reliable information,
>> there's some issue with LSI, too. The driver is not available in
>> stable Solaris nor OpenSolaris, or there are two drivers, or
>> something. the guy is so upset, I can't figure out wtf he's trying to
>> say:
>>
>> http://www.osnews.com/thread?317113
>
> The driver for LSI's MegaRAID SAS card is "mega_sas", which
> was integrated into snv_88. It's planned for backporting to
> a Solaris 10 update.
>
> And I can't figure out what his beef is either.

There is also a BSD-licensed driver for that hardware, called "mfi". It's available from http://www.itee.uq.edu.au/~dlg/mfi

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
We have been using some 1068/1078-based cards (RAID: AOC-USAS-H4IR, and JBOD: LSISAS3801E) with b87-b90 and in s10u5 without issue for some time. Both the downloaded LSI driver and the bundled one have worked fine for us for around 6 months of moderate usage.

The LSI JBOD card is similar to the Sun SAS HBA card ;)

-Andy
Carson Gaspar wrote:
> Brandon High wrote:
>> All I know is that the X4540 uses the LSI 1068E chipset, and that the
>> X4500 used the Marvell 88SX chipset. Since buyers of both of these
>> systems probably expect that they'll, well, work, I assume
>> that the drivers in Solaris should be relatively stable.
>>
>> If that's not the case, then I'd think Sun would want to address it.
>
> *grumpy customer*
>
> Sadly, it is _not_ the case. And Sun is addressing it. They just aren't
> sharing with the rest of the class...
>
> If you're running Solaris 10, as far as I can tell you _still_ need an IDR
> to have NCQ enabled on the Marvell chipset in an X4500 without device
> resets, as Sun still hasn't released a patch that fixes the bug(s).
>
> I could be wrong, but Sun service keeps not answering our requests for
> clarification / a new U5 IDR.

That's odd. Ask them to look at the (internal) patches page for the product, where the details of patch releases for each OS release are listed, as well as any firmware updates with schedules. I would post the URL here, but that would confuse some poor site out in Internetland with a bunch of 404s.

> On the up side, it's fixed in OpenSolaris, and you can probably get an
> IDR if you have a support contract and scream loudly enough.

IDRs are a funny beast. They might seem like patches to you or me, but their lifecycle may lead to changes in another patch, and then they die. After they die, we really want to kill them in the wild. This is why you might have to scream loudly. When the patches are available, the SunAlerts and product patch site will also be updated.

-- richard
Hello Bob,

Friday, July 25, 2008, 9:00:41 PM, you wrote:

BF> On Fri, 25 Jul 2008, Brandon High wrote:
>>> I am not sure if ZFS really has to wait for both sides of a mirror to
>>> finish, but even if it does, if there are enough VDEVs then ZFS can still
>>> proceed with writing.
>>
>> It would have to wait on an fsync() call, since that won't return
>> until both halves of the mirror have completed. If the cards you're
>> using have NVRAM, then they could return faster.

BF> While it is possible that the ZFS implementation does actually wait
BF> for both drives to report that the data is written, it only has to
BF> know that the data is committed to one drive in order to satisfy the
BF> synchronous write expectation. This is not the case for legacy
BF> mirrored pairs where the disks are absolutely required to contain the
BF> same content at the same logical locations. ZFS does not require that
BF> disks in a mirrored pair contain identical content at all times.

AFAIK ZFS does require that all writes are committed to all devices to satisfy the configured redundancy, unless some of the devices were marked as failed. Otherwise, especially in the sync case, you could lose data because of a disk failure in a redundant configuration. Not to mention other possible issues.

--
Best regards,
Robert Milkowski                          mailto:milek at task.gda.pl
                                          http://milek.blogspot.com