Hello folks,

I am sure this topic has been asked before, but I am new to this list. I have read a ton of docs on the web, but wanted to get some opinions from you all. Also, if someone has a digest of the last time this was discussed, you can just send that to me. In any case, I am reading a lot of mixed reviews related to ZFS on HW RAID devices.

The Sun docs seem to indicate it is possible, but not a recommended course. I realize there are some advantages, such as snapshots, etc. But the h/w RAID will handle 'most' disk problems, basically negating one of the big reasons to deploy ZFS. One suggestion would be to create the h/w RAID LUNs as usual, present them to the OS, then do simple striping with ZFS. Here are my two applications, where I am presented with this possibility:

Sun Messaging Environment:
We currently use EMC storage. The storage team manages all Enterprise storage. We currently have 10x300 GB UFS mailstores presented to the OS. Each LUN is a HW RAID-5 device. We will be upgrading the application and doing a hardware refresh of this environment, which will give us the chance to move to ZFS, but stay on EMC storage. I am sure the storage team will not want to present us with JBOD. It is their practice to create the HW LUNs and present them to the application teams. I don't want to end up with a complicated scenario, but would like to leverage the most I can with ZFS, on the EMC array as I mentioned.

Sun Directory Environment:
The directory team is running HP DL385 G2, which also has a built-in HW RAID controller for 5 internal SAS disks. The team currently has DS5.2 deployed on RHEL3, but as we move to DS6.3.1, they may want to move to Solaris 10. We have an opportunity to move to ZFS in this environment, but I am curious how to best leverage ZFS capabilities in this scenario. JBOD is very clear, but a lot of manufacturers out there are still offering HW RAID technologies, with high-speed caches. Using ZFS with these is not very clear to me, and as I mentioned, there are very mixed reviews, not on ZFS features, but on how it's used in HW RAID settings.

Thanks for any observations.

Lloyd
On Fri, 18 Sep 2009, Lloyd H. Gill wrote:

> The Sun docs seem to indicate it is possible, but not a recommended course.
> I realize there are some advantages, such as snapshots, etc. But the h/w
> raid will handle 'most' disk problems, basically negating one of the big
> reasons to deploy zfs. One suggestion would be to create the h/w RAID LUNs
> as usual, present them to the OS, then do simple striping with ZFS.

ZFS will catch issues that the H/W RAID will not. Other than this, there is nothing inherently wrong with "simple striping" in ZFS as long as you are confident about your SAN device. If your SAN device fails, the whole ZFS pool may be lost, and if the failure is temporary, then the pool will be down until the SAN is restored. If you care to keep your pool up and alive as much as possible, then mirroring across SAN devices is recommended.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
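[For concreteness, a minimal sketch of the two layouts Bob contrasts. The pool name and c#t#d# device names are hypothetical placeholders for whatever the SAN LUNs appear as under Solaris:

    # Plain stripe across two hardware-RAID LUNs: simple and fast, but with
    # no ZFS-level redundancy ZFS can detect corruption yet not repair it.
    zpool create mailpool c2t0d0 c2t1d0

    # Mirror across LUNs from two different arrays: ZFS can self-heal bad
    # blocks and the pool survives the loss of one array.
    zpool create mailpool mirror c2t0d0 c3t0d0]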
Hi, see comments inline:

Lloyd H. Gill wrote:
> Hello folks,
>
> I am sure this topic has been asked before, but I am new to this list. I
> have read a ton of docs on the web, but wanted to get some opinions from
> you all. Also, if someone has a digest of the last time this was
> discussed, you can just send that to me. In any case, I am reading a
> lot of mixed reviews related to ZFS on HW RAID devices.
>
> The Sun docs seem to indicate it is possible, but not a recommended
> course. I realize there are some advantages, such as snapshots, etc.
> But the h/w RAID will handle 'most' disk problems, basically negating
> one of the big reasons to deploy ZFS. One suggestion would be to create
> the h/w RAID LUNs as usual, present them to the OS, then do simple
> striping with ZFS. Here are my two applications, where I am presented
> with this possibility:

Of course you can use ZFS on disk arrays with RAID done in HW, and you will still be able to use most ZFS features, including snapshots, clones, compression, etc. It is not recommended in the sense that unless the pool has a redundant configuration from ZFS's point of view, ZFS won't be able to heal corrupted blocks if they occur (though it will still be able to detect them). Most other filesystems on the market won't even detect such a case, never mind repair it, so if you are OK with not having this great ZFS feature then go ahead. All the other features of ZFS will work as expected.

Now, if you want to present several LUNs with RAID done in HW, then yes, the best approach usually is to add them all to a pool in a striped configuration. ZFS will always put 2 or 3 copies of metadata on different LUNs if possible, so you will end up with some protection (self-healing) from ZFS, at least for metadata. The other, more expensive, option is to do RAID-10 or RAID-Z on top of LUNs which are already protected with some RAID level on the disk array. For example, if you presented 4 LUNs, each RAID-5 in HW, and then created a pool with 'zpool create test mirror lun1 lun2 mirror lun3 lun4', you would effectively end up with a RAID-50 configuration. It would of course halve the available logical storage, but it would allow ZFS to do self-healing.

> Sun Messaging Environment:
> We currently use EMC storage. The storage team manages all Enterprise
> storage. We currently have 10x300 GB UFS mailstores presented to the
> OS. Each LUN is a HW RAID-5 device. We will be upgrading the
> application and doing a hardware refresh of this environment, which
> will give us the chance to move to ZFS, but stay on EMC storage. I am
> sure the storage team will not want to present us with JBOD. It is
> their practice to create the HW LUNs and present them to the
> application teams. I don't want to end up with a complicated scenario,
> but would like to leverage the most I can with ZFS, on the EMC array
> as I mentioned.

Just create a pool which stripes across such LUNs.

> Sun Directory Environment:
> The directory team is running HP DL385 G2, which also has a built-in
> HW RAID controller for 5 internal SAS disks. The team currently has
> DS5.2 deployed on RHEL3, but as we move to DS6.3.1, they may want to
> move to Solaris 10. We have an opportunity to move to ZFS in this
> environment, but I am curious how to best leverage ZFS capabilities in
> this scenario. JBOD is very clear, but a lot of manufacturers out
> there are still offering HW RAID technologies, with high-speed caches.
> Using ZFS with these is not very clear to me, and as I mentioned,
> there are very mixed reviews, not on ZFS features, but on how it's
> used in HW RAID settings.

Here you have three options.

RAID in HW with one LUN, and then just create a pool on top of it. ZFS will be able to detect corruption if it happens, but won't be able to fix it (at least not for data).

Another option is to present each disk as a RAID-0 LUN and then do a RAID-10 or RAID-Z in ZFS. Most RAID controllers will still use their cache in such a configuration, so you would still benefit from it, and ZFS will be able to detect and fix corruption if it happens. However, the procedure for replacing a failed disk drive could be more complicated, or even require downtime, depending on the controller and whether there is a management tool for it on Solaris (otherwise, with many PCI controllers, if a disk configured as a one-disk RAID-0 dies, you will have to go into the controller BIOS and re-create the failed disk with a new one). But check your controller; maybe this is not an issue for you, or maybe it is an acceptable approach anyway.

The last option would be to disable the RAID controller and access the disks directly, doing the RAID in ZFS. That way you lose the controller cache, of course. If your applications are sensitive to write latency to your LDAP database, then going with one of the first two options could actually prove to be the faster solution (assuming the volume of writes is not so big that the cache is 100% utilized all the time, as then it comes down to the disks anyway).

Another thing you need to consider, if you want to use RAID-5 or RAID-Z, is your workload. If you are going to issue lots of small random reads in parallel (from multiple threads and/or processes), then in most cases HW RAID-5 will be much faster than RAID-Z, and going with RAID-5 in HW and striping (or mirroring) LUNs would be the better option from a performance point of view. However, if you are going to do only random writes at a constant throughput big enough that the caches on the RAID controllers are fully saturated all the time, then ZFS RAID-Z should prove faster. If you are somewhere in between then... well, get it tested... :) Or maybe you are in the 80% basket of environments where, with modern HW, the performance will be acceptable from a practical point of view regardless of the approach you take, and then you should focus on features and ease of management and service.

--
Robert Milkowski
http://milek.blogspot.com
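[As a rough sketch of the middle option Robert describes for the DL385 (each internal disk exported as a single-disk RAID-0 LUN, with the redundancy done in ZFS); the pool and device names here are hypothetical placeholders:

    # RAID-Z across the five single-disk RAID-0 LUNs: the controller cache
    # is still in play, and ZFS can both detect and repair corruption.
    zpool create ldappool raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

    # Check pool health and any checksum errors ZFS has found and repaired.
    zpool status -v ldappool]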
Lloyd H. Gill wrote:
> Hello folks,
>
> I am sure this topic has been asked before, but I am new to this list. I
> have read a ton of docs on the web, but wanted to get some opinions from
> you all. Also, if someone has a digest of the last time this was
> discussed, you can just send that to me. In any case, I am reading a
> lot of mixed reviews related to ZFS on HW RAID devices.
>
> The Sun docs seem to indicate it is possible, but not a recommended
> course. I realize there are some advantages, such as snapshots, etc.
> But the h/w RAID will handle 'most' disk problems, basically negating
> one of the big reasons to deploy ZFS. One suggestion would be to create
> the h/w RAID LUNs as usual, present them to the OS, then do simple
> striping with ZFS. Here are my two applications, where I am presented
> with this possibility:

Comments below from me, as I am a user of both of these environments, both with ZFS.

You may also want to check the iMS archives or subscribe to the list. This is where all the Sun Messaging Server gurus hang out. (I listen mostly ;)) The list is Info-iMS at Arnold.com and you can get more info here: http://mail.arnold.com/info-ims.htmlx

> Sun Messaging Environment:
> We currently use EMC storage. The storage team manages all Enterprise
> storage. We currently have 10x300 GB UFS mailstores presented to the
> OS. Each LUN is a HW RAID-5 device. We will be upgrading the
> application and doing a hardware refresh of this environment, which
> will give us the chance to move to ZFS, but stay on EMC storage. I am
> sure the storage team will not want to present us with JBOD. It is
> their practice to create the HW LUNs and present them to the
> application teams. I don't want to end up with a complicated scenario,
> but would like to leverage the most I can with ZFS, on the EMC array
> as I mentioned.

In this environment I do what Bob mentioned in his reply to you: I provision two LUNs for each data volume and mirror them with ZFS. The LUNs are based on RAID-5 stripes on 3510s, 3511s and 6140s. Mirroring them with ZFS gives all of the niceties of ZFS, and it will catch any of the silent data corruption type issues that hardware RAID will not. My reasoning for doing it this way goes back to Disksuite days as well (which I no longer use; it's ZFS or nothing pretty much these days). My setup is based on 5 x 250 GB mirrored pairs with around 3-4 million messages per volume.

The two LUNs I mirror are *always* provisioned from two separate arrays in different data centers. This also means that in the case of a massive catastrophe at one data centre, I should have a good copy from the 'mirror of last resort' that I can get our business back up and running on quickly.

Another advantage of this is that it also allows for relatively easy array maintenance and upgrades. ZFS only remirrors changed blocks rather than doing a complete block resync like Disksuite does. This allows for very fast convergence times in the likes of file servers where change is relatively light, albeit continuous. Mirrors here are super quick to reconverge in my experience, a little quicker than RAIDZs. (I don't have data to back this up, just a casual observation.)

In some respects, being both a storage guy and a systems guy, I think sometimes the storage people need to get with the program a bit. :P If you use ZFS with one of its redundant forms (mirrors or RAIDZs) then JBOD presentation will be fine.

> Sun Directory Environment:
> The directory team is running HP DL385 G2, which also has a built-in
> HW RAID controller for 5 internal SAS disks. The team currently has
> DS5.2 deployed on RHEL3, but as we move to DS6.3.1, they may want to
> move to Solaris 10. We have an opportunity to move to ZFS in this
> environment, but I am curious how to best leverage ZFS capabilities in
> this scenario. JBOD is very clear, but a lot of manufacturers out
> there are still offering HW RAID technologies, with high-speed caches.
> Using ZFS with these is not very clear to me, and as I mentioned,
> there are very mixed reviews, not on ZFS features, but on how it's
> used in HW RAID settings.

The Sun Directory environment generally isn't very IO intensive, except for massive data reloads or indexing operations. Other than that it is an ideal candidate for ZFS and its rather nice ARC cache. Memory is cheap on a lot of boxes, and it will make read-mostly type file systems fly. I imagine your actual living LDAP data set on disk probably won't be larger than 10 Gigs or so? I have around 400K objects in mine and it's only about 2 Gigs or so including all our indexes. I tend to tune DS up so that everything it needs is in RAM anyway. As far as Directory Server goes, are you using the 64-bit version on Linux? If not, you should be.

> Thanks for any observations.
>
> Lloyd

--
_________________________________________________________________________

Scott Lawson
Systems Architect
Information Communication Technology Services

Manukau Institute of Technology
Private Bag 94006
South Auckland Mail Centre
Manukau 2240
Auckland
New Zealand

Phone  : +64 09 968 7611
Fax    : +64 09 968 7641
Mobile : +64 27 568 7611
mailto:scott at manukau.ac.nz
http://www.manukau.ac.nz

__________________________________________________________________________

perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
__________________________________________________________________________
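[A rough sketch of the cross-array mirroring Scott describes; the pool name and the per-array device names are made-up placeholders:

    # Each mailstore pool is a ZFS mirror of two HW RAID-5 LUNs, one from
    # each array/data centre.
    zpool create store01 mirror c4t0d0 c5t0d0

    # For array maintenance, offline the LUN on that array and bring it back
    # afterwards; ZFS resilvers only the blocks written in the meantime.
    zpool offline store01 c5t0d0
    zpool online store01 c5t0d0]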
Scott Lawson wrote:
> The Sun Directory environment generally isn't very IO intensive, except
> for massive data reloads or indexing operations. Other than that it is
> an ideal candidate for ZFS and its rather nice ARC cache. Memory is
> cheap on a lot of boxes, and it will make read-mostly type file systems
> fly. I imagine your actual living LDAP data set on disk probably won't
> be larger than 10 Gigs or so? I have around 400K objects in mine and
> it's only about 2 Gigs or so including all our indexes. I tend to tune
> DS up so that everything it needs is in RAM anyway. As far as Directory
> Server goes, are you using the 64-bit version on Linux? If not, you
> should be.

From my experience, enabling lzjb compression for DS makes it even faster and reduces disk usage by about 2x.

--
Robert Milkowski
http://milek.blogspot.com
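[A quick sketch of what that looks like in practice; the dataset name is a made-up placeholder for wherever the directory database lives:

    # Enable lzjb compression on the dataset holding the DS database.
    zfs set compression=lzjb ldappool/ds-db

    # Later, see how well the data actually compresses.
    zfs get compressratio ldappool/ds-db]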
On Sep 18, 2009, at 16:52, Bob Friesenhahn wrote:

> If you care to keep your pool up and alive as much as possible, then
> mirroring across SAN devices is recommended.

One suggestion I heard was to get a LUN that's twice the size, and set "copies=2". This way you have some redundancy for incorrect checksums. Haven't done it myself.
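[For reference, a minimal sketch of that suggestion; the pool/dataset name is a placeholder:

    # Keep two copies of every data block, on the single double-sized LUN.
    zfs set copies=2 pool/mail

As the follow-ups below note, the extra copies live on the same LUN, so this guards against bad blocks but not against losing the LUN itself.]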
On Fri, 18 Sep 2009, David Magda wrote:

>> If you care to keep your pool up and alive as much as possible, then
>> mirroring across SAN devices is recommended.
>
> One suggestion I heard was to get a LUN that's twice the size, and set
> "copies=2". This way you have some redundancy for incorrect checksums.

This only helps for block-level corruption. It does not help much at all if a whole LUN goes away. It seems best for single-disk rpools.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:
> On Fri, 18 Sep 2009, David Magda wrote:
>>
>>> If you care to keep your pool up and alive as much as possible, then
>>> mirroring across SAN devices is recommended.
>>
>> One suggestion I heard was to get a LUN that's twice the size, and
>> set "copies=2". This way you have some redundancy for incorrect
>> checksums.
>
> This only helps for block-level corruption. It does not help much at
> all if a whole LUN goes away. It seems best for single-disk rpools.

I second this. In my experience you are more likely to have a single LUN go missing for some reason or another, and it seems most prudent to back any production data volume with, at the very minimum, a mirror. This also gives you two copies in a far more resilient way generally (and, per my other post, there can be other niceties that come with it as well when coupled with SAN-based LUNs).

> Bob
> --
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
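[If a pool has already been built on a single LUN, it can be converted to the mirrored layout recommended here without recreating it. A hedged sketch, with hypothetical pool and device names:

    # Attach a second LUN (ideally from a different array) to the existing
    # single-device pool; ZFS turns it into a two-way mirror and resilvers.
    zpool attach datapool c2t0d0 c3t0d0]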
All this reminds me: how much work (if any) has been done on the "asynchronous" mirroring option? That is, for supporting mirrors with radically different access times (useful for supporting a mirror across a WAN, where you have hundred(s)-millisecond latency to the other side of the mirror)?

-Erik

Scott Lawson wrote:
> Bob Friesenhahn wrote:
>> On Fri, 18 Sep 2009, David Magda wrote:
>>>
>>>> If you care to keep your pool up and alive as much as possible,
>>>> then mirroring across SAN devices is recommended.
>>>
>>> One suggestion I heard was to get a LUN that's twice the size, and
>>> set "copies=2". This way you have some redundancy for incorrect
>>> checksums.
>>
>> This only helps for block-level corruption. It does not help much at
>> all if a whole LUN goes away. It seems best for single-disk rpools.
>
> I second this. In my experience you are more likely to have a single
> LUN go missing for some reason or another, and it seems most prudent
> to back any production data volume with, at the very minimum, a
> mirror. This also gives you two copies in a far more resilient way
> generally (and, per my other post, there can be other niceties that
> come with it as well when coupled with SAN-based LUNs).

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
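[Not an answer to Erik's question, but for completeness: the usual workaround today is periodic snapshot replication with zfs send/receive rather than a live mirror across the WAN. A sketch under assumed pool, dataset, snapshot and host names:

    # Snapshot, then ship only the changes since the previous snapshot to
    # the remote site.
    zfs snapshot datapool/mail@2009-09-19
    zfs send -i datapool/mail@2009-09-18 datapool/mail@2009-09-19 | \
        ssh remotehost zfs receive -F backuppool/mail]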
I asked the same question about one year ago here, and the posts poured in. Search for my user id? There is more info in that thread about which is best: ZFS vs ZFS+HWraid -- This message posted from opensolaris.org
---------- Forwarded message ----------
From: Al Hopper <al at logical-approach.com>
Date: Sat, Sep 19, 2009 at 5:55 PM
Subject: Re: [zfs-discuss] ZFS & HW RAID
To: Scott Lawson <Scott.Lawson at manukau.ac.nz>

On Fri, Sep 18, 2009 at 4:38 PM, Scott Lawson <Scott.Lawson at manukau.ac.nz> wrote:

..... snip ......

>> Sun Directory Environment:
>> The directory team is running HP DL385 G2, which also has a built-in
>> HW RAID controller for 5 internal SAS disks. The team currently has
>> DS5.2 deployed on RHEL3, but as we move to DS6.3.1, they may want to
>> move to Solaris 10. We have an opportunity to move to ZFS in this
>> environment, but I am curious how to best leverage ZFS capabilities in
>> this scenario. JBOD is very clear, but a lot of manufacturers out
>> there are still offering HW RAID technologies, with high-speed caches.
>> Using ZFS with these is not very clear to me, and as I mentioned,
>> there are very mixed reviews, not on ZFS features, but on how it's
>> used in HW RAID settings.
>
> The Sun Directory environment generally isn't very IO intensive, except
> for massive data reloads or indexing operations. Other than that it is
> an ideal candidate for ZFS and its rather nice ARC cache. Memory is
> cheap on a lot of boxes, and it will make read-mostly type file systems
> fly. I imagine your actual living LDAP data set on disk probably won't
> be larger than 10 Gigs or so? I have around 400K objects in mine and
> it's only about 2 Gigs or so including all our indexes. I tend to tune
> DS up so that everything it needs is in RAM anyway. As far as Directory
> Server goes, are you using the 64-bit version on Linux? If not, you
> should be.

It would make sense IMHO to spend your budget on enterprise-grade SSDs for this use case rather than on EMC-based storage. Imagine a 3-way or 4-way mirror of SSDs and the I/O Ops/Sec you'd get from it!! Bear in mind that Intel will soon release their 2nd generation "E" series SSD products, which are based on SLC flash. I know that politics may get in the way, but for certain workloads the price/performance of EMC is difficult to justify IMHO.

>> Thanks for any observations.
>>
>> Lloyd

--
Al Hopper  Logical Approach Inc, Plano, TX  al at logical-approach.com
           Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
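[As a rough illustration of Al's suggestion; the device names are hypothetical placeholders for local SSDs:

    # A 3-way ZFS mirror of SSDs for the directory data: very high random
    # read IOPS, plus ZFS-level redundancy so corrupted blocks can be
    # self-healed.
    zpool create ldappool mirror c1t0d0 c1t1d0 c1t2d0]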