Hi folks,

A colleague and I are currently involved in a prototyping exercise to evaluate ZFS against our current filesystem. We are looking at the best way to arrange the disks in a 3510 storage array.

We have been testing with the 12 disks on the 3510 exported as "nraid" logical devices. We then configured a single ZFS pool on top of these, using two raid-z arrays. We are getting some OK numbers this way, but it seems a waste of the 3510's resources if we hand everything back to the OS to handle, although I recall reading somewhere that letting ZFS handle all this jiggery-pokery was the best way to do things.

I guess our question is, being new to ZFS in general and wanting both to optimise the performance numbers we are getting and to configure a setup that will survive a disk failure: is this a sensible way of configuring a 3510 for maximum throughput and redundancy?

With our current filesystem, we create two 5-disk RAID5 arrays and export these as two logical devices, with two spare disks. In a ZFS scenario, is it worth letting the 3510 do RAID5 the way we currently do, or should we let ZFS manage all the RAID using raid-z and treat the 3510 as 12 discrete devices? What about spare disks in a ZFS pool?

Any advice is greatly appreciated.

Thanks and regards,
Ciaran.
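For comparison, a rough ZFS-level equivalent of the current two-RAID5-plus-spares layout might look like the sketch below. The device names are placeholders for the 12 nraid LUNs, and the spare vdev assumes a release new enough to support ZFS hot spares, so treat it as an illustration rather than a tested recipe:

    # Two 5-disk raid-z vdevs in one pool, plus two hot spares
    # (device names are placeholders for the 12 nraid LUNs)
    zpool create tank \
        raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
        raidz c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0 \
        spare c2t10d0 c2t11d0

    # Check the resulting layout
    zpool status tank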
Ciaran Johnston (AT/LMI) wrote:
> [SNIP]
>
> With our current filesystem, we create two 5-disk RAID5 arrays and
> export these as two logical devices, with two spare disks. In a ZFS
> scenario, is it worth us letting the 3510 do RAID5 in the way we
> currently do, or should we let ZFS manage all the RAID using raid-z and
> treat the 3510 as 12 discrete devices? What about spare disks in a ZFS
> pool? Any advice is greatly appreciated.

You might want to trawl through the discussion list on the opensolaris site to get some more info on your situation but, as always, it depends. Is this the only host that will be using the array? Other systems running ZFS? Are you migrating at some point?

If you want to do a simple speed check then I'd create some R0 stripes, serve those to the system running ZFS, and then put RAIDZ or mirrors on top of them. I'm pretty sure it isn't possible to farm out the individual drives from the 3510. You have to overlay some sort of RAID first.
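To make that suggestion concrete, here is a hedged sketch assuming the 3510 exports four R0 stripe LUNs to the host; the device names below are invented:

    # Mirrored pairs across four hypothetical R0 stripe LUNs
    zpool create speedtest \
        mirror c4t40d0 c4t41d0 \
        mirror c4t42d0 c4t43d0

    # ...or a single raid-z across the same four LUNs
    # zpool create speedtest raidz c4t40d0 c4t41d0 c4t42d0 c4t43d0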
You may also want to check out Eric Kustarz's blog, which is worth noting if you're using storage arrays: http://blogs.sun.com/erickustarz/entry/vq_max_pending

Noel

On Oct 16, 2006, at 11:21 AM, Torrey McMahon wrote:
> Ciaran Johnston (AT/LMI) wrote:
>> [SNIP]
>
> You might want to trawl through the discussion list on the opensolaris
> site to get some more info on your situation but, as always, it
> depends. Is this the only host that will be using the array? Other
> systems running ZFS? Are you migrating at some point?
>
> If you want to do a simple speed check then I'd create some R0 stripes,
> serve those to the system running ZFS, and then put RAIDZ or mirrors on
> top of them. I'm pretty sure it isn't possible to farm out the
> individual drives from the 3510. You have to overlay some sort of RAID
> first.
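For later readers of that link: the blog discusses reducing the per-vdev I/O queue depth when ZFS sits behind an array that has its own cache and queuing. The lines below are only an illustration of how such a change is typically made on Solaris; the tunable name and value are assumptions, so verify them against the blog and your particular build before touching anything:

    # Lower the per-vdev queue depth on a live system (assumed tunable name)
    echo "zfs_vdev_max_pending/W0t10" | mdb -kw

    # Or persistently via /etc/system (again, verify against your build):
    # set zfs:zfs_vdev_max_pending = 10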
I'm glad you asked this question. We are currently expecting 3511 storage sub-systems for our servers, and we were wondering about their configuration as well. This ZFS thing throws a wrench in the old line of thinking ;-) Seriously, we now have to put on a new hat to figure out the best way to leverage both the storage sub-system and ZFS.

As a sidebar, if the performance of ZFS keeps improving then I can tell you the ultra-expensive large arrays will be in trouble. ZFS falls in the category of 'disruptive technologies' as discussed in the book The Innovator's Dilemma. In the short run it'll eat away at the bottom of the performance curve but will trend upwards and beat the incumbents (just like RAM took over from core memory).
Anantha N. Srirama wrote:
> I'm glad you asked this question. We are currently expecting 3511
> storage sub-systems for our servers, and we were wondering about their
> configuration as well. This ZFS thing throws a wrench in the old line
> of thinking ;-) Seriously, we now have to put on a new hat to figure
> out the best way to leverage both the storage sub-system and ZFS.

[for the archives]
There is *nothing wrong* with treating ZFS like UFS when configuring with LUNs hosted on RAID arrays. It is true that you will miss some of the self-healing features of ZFS, but at least you will know when the RAID array has munged your data -- a feature missing in UFS and most other file systems.
 -- richard
Richard Elling - PAE wrote:
> Anantha N. Srirama wrote:
>> [SNIP]
>
> [for the archives]
> There is *nothing wrong* with treating ZFS like UFS when configuring
> with LUNs hosted on RAID arrays. It is true that you will miss some of
> the self-healing features of ZFS, but at least you will know when the
> RAID array has munged your data -- a feature missing in UFS and most
> other file systems.

Or you just offer ZFS multiple LUNs from the RAID array.

The issue is putting ZFS on a single LUN, be it a disk in a JBOD or a LUN offered from a HW RAID array. If something goes wrong and the LUN becomes inaccessible then ... blamo! You're toasted. If ZFS detects a data inconsistency then it can't look to another block for a mirrored copy, a la a ZFS mirror, or to a parity block, a la RAIDZ.
Torrey McMahon wrote:
> Richard Elling - PAE wrote:
>> Anantha N. Srirama wrote:
>>> [SNIP]
>>
>> [for the archives]
>> There is *nothing wrong* with treating ZFS like UFS when configuring
>> with LUNs hosted on RAID arrays. It is true that you will miss some of
>> the self-healing features of ZFS, but at least you will know when the
>> RAID array has munged your data -- a feature missing in UFS and most
>> other file systems.
>
> Or you just offer ZFS multiple LUNs from the RAID array.
>
> The issue is putting ZFS on a single LUN, be it a disk in a JBOD or a
> LUN offered from a HW RAID array. If something goes wrong and the LUN
> becomes inaccessible then ... blamo! You're toasted. If ZFS detects a
> data inconsistency then it can't look to another block for a mirrored
> copy, a la a ZFS mirror, or to a parity block, a la RAIDZ.

Right, I think Richard's point is that even if you just give ZFS a single LUN, ZFS is still more reliable than other filesystems (e.g. due to its checksums to prevent silent data corruption, and multiple copies of metadata to lessen the hurt of small amounts of data loss).

--matt
[editorial comment below :-)]

Matthew Ahrens wrote:
> Torrey McMahon wrote:
>> [SNIP]
>
> Right, I think Richard's point is that even if you just give ZFS a
> single LUN, ZFS is still more reliable than other filesystems (e.g. due
> to its checksums to prevent silent data corruption, and multiple copies
> of metadata to lessen the hurt of small amounts of data loss).

Richard pines for ditto data blocks :-)
 -- richard
Hello Richard,

Tuesday, October 17, 2006, 6:18:21 PM, you wrote:

REP> [editorial comment below :-)]
REP> Matthew Ahrens wrote:
>> Torrey McMahon wrote:
>>> [SNIP]
>>
>> Right, I think Richard's point is that even if you just give ZFS a
>> single LUN, ZFS is still more reliable than other filesystems (e.g.
>> due to its checksums to prevent silent data corruption, and multiple
>> copies of metadata to lessen the hurt of small amounts of data loss).

When doing HW RAID and using ZFS only as the file system, it can sometimes be worth doing striping in ZFS between LUNs (each LUN == RAID group). That way the file system metadata will be protected, as each copy will be on a different LUN (RAID group).

REP> Richard pines for ditto data blocks :-)

Well, I don't know - that way you end up doing RAID in ZFS anyway, so probably doing just RAID-10 in ZFS without ditto blocks would be better.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
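As a minimal sketch of Robert's suggestion (device names invented, each LUN assumed to be a separate hardware RAID group on the array): a plain pool across two or more such LUNs gives dynamic striping, so the duplicated metadata copies can land on different RAID groups:

    # Dynamic stripe across two hardware-RAID LUNs (placeholder device names)
    zpool create tank c3t0d0 c3t1d0

    # RAID-10 done in ZFS instead, per Robert's last point:
    # zpool create tank mirror c3t0d0 c3t1d0 mirror c3t2d0 c3t3d0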
Thanks for the stimulating exchange of ideas/thoughts. I've always been a believer in letting s/w do my RAID functions; for example, in the old days of VxVM I always preferred to do mirroring at the s/w level. It is my belief that there is more 'meta' information available at the OS level than at the storage level for s/w to make intelligent decisions; dynamic recordsize in ZFS is one example.

Any thoughts on the following approach?

1. Configure the 3511 to present multiple LUNs (mirrored internally) to the OS.
2. Lay down a ZFS pool/filesystem without RAID protection (RAIDZ...) in the OS.

With this approach I will enjoy the caching facility of the 3511 and the checksum protection afforded by ZFS.
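A hedged sketch of that approach, with invented LUN names and the mirroring assumed to happen inside the 3511: the pool itself carries no ZFS-level redundancy, and a periodic scrub exercises the checksums so any corruption the array lets through is at least detected and reported:

    # Stripe across several internally-mirrored LUNs from the 3511
    zpool create data c5t0d0 c5t1d0 c5t2d0 c5t3d0
    zfs create data/apps

    # Later: walk the pool verifying checksums and report any errors found
    zpool scrub data
    zpool status -v data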
From: Ciaran Johnston (AT/LMI)
Date: 2006-Oct-18 12:43 UTC
Subject: [zfs-discuss] Re: Configuring a 3510 for ZFS
These are pretty much the conclusions we reached from this discussion - thanks for all the input. On the 3510 we are configuring 12 nraid LUNs, basically presenting the 12 disks to the OS as they are. In a real scenario we will be mirroring across two 3510s anyway. We have also decided against raid-z in this scenario.

Regards,
Ciaran.

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Anantha N. Srirama
Sent: 18 October 2006 13:11
To: zfs-discuss at opensolaris.org
Subject: [zfs-discuss] Re: Configuring a 3510 for ZFS

Thanks for the stimulating exchange of ideas/thoughts. I've always been a believer in letting s/w do my RAID functions; for example, in the old days of VxVM I always preferred to do mirroring at the s/w level. It is my belief that there is more 'meta' information available at the OS level than at the storage level for s/w to make intelligent decisions; dynamic recordsize in ZFS is one example.

Any thoughts on the following approach?

1. Configure the 3511 to present multiple LUNs (mirrored internally) to the OS.
2. Lay down a ZFS pool/filesystem without RAID protection (RAIDZ...) in the OS.

With this approach I will enjoy the caching facility of the 3511 and the checksum protection afforded by ZFS.
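For the archives, a rough sketch of what that final layout might look like, with invented device names (the c2 LUNs standing in for one 3510 and the c3 LUNs for the second): each vdev mirrors an nraid LUN on one array against its counterpart on the other, so ZFS handles the redundancy and either array can be lost:

    # Mirror each nraid LUN on array A (c2) against its twin on array B (c3)
    zpool create tank \
        mirror c2t0d0 c3t0d0 \
        mirror c2t1d0 c3t1d0 \
        mirror c2t2d0 c3t2d0
    # ...and so on for the remaining LUN pairs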
Robert Milkowski wrote:
> Well, I don't know - that way you end up doing RAID in ZFS anyway, so
> probably doing just RAID-10 in ZFS without ditto blocks would be
> better.

The win with ditto blocks is allowing you to recover from a data inconsistency at the fs level, as opposed to dealing with a block error within the raid group. The problem, as discussed previously, is in the accounting and management of the ditto blocks.