Wilkinson, Alex
2009-Apr-30 10:23 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Hi all,

In terms of best practices and high performance, would it be better to
present a JBOD to an OpenSolaris initiator or a single MetaLUN?

The scenario is:

I currently have a single 17TB MetaLUN that I am about to present to an
OpenSolaris initiator and it will obviously be ZFS. However, I am constantly
reading that presenting a JBOD and using ZFS to manage the RAID is best
practice? I'm not really sure why? And isn't that a waste of a high
performing RAID array (EMC)?

 -aW

IMPORTANT: This email remains the property of the Australian Defence
Organisation and is subject to the jurisdiction of section 70 of the CRIMES
ACT 1914. If you have received this email in error, you are requested to
contact the sender and delete the email.
Darren J Moffat
2009-Apr-30 13:19 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Wilkinson, Alex wrote:
> Hi all,
>
> In terms of best practices and high performance would it be better to
> present a JBOD to an OpenSolaris initiator or a single MetaLUN?
>
> The scenario is:
>
> I currently have a single 17TB MetaLUN that I am about to present to an
> OpenSolaris initiator and it will obviously be ZFS. However, I am
> constantly reading that presenting a JBOD and using ZFS to manage the
> RAID is best practice? I'm not really sure why?

If you only present a single LUN to ZFS it may not be able to repair any
detected errors. ZFS needs a mirror, raidz or raidz2 to be able to recover
from errors detected by checksum failures. It may be able to recover if you
use copies=2 or copies=3, but that assumes that the other copies are on a
part of the MetaLUN that wasn't damaged as well.

> And isn't that a waste of a high performing RAID array (EMC)?

That assumes it is actually faster - it might not be.

--
Darren J Moffat
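P.S. In zpool terms the difference looks roughly like this (the device
names below are made up for illustration):

    # Single LUN: ZFS can detect corruption but has no second copy to
    # repair from.
    zpool create tank c4t0d0

    # Two LUNs mirrored by ZFS: checksum errors can be self-healed.
    zpool create tank mirror c4t0d0 c5t0d0

    # Extra ditto copies of user data within one LUN: best effort only.
    zfs set copies=2 tank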
Miles Nordin
2009-Apr-30 15:43 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "djm" == Darren J Moffat <darrenm at opensolaris.org> writes:djm> If you only present a single lun to ZFS it may not be able to djm> repair any detected errors. And also the problems with pools becoming corrupt and unimportable, especially when the SAN reboots or loses connectivity and the host does not, that people like to keep forgetting. :( >> And isn''t that a waste of a high performing RAID array (EMC) ? djm> That assumes it is actually faster - it might not be. IIRC in general people have found RAID5/6 delivers higher iops than raidz/raidz2 when both are in the same width. Also the EMC array will be more robust in terms of a disk failing without taking down the host than ZFS will be---in either case you''ll not lose data, but ZFS is likely to freeze for minutes or crash if a disk fails, and might take longer to notice a disk which fails by becoming 100x slower (which is not strange) than EMC. And finally it''s probably simpler to administer as a single LUN though of course one can argue pointlessly all day about what one thinks is clearly the best way to administer things. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090430/072b3c54/attachment.bin>
Bob Friesenhahn
2009-Apr-30 16:11 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
> I currently have a single 17TB MetaLUN that I am about to present to an
> OpenSolaris initiator and it will obviously be ZFS. However, I am
> constantly reading that presenting a JBOD and using ZFS to manage the
> RAID is best practice? I'm not really sure why? And isn't that a waste
> of a high performing RAID array (EMC)?

The JBOD "advantage" is that then ZFS can schedule I/O for the disks and
there is less chance of an unrecoverable pool, since ZFS is assured to lay
out redundant data on redundant hardware and ZFS uses more robust error
detection than the firmware on any array. When using mirrors there is
considerable advantage since writes and reads can be concurrent.

That said, your EMC hardware likely offers much nicer interfaces for
indicating and replacing bad disk drives. With the ZFS JBOD approach you
have to back-track from what ZFS tells you (a Solaris device ID) and figure
out which physical drive is not behaving correctly. EMC tech support may
not be very helpful if ZFS says there is something wrong but the raid array
says there is not. Sometimes there is value in taking advantage of what you
paid for.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
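P.S. The back-tracking usually looks something like this (the device name
here is made up):

    zpool status -x      # reports, say, c3t2d0 as DEGRADED or FAULTED
    iostat -En c3t2d0    # prints vendor, model and serial number
    format               # lists all disks, to match the target to a slot

and even then you still have to work out which physical slot holds the
drive with that serial number.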
Wilkinson, Alex
2009-Apr-30 22:03 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:

> On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
>> I currently have a single 17TB MetaLUN that I am about to present to an
>> OpenSolaris initiator and it will obviously be ZFS. However, I am
>> constantly reading that presenting a JBOD and using ZFS to manage the
>> RAID is best practice? I'm not really sure why? And isn't that a waste
>> of a high performing RAID array (EMC)?
>
> The JBOD "advantage" is that then ZFS can schedule I/O for the disks
> and there is less chance of an unrecoverable pool since ZFS is assured
> to lay out redundant data on redundant hardware and ZFS uses more
> robust error detection than the firmware on any array. When using
> mirrors there is considerable advantage since writes and reads can be
> concurrent.
>
> That said, your EMC hardware likely offers much nicer interfaces for
> indicating and replacing bad disk drives. With the ZFS JBOD approach
> you have to back-track from what ZFS tells you (a Solaris device ID)
> and figure out which physical drive is not behaving correctly. EMC
> tech support may not be very helpful if ZFS says there is something
> wrong but the raid array says there is not. Sometimes there is value
> in taking advantage of what you paid for.

So forget ZFS and use UFS? Or use UFS with a ZVOL? Or just use Vx{VM,FS}?
It kinda sux that you get no benefit from using such a killer volume
manager + filesystem with an EMC array :(

 -aW

IMPORTANT: This email remains the property of the Australian Defence
Organisation and is subject to the jurisdiction of section 70 of the CRIMES
ACT 1914. If you have received this email in error, you are requested to
contact the sender and delete the email.
Scott Lawson
2009-Apr-30 23:05 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Wilkinson, Alex wrote:
> On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:
>
> >On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
> >> I currently have a single 17TB MetaLUN that I am about to present to an
> >> OpenSolaris initiator and it will obviously be ZFS. However, I am
> >> constantly reading that presenting a JBOD and using ZFS to manage the
> >> RAID is best practice? I'm not really sure why? And isn't that a waste
> >> of a high performing RAID array (EMC)?
> >
> >The JBOD "advantage" is that then ZFS can schedule I/O for the disks
> >and there is less chance of an unrecoverable pool since ZFS is assured
> >to lay out redundant data on redundant hardware and ZFS uses more
> >robust error detection than the firmware on any array. When using
> >mirrors there is considerable advantage since writes and reads can be
> >concurrent.
> >
> >That said, your EMC hardware likely offers much nicer interfaces for
> >indicating and replacing bad disk drives. With the ZFS JBOD approach
> >you have to back-track from what ZFS tells you (a Solaris device ID)
> >and figure out which physical drive is not behaving correctly. EMC
> >tech support may not be very helpful if ZFS says there is something
> >wrong but the raid array says there is not. Sometimes there is value
> >in taking advantage of what you paid for.
>
> So forget ZFS and use UFS? Or use UFS with a ZVOL? Or just use Vx{VM,FS}?
> It kinda sux that you get no benefit from using such a killer volume
> manager + filesystem with an EMC array :(
>
>  -aW

Besides the volume management aspects of ZFS, self healing and so on, you
still get other benefits by virtue of using ZFS. Depending on *your*
requirements, they can arguably be more beneficial, if you are happy with
the reliability of your underlying storage. Specifically I am talking about
ZFS snapshots, rollbacks, cloning, clone promotion, file system quotas,
multiple block copies, compression, (encryption soon) etc etc.

I have used snapshots, rollbacks and cloning quite successfully in complex
upgrades of systems with multiple packages and complex dependencies. Case
in point was a Blackboard upgrade which involved two servers, both with ZFS
file systems: one for Blackboard and one for Oracle. The upgrade involved
going through 3 versions of Oracle and 4 versions of Blackboard, and the
process had potentially many places to go wrong. At every point of the way
we performed a snapshot on both Oracle and Blackboard to allow us to roll
back any particular part that we got wrong. This saved us an immense amount
of time and money and is a good real world example of where this side of
ZFS has been extremely helpful. On the Oracle side this was infinitely
faster than having to roll back the database itself. BB had some very large
tables!

Of course to take maximum advantage of ZFS in full, then as everyone has
mentioned it is a good idea to let ZFS manage the underlying raw disks if
possible.
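For what it is worth, the mechanics of that checkpointing are no more than
something like the following (pool and dataset names are made up):

    # checkpoint both filesystems before an upgrade step
    zfs snapshot dbpool/oracle@pre-step3
    zfs snapshot bbpool/blackboard@pre-step3

    # if the step goes wrong, back both out in seconds
    zfs rollback dbpool/oracle@pre-step3
    zfs rollback bbpool/blackboard@pre-step3

    # or rehearse the next step against a writable clone first
    zfs clone bbpool/blackboard@pre-step3 bbpool/bb-test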
Wilkinson, Alex
2009-May-01 06:09 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:

> On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
>> I currently have a single 17TB MetaLUN that I am about to present to an
>> OpenSolaris initiator and it will obviously be ZFS. However, I am
>> constantly reading that presenting a JBOD and using ZFS to manage the
>> RAID is best practice? I'm not really sure why? And isn't that a waste
>> of a high performing RAID array (EMC)?
>
> The JBOD "advantage" is that then ZFS can schedule I/O for the disks
> and there is less chance of an unrecoverable pool since ZFS is assured
> to lay out redundant data on redundant hardware and ZFS uses more
> robust error detection than the firmware on any array. When using
> mirrors there is considerable advantage since writes and reads can be
> concurrent.
>
> That said, your EMC hardware likely offers much nicer interfaces for
> indicating and replacing bad disk drives. With the ZFS JBOD approach
> you have to back-track from what ZFS tells you (a Solaris device ID)
> and figure out which physical drive is not behaving correctly. EMC
> tech support may not be very helpful if ZFS says there is something
> wrong but the raid array says there is not. Sometimes there is value
> in taking advantage of what you paid for.

So, shall I forget ZFS and use UFS?

 -aW

IMPORTANT: This email remains the property of the Australian Defence
Organisation and is subject to the jurisdiction of section 70 of the CRIMES
ACT 1914. If you have received this email in error, you are requested to
contact the sender and delete the email.
Scott Lawson
2009-May-01 06:44 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Wilkinson, Alex wrote:
> On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:
>
> >On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
> >> I currently have a single 17TB MetaLUN that I am about to present to an
> >> OpenSolaris initiator and it will obviously be ZFS. However, I am
> >> constantly reading that presenting a JBOD and using ZFS to manage the
> >> RAID is best practice? I'm not really sure why? And isn't that a waste
> >> of a high performing RAID array (EMC)?
> >
> >The JBOD "advantage" is that then ZFS can schedule I/O for the disks
> >and there is less chance of an unrecoverable pool since ZFS is assured
> >to lay out redundant data on redundant hardware and ZFS uses more
> >robust error detection than the firmware on any array. When using
> >mirrors there is considerable advantage since writes and reads can be
> >concurrent.
> >
> >That said, your EMC hardware likely offers much nicer interfaces for
> >indicating and replacing bad disk drives. With the ZFS JBOD approach
> >you have to back-track from what ZFS tells you (a Solaris device ID)
> >and figure out which physical drive is not behaving correctly. EMC
> >tech support may not be very helpful if ZFS says there is something
> >wrong but the raid array says there is not. Sometimes there is value
> >in taking advantage of what you paid for.
>
> So, shall I forget ZFS and use UFS?

Can you share more of your system configuration / intended use?

UFS has a limitation of 16TB max for a single filesystem, and a filesystem
that large is limited to roughly ~1 million inodes per TB. So if you want
to store a lot of small files you may find you have a problem. I have
certainly run into this limitation on numerous occasions. (Smaller than
~1TB has a very high limit for inodes and generally isn't an issue.)

Beyond what I mentioned in my other post it is hard to recommend anything
else. ZFS does tend to have higher hardware requirements than UFS and
doesn't perform particularly well with low amounts of RAM. But without more
workload information it is pretty hard to advise the best path that you
should take.
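If small-file counts are a concern, it is easy to check how much inode
headroom an existing UFS filesystem has before committing to it, e.g.
(mount point made up):

    df -o i /export/data    # reports iused / ifree / %iused for a UFS filesystem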
Dale Ghent
2009-May-01 07:24 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:
>
> On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:
>
>> On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
>>> I currently have a single 17TB MetaLUN that I am about to present to
>>> an OpenSolaris initiator and it will obviously be ZFS. However, I am
>>> constantly reading that presenting a JBOD and using ZFS to manage the
>>> RAID is best practice? I'm not really sure why? And isn't that a waste
>>> of a high performing RAID array (EMC)?
>>
>> The JBOD "advantage" is that then ZFS can schedule I/O for the disks
>> and there is less chance of an unrecoverable pool since ZFS is assured
>> to lay out redundant data on redundant hardware and ZFS uses more
>> robust error detection than the firmware on any array. When using
>> mirrors there is considerable advantage since writes and reads can be
>> concurrent.
>>
>> That said, your EMC hardware likely offers much nicer interfaces for
>> indicating and replacing bad disk drives. With the ZFS JBOD approach
>> you have to back-track from what ZFS tells you (a Solaris device ID)
>> and figure out which physical drive is not behaving correctly. EMC
>> tech support may not be very helpful if ZFS says there is something
>> wrong but the raid array says there is not. Sometimes there is value
>> in taking advantage of what you paid for.
>
> So, shall I forget ZFS and use UFS?

Not at all. Just export lots of LUNs from your EMC to get the IO scheduling
win, not one giant one, and configure the zpool as a stripe.

/dale
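P.S. Concretely, something along these lines (device names made up):

    # several smaller LUNs from the array instead of one 17TB MetaLUN;
    # with no mirror/raidz keyword the pool is a plain stripe across them
    zpool create tank c4t0d0 c4t1d0 c4t2d0 c4t3d0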
Ian Collins
2009-May-01 08:01 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Dale Ghent wrote:
>
> On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:
>>
>> So, shall I forget ZFS and use UFS?
>
> Not at all. Just export lots of LUNs from your EMC to get the IO
> scheduling win, not one giant one, and configure the zpool as a stripe.

What, no redundancy?

--
Ian.
Dale Ghent
2009-May-01 13:52 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On May 1, 2009, at 4:01 AM, Ian Collins wrote:
> Dale Ghent wrote:
>>
>> On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:
>>>
>>> So, shall I forget ZFS and use UFS?
>>
>> Not at all. Just export lots of LUNs from your EMC to get the IO
>> scheduling win, not one giant one, and configure the zpool as a stripe.
>
> What, no redundancy?

Leave that up to the array he's getting the LUNs from.

EMC. It's where data lives.

/dale
Brian Hechinger
2009-May-01 14:04 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Fri, May 01, 2009 at 09:52:54AM -0400, Dale Ghent wrote:
>
> EMC. It's where data lives.

I thought it was, "EMC. It's where data goes to die." :-D

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta tell
them exactly what you want or you'll end up with a cupboard full of pop
tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
Darren J Moffat
2009-May-01 14:16 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Dale Ghent wrote:
>
> On May 1, 2009, at 4:01 AM, Ian Collins wrote:
>
>> Dale Ghent wrote:
>>>
>>> On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:
>>>>
>>>> So, shall I forget ZFS and use UFS?
>>>
>>> Not at all. Just export lots of LUNs from your EMC to get the IO
>>> scheduling win, not one giant one, and configure the zpool as a stripe.
>>
>> What, no redundancy?
>
> Leave that up to the array he's getting the LUNs from.
>
> EMC. It's where data lives.

Not if you want ZFS to actually be able to recover from checksum-detected
failures. ZFS must be in control of the redundancy, i.e. a mirror, raidz or
raidz2. If ZFS is just given 1 or more LUNs in a stripe then it is unlikely
to be able to recover from data corruption; it might be able to recover
metadata, because metadata is always stored with at least copies=2, but
that is best effort.

--
Darren J Moffat
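P.S. The usual compromise if you want to keep the array's RAID but still
give ZFS something to repair from is to export two LUNs, ideally from
separate RAID groups or separate arrays, and mirror them with ZFS --
roughly (device names made up):

    # each LUN is itself RAID-protected on the array
    zpool create tank mirror c4t0d0 c5t0d0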
Richard Elling
2009-May-01 16:20 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Wilkinson, Alex wrote:
>
> So, shall I forget ZFS and use UFS?

I think the writing is on the wall, right next to "Romani ite domum" :-)

Today, laptops have 500 GByte drives, desktops have 1.5 TByte drives.
UFS really does not work well with SMI label and 1 TByte limitations.

-- richard
Miles Nordin
2009-May-01 18:01 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "sl" == Scott Lawson <Scott.Lawson at manukau.ac.nz> writes: >>>>> "wa" == Wilkinson, Alex <alex.wilkinson at dsto.defence.gov.au> writes: >>>>> "dg" == Dale Ghent <daleg at elemental.org> writes: >>>>> "djm" == Darren J Moffat <darrenm at opensolaris.org> writes:sl> Specifically I am talking of ZFS snapshots, rollbacks, sl> cloning, clone promotion, [...] sl> Of course to take maximum advantage of ZFS in full, then as sl> everyone has mentioned it is a good idea to let ZFS manage the sl> underlying raw disks if possible. okay, but these two feature groups are completely orthogonal. You can get the ZFS revision tree which helped you so much, and all the other features you mentioned, with a single-LUN zpool. wa> So, shall I forget ZFS and use UFS ? Naturally here you will find mostly people who have chosen to use ZFS, so I think you will have to think on your own rather than taking a poll of the ZFS list. Myself, I use ZFS. I would probably use it on a single-LUN SAN pool, but only if I had a backup system onto a second zpool, and iff I could do a restore/cutover really quickly if the primary zpool became corrupt. Some people have zpools that take days to restore, and in that case I would not do it---I''d want direct-attached storage, restore-by-cutover, or at the very least zpool-level redundancy. I''m using ZFS on a SAN right now, but my SAN is just Linux iSCSI targets, and it is exporting many JBOD LUN''s with zpool-level redundancy so I''m less at risk for the single-LUN lost pool problems than you''d be with single-lun EMC. And I have a full backup onto another zpool, on a machine capable enough to assume the role of the master, albeit not automatically. For a lighter filesystem I''m looking forward to the liberation of QFS, too. And in the future I think Solaris plans to offer redundancy options above the filesystem level, like pNFS and Lustre, which may end up being the ultimate win because of the way they can move the storage mesh onto a big network switch, rather than what we have with ZFS where it''s a couple bonded gigabit ethernet cards and a single PCIe backplane. Not all of ZFS''s features will remain useful in such a world. However I don''t think there is ANY situation in which you should run UFS over a zvol (which is one of the things you mentioned). That''s only interesting for debugging or performance comparison (meaning it should always perform worse, or else there''s a bug). If you read the replies you got more carefully you''ll find doing that addresses none of the concerns people raised. dg> Not at all. Just export lots of LUNs from your EMC to get the dg> IO scheduling win, not one giant one, and configure the zpool dg> as a stripe. I''ve never heard of using multiple-LUN stripes for storage QoS before. Have you actually measured some improvement in this configuration over a single LUN? If so that''s interesting. But it''s important to understand there''s no difference between multiple LUN stripes and a single big LUN w.r.t. reliability, as far as we know to date. The advice I''ve seen here to use multiple LUN''s over SAN vendor storage is, until now, not for QoS but for one of two reasons: * availability. a zpool mirror of LUNs on physically distant, or at least separate, storage vendor gear. * avoid the lost-zpool problem when there are SAN reboots or storage fabric disruptions without a host reboot. djm> Not if you want ZFS to actually be able to recover from djm> checksum detected failures. 
while we agree recovering from checksum failures is an advantage of zpool-level redundancy, I don''t think it predominates the actual failures observed by people using SAN''s. The lost-my-whole-zpool failure mode predominates, and in the two or three cases when it was examined enough to recover the zpool, it didn''t look like a checksum problem. It looked like either ZFS bugs or lost writes, or one leading to the other. And having zpool-level redundancy may happen to make this failure mode much less common, but it won''t eliminate it, especially since we still haven''t tracked down the root cause. Also we need to point out there *is* an availability advantage to letting the SAN manage a layer of redundancy, because SAN''s are much better at dealing with failing disks without crashing/slowing down than ZFS, so far. I''ve never heard of anyone actually exporting JBOD from EMC yet. Is someone actually doing this? So far I''ve heard of people burning huge $$$$$$ of disk by exporting two RAID LUN''s from the SAN and then mirroring them with zpool. djm> If ZFS is just given 1 or more LUNs in a stripe then it is djm> unlikely to be able to recover from data corruption, it might djm> be able to recover metadata because it is always stored with djm> at least copies=2 but that is best efforts. okay, fine, nice feature. But this failure is not actually happening, based on reports to the list. It''s redundancy in space, while reports we''ve seen from SAN''s show what''s really needed is redundancy in time, if that''s even possible. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090501/44f559ab/attachment.bin>
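(And to be concrete about what I mean by a backup onto a second zpool: just
scheduled send/receive to another box, something like the following, with
made-up host and pool names:

    zfs snapshot -r tank@2009-05-01
    zfs send -R tank@2009-05-01 | ssh backuphost zfs receive -d backup

    # later, send only the changes since the previous snapshot
    zfs snapshot -r tank@2009-05-02
    zfs send -R -i tank@2009-05-01 tank@2009-05-02 | \
        ssh backuphost zfs receive -d backup

so that if the primary pool is ever lost you cut over to the copy instead
of restoring for days.)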
Torrey McMahon
2009-May-01 18:06 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On 5/1/2009 2:01 PM, Miles Nordin wrote:
> I've never heard of using multiple-LUN stripes for storage QoS before.
> Have you actually measured some improvement in this configuration over
> a single LUN? If so that's interesting.

Because of the way queuing works in the OS and in most array controllers
you can get better performance in some workloads if you create more LUNs
from the underlying raid set.
Erik Trimble
2009-May-01 18:23 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Has the issue with "disappearing" single-LUN zpools causing corruption been
fixed? I'd have to look up the bug, but I got bitten by this last year
about this time:

Config: single LUN exported from array to host, attached via FC.

Scenario:

(1) array is turned off while the host is alive, but while the zpool is
    idle (no writes/reads occurring).
(2) host is shut down
(3) array is turned on
(4) host is turned on
(5) host reports the zpool is corrupted, refuses to import it, kernel
    panics, and goes into a reset loop.
(6) cannot import the zpool on another system; zpool completely hosed.

Now, IIRC, the perpetual panic-and-reboot thing got fixed, but not the
underlying cause, which was that ZFS expected to be able to periodically
write/read metadata from a zpool, and the disappearance of the single
underlying LUN caused the zpool to be declared corrupted and dead, even
though no data was actually bad.

The bad part of this is that the scenario is entirely likely to happen if a
bad HBA or switch causes the disappearance of the LUN, not the array itself
going bad.

I _still_ don't do single-LUN non-redundant zpools because of this.

Did it get fixed, or is this still an issue?

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Richard Elling
2009-May-06 21:45 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Miles Nordin wrote:
>>>>>> "djm" == Darren J Moffat <darrenm at opensolaris.org> writes:
>
>    djm> If you only present a single lun to ZFS it may not be able to
>    djm> repair any detected errors.
>
> And also the problems with pools becoming corrupt and unimportable,
> especially when the SAN reboots or loses connectivity and the host
> does not, that people like to keep forgetting. :(

We forget because it is no longer a problem ;-)

>    >> And isn't that a waste of a high performing RAID array (EMC)?
>
>    djm> That assumes it is actually faster - it might not be.
>
> IIRC in general people have found RAID5/6 delivers higher iops than
> raidz/raidz2 when both are in the same width.

Raidz will likely outperform RAID-5 on small, random writes. RAID-5 will
likely outperform raidz for small, random reads. If you want your cake, and
want to eat it, too, then you'll probably not look to RAID-5 or raidz.

> Also the EMC array will be more robust in terms of a disk failing
> without taking down the host than ZFS will be---in either case you'll
> not lose data, but ZFS is likely to freeze for minutes or crash if a
> disk fails, and might take longer than EMC to notice a disk which fails
> by becoming 100x slower (which is not strange).

I think it is disingenuous to compare an enterprise-class RAID array with
the random collection of hardware on which Solaris runs. There is a damn
good reason why an enterprise-class array vendor can offer such high data
availability, and it is the same reason why their products cost so much --
they can tightly control and integrate the components.

> And finally it's probably simpler to administer as a single LUN though
> of course one can argue pointlessly all day about what one thinks is
> clearly the best way to administer things.

+1
-- richard
Miles Nordin
2009-May-06 22:22 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:re> We forget because it is no longer a problem ;-) bug number? re> I think it is disingenuous to compare an enterprise-class RAID re> array with the random collection of hardware on which Solaris re> runs. compare with a Sun-integrated Solaris system, then. The availability problems still exist according to reports on the list. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090506/57a8dd93/attachment.bin>
Robert Milkowski
2009-May-07 08:36 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Wed, 6 May 2009, Miles Nordin wrote:

>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
> re> We forget because it is no longer a problem ;-)
>
> bug number?
>
> re> I think it is disingenuous to compare an enterprise-class RAID
> re> array with the random collection of hardware on which Solaris
> re> runs.
>
> compare with a Sun-integrated Solaris system, then. The availability
> problems still exist according to reports on the list.

With the 7000 series (aka Amber Road)?
Robert Milkowski
2009-May-07 08:42 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Thu, 7 May 2009, Robert Milkowski wrote:

> On Wed, 6 May 2009, Miles Nordin wrote:
>
>>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>>
>> re> We forget because it is no longer a problem ;-)
>>
>> bug number?
>>
>> re> I think it is disingenuous to compare an enterprise-class RAID
>> re> array with the random collection of hardware on which Solaris
>> re> runs.
>>
>> compare with a Sun-integrated Solaris system, then. The availability
>> problems still exist according to reports on the list.
>
> With the 7000 series (aka Amber Road)?

and I have had my issues with both the EMC Symmetrix and EMC Clariion
series, including unexpected downtimes: we couldn't access data on a
Symmetrix and it took some time for EMC engineers to "unstick" some IOs in
their firmware, then an endless fsck loop on a NAS version of Clariion,
then a data loss on a Clariion with SATA drives... and some other stability
issues with Celerra... All in all I like their products, they are really
good. But it doesn't mean they are bug free and don't have their issues -
they definitely do.

If reliability is your top priority you want nice end-to-end integration,
validation, etc. It has always been like that - storage or not. What ZFS
allows you to do is (carefully) take some relatively cheap HW and get
reliability sometimes even better than much more expensive solutions. But
it still doesn't mean you'll get there with whatever HW junk you put
together - you won't.
Richard Elling
2009-May-07 14:45 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Miles Nordin wrote:
>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
> re> We forget because it is no longer a problem ;-)
>
> bug number?

PSARC 2007/567

> re> I think it is disingenuous to compare an enterprise-class RAID
> re> array with the random collection of hardware on which Solaris
> re> runs.
>
> compare with a Sun-integrated Solaris system, then. The availability
> problems still exist according to reports on the list.

URL?
-- richard
Miles Nordin
2009-May-07 20:38 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:re> PSARC 2007/567 oh, failmode? We were not talking about panics. We''re talking about corrupted pools. Many of the systems in bugs related to this PSARC are not even using a SAN and are not reporting problems simliar to the one I described. Remember when I said the SAN corruption issue was not root-caused? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090507/0d39770f/attachment.bin>
Richard Elling
2009-May-08 17:38 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Miles Nordin wrote:
>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
> re> PSARC 2007/567
>
> oh, failmode? We were not talking about panics. We're talking about
> corrupted pools. Many of the systems in bugs related to this PSARC
> are not even using a SAN and are not reporting problems similar to
> the one I described.

The failmode property solved an event scenario where, if a SAN device
restarted during a ZFS write, ZFS would panic the host and the pool was
left in a failed state. With failmode, ZFS will patiently wait and continue
when the restart is completed.

> Remember when I said the SAN corruption issue was not root-caused?

If your SAN corrupts data, how can you blame ZFS?
-- richard
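P.S. For reference, the pool property that came out of that case (pool
name made up):

    zpool get failmode tank
    zpool set failmode=wait tank
    # wait     - block I/O and resume once the devices come back (default)
    # continue - return EIO to new write requests instead of blocking
    # panic    - panic the host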
Miles Nordin
2009-May-08 19:07 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:>> Remember when I said the SAN corruption issue was not >> root-caused? re> If your SAN corrupts data, how can you blame ZFS? (a) the fault has not been isolated to the SAN. Reading some pretty-printed message from ZFS saying ``it''s not my fault it''s his fault'''' is not the same as isolating the problem. especially since all the ZFS error messages say that. But, rather than ``blame'''' maybe I could say less-loadedly ``suggest opportunity for improvement in''''? (b) other filesystems have less problems with the same SAN''s. so, even if the fault is in the SAN, which is not known yet, the work for ZFS is not finished. We need one or probably both of the following: (1) to discover what is the actual problem, even if it turns out to be with the SAN. If the problem is not with the SAN, great, fix it. If it is, how can we test for it to qualify a SAN as not having the problem, other than waiting for lost pools which is a non-answer. This is called ``integration''''---I thought everyone was a fan of it! (2) either a fix or workaround so ZFS works better with the equipment we actually have available It''s not the first time I''ve made either point. surprised it''s still being denied since I thought James was working on (b)(2) but I think people who piped up (including me) were just curious if something else had been found, silently finished. You made it sound pretty unambigiously like ``yes'''' when you said the problem does not exist any more, but I think the real answer is ``no, it has not been silently finished'''' since James began his work long after failmode was finished. It''s frustrating to keep going in circles. Also I think advising people they no longer need to avoid single-LUN SAN pools is a bad idea. And blaming the SAN problems in silent bit-flips when it looks pretty clearly that they actually lie elsewhere is dishonest and contributes to a widening credibility gap. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090508/6ce4477d/attachment.bin>
Bob Friesenhahn
2009-May-08 19:20 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Fri, 8 May 2009, Miles Nordin wrote:
>
> It's frustrating to keep going in circles. Also I think advising
> people they no longer need to avoid single-LUN SAN pools is a bad
> idea. And blaming the SAN problems on silent bit-flips when it looks
> pretty clear that they actually lie elsewhere is dishonest and
> contributes to a widening credibility gap.

Miles,

Maybe I was not paying attention or maybe my SPAM filter is
over-aggressive, since I seem to have lost track of the discussion. Could
you remind us of the problem you are trying to solve? Has anyone else but
yourself encountered it?

Thanks,

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Richard Elling
2009-May-08 19:41 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Bob Friesenhahn wrote:
> On Fri, 8 May 2009, Miles Nordin wrote:
>>
>> It's frustrating to keep going in circles. Also I think advising
>> people they no longer need to avoid single-LUN SAN pools is a bad
>> idea. And blaming the SAN problems on silent bit-flips when it looks
>> pretty clear that they actually lie elsewhere is dishonest and
>> contributes to a widening credibility gap.
>
> Miles,
>
> Maybe I was not paying attention or maybe my SPAM filter is
> over-aggressive since I seem to have lost track of the discussion.
> Could you remind us of the problem you are trying to solve? Has
> anyone else but yourself encountered it?

If I may speak for Miles, he's pining for the forensics tool to replace the
current, manual method for attempting to recover a borked pool by using old
metadata. He's also concerned that people trust their SAN too much. I
agree, it is best if ZFS can manage data redundancy. You will find similar
recommendations in the appropriate docs.
-- richard
Erik Trimble
2009-May-08 20:14 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
I also think I re-started this thread. Mea culpa.

The original comment from me was that I wasn't certain that the bug I
tripped over last year about this time (a single-LUN zpool is declared
corrupt if the underlying LUN goes away, usually due to SAN issues) was
fixed. I did see that the host reset-cycle issue with this was fixed, but I
was wondering if we're still concerned with "phantom" unrecoverable zpool
corruption when a quiet single-LUN zpool loses its vdev.

Am I correct in hearing that we've fixed this issue? Or not?

-Erik

On Fri, 2009-05-08 at 12:41 -0700, Richard Elling wrote:
> Bob Friesenhahn wrote:
> > On Fri, 8 May 2009, Miles Nordin wrote:
> >>
> >> It's frustrating to keep going in circles. Also I think advising
> >> people they no longer need to avoid single-LUN SAN pools is a bad
> >> idea. And blaming the SAN problems on silent bit-flips when it looks
> >> pretty clear that they actually lie elsewhere is dishonest and
> >> contributes to a widening credibility gap.
> >
> > Miles,
> >
> > Maybe I was not paying attention or maybe my SPAM filter is
> > over-aggressive since I seem to have lost track of the discussion.
> > Could you remind us of the problem you are trying to solve? Has
> > anyone else but yourself encountered it?
>
> If I may speak for Miles, he's pining for the forensics tool to replace
> the current, manual method for attempting to recover a borked pool
> by using old metadata. He's also concerned that people trust their
> SAN too much. I agree, it is best if ZFS can manage data redundancy.
> You will find similar recommendations in the appropriate docs.
> -- richard

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Victor Latushkin
2009-May-09 14:17 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Erik Trimble wrote:
> I also think I re-started this thread. Mea culpa.
>
> The original comment from me was that I wasn't certain that the bug I
> tripped over last year about this time (a single-LUN zpool is declared
> corrupt if the underlying LUN goes away, usually due to SAN issues) was
> fixed.

I do not recall such a bug. There had been a bunch of bugs related to
panics due to critical read and write failures, which were addressed with
the introduction of the 'failmode' property (and related fixes). Could you
please provide the exact bug number?

> I did see that the host reset-cycle issue with this was fixed, but I
> was wondering if we're still concerned with "phantom" unrecoverable
> zpool corruption when a quiet single-LUN zpool loses its vdev.
>
> Am I correct in hearing that we've fixed this issue? Or not?

Without an exact bug number it is impossible to answer your question.

Cheers,
Victor