Wilkinson, Alex
2009-Apr-30 10:23 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Hi all,

In terms of best practices and high performance, would it be better to
present a JBOD to an OpenSolaris initiator or a single MetaLUN?

The scenario is:

I currently have a single 17TB MetaLUN that I am about to present to an
OpenSolaris initiator and it will obviously be ZFS. However, I am constantly
reading that presenting a JBOD and using ZFS to manage the RAID is best
practice? I'm not really sure why? And isn't that a waste of a high
performing RAID array (EMC)?

 -aW

IMPORTANT: This email remains the property of the Australian Defence
Organisation and is subject to the jurisdiction of section 70 of the CRIMES
ACT 1914. If you have received this email in error, you are requested to
contact the sender and delete the email.
Darren J Moffat
2009-Apr-30 13:19 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Wilkinson, Alex wrote:
> Hi all,
>
> In terms of best practices and high performance would it be better to
> present a JBOD to an OpenSolaris initiator or a single MetaLUN?
>
> The scenario is:
>
> I currently have a single 17TB MetaLUN that I am about to present to an
> OpenSolaris initiator and it will obviously be ZFS. However, I am
> constantly reading that presenting a JBOD and using ZFS to manage the
> RAID is best practice? I'm not really sure why?

If you only present a single LUN to ZFS it may not be able to repair any
detected errors. ZFS needs a mirror, raidz or raidz2 to be able to recover
from errors detected by checksum failures. It may be able to recover if you
use copies=2 or copies=3, but that assumes that the other copies are on a
part of the MetaLUN that wasn't damaged as well.

> And isn't that a waste of a high performing RAID array (EMC)?

That assumes it is actually faster - it might not be.

--
Darren J Moffat
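P.S. In zpool terms the difference looks roughly like this (the device
names below are made up for illustration):

    # Single LUN: ZFS can detect corruption but has no second copy to
    # repair from.
    zpool create tank c4t0d0

    # Two LUNs mirrored by ZFS: checksum errors can be self-healed.
    zpool create tank mirror c4t0d0 c5t0d0

    # Extra ditto copies of user data within one LUN: best effort only.
    zfs set copies=2 tank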
Miles Nordin
2009-Apr-30 15:43 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "djm" == Darren J Moffat <darrenm at opensolaris.org> writes:djm> If you only present a single lun to ZFS it may not be able to djm> repair any detected errors. And also the problems with pools becoming corrupt and unimportable, especially when the SAN reboots or loses connectivity and the host does not, that people like to keep forgetting. :( >> And isn''t that a waste of a high performing RAID array (EMC) ? djm> That assumes it is actually faster - it might not be. IIRC in general people have found RAID5/6 delivers higher iops than raidz/raidz2 when both are in the same width. Also the EMC array will be more robust in terms of a disk failing without taking down the host than ZFS will be---in either case you''ll not lose data, but ZFS is likely to freeze for minutes or crash if a disk fails, and might take longer to notice a disk which fails by becoming 100x slower (which is not strange) than EMC. And finally it''s probably simpler to administer as a single LUN though of course one can argue pointlessly all day about what one thinks is clearly the best way to administer things. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090430/072b3c54/attachment.bin>
Bob Friesenhahn
2009-Apr-30 16:11 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
> I currently have a single 17TB MetaLUN that I am about to present to an
> OpenSolaris initiator and it will obviously be ZFS. However, I am
> constantly reading that presenting a JBOD and using ZFS to manage the
> RAID is best practice? I'm not really sure why? And isn't that a waste
> of a high performing RAID array (EMC)?

The JBOD "advantage" is that then ZFS can schedule I/O for the disks and
there is less chance of an unrecoverable pool, since ZFS is assured to lay
out redundant data on redundant hardware and ZFS uses more robust error
detection than the firmware on any array. When using mirrors there is
considerable advantage since writes and reads can be concurrent.

That said, your EMC hardware likely offers much nicer interfaces for
indicating and replacing bad disk drives. With the ZFS JBOD approach you
have to back-track from what ZFS tells you (a Solaris device ID) and figure
out which physical drive is not behaving correctly. EMC tech support may
not be very helpful if ZFS says there is something wrong but the raid array
says there is not. Sometimes there is value in taking advantage of what you
paid for.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
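P.S. The back-tracking usually looks something like this (the device name
here is made up):

    zpool status -x      # reports, say, c3t2d0 as DEGRADED or FAULTED
    iostat -En c3t2d0    # prints vendor, model and serial number
    format               # lists all disks, to match the target to a slot

and even then you still have to work out which physical slot holds the
drive with that serial number.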
Wilkinson, Alex
2009-Apr-30 22:03 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:

> On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
>> I currently have a single 17TB MetaLUN that I am about to present to an
>> OpenSolaris initiator and it will obviously be ZFS. However, I am
>> constantly reading that presenting a JBOD and using ZFS to manage the
>> RAID is best practice? I'm not really sure why? And isn't that a waste
>> of a high performing RAID array (EMC)?
>
> The JBOD "advantage" is that then ZFS can schedule I/O for the disks
> and there is less chance of an unrecoverable pool since ZFS is assured
> to lay out redundant data on redundant hardware and ZFS uses more
> robust error detection than the firmware on any array. When using
> mirrors there is considerable advantage since writes and reads can be
> concurrent.
>
> That said, your EMC hardware likely offers much nicer interfaces for
> indicating and replacing bad disk drives. With the ZFS JBOD approach
> you have to back-track from what ZFS tells you (a Solaris device ID)
> and figure out which physical drive is not behaving correctly. EMC
> tech support may not be very helpful if ZFS says there is something
> wrong but the raid array says there is not. Sometimes there is value
> in taking advantage of what you paid for.

So forget ZFS and use UFS? Or use UFS with a ZVOL? Or just use Vx{VM,FS}?
It kinda sux that you get no benefit from using such a killer volume
manager + filesystem with an EMC array :(

 -aW

IMPORTANT: This email remains the property of the Australian Defence
Organisation and is subject to the jurisdiction of section 70 of the CRIMES
ACT 1914. If you have received this email in error, you are requested to
contact the sender and delete the email.
Scott Lawson
2009-Apr-30 23:05 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Wilkinson, Alex wrote:
> On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:
>
> >On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
> >> I currently have a single 17TB MetaLUN that I am about to present to an
> >> OpenSolaris initiator and it will obviously be ZFS. However, I am
> >> constantly reading that presenting a JBOD and using ZFS to manage the
> >> RAID is best practice? I'm not really sure why? And isn't that a waste
> >> of a high performing RAID array (EMC)?
> >
> >The JBOD "advantage" is that then ZFS can schedule I/O for the disks
> >and there is less chance of an unrecoverable pool since ZFS is assured
> >to lay out redundant data on redundant hardware and ZFS uses more
> >robust error detection than the firmware on any array. When using
> >mirrors there is considerable advantage since writes and reads can be
> >concurrent.
> >
> >That said, your EMC hardware likely offers much nicer interfaces for
> >indicating and replacing bad disk drives. With the ZFS JBOD approach
> >you have to back-track from what ZFS tells you (a Solaris device ID)
> >and figure out which physical drive is not behaving correctly. EMC
> >tech support may not be very helpful if ZFS says there is something
> >wrong but the raid array says there is not. Sometimes there is value
> >in taking advantage of what you paid for.
>
> So forget ZFS and use UFS? Or use UFS with a ZVOL? Or just use Vx{VM,FS}?
> It kinda sux that you get no benefit from using such a killer volume
> manager + filesystem with an EMC array :(
>
>  -aW

Besides the volume management aspects of ZFS, self healing and so on, you
still get other benefits by virtue of using ZFS. Depending on *your*
requirements, they can arguably be more beneficial, if you are happy with
the reliability of your underlying storage. Specifically I am talking about
ZFS snapshots, rollbacks, cloning, clone promotion, file system quotas,
multiple block copies, compression, (encryption soon) etc etc.

I have used snapshots, rollbacks and cloning quite successfully in complex
upgrades of systems with multiple packages and complex dependencies. Case
in point was a Blackboard upgrade which involved two servers, both with ZFS
file systems: one for Blackboard and one for Oracle. The upgrade involved
going through 3 versions of Oracle and 4 versions of Blackboard, and the
process had potentially many places to go wrong. At every point of the way
we performed a snapshot on both Oracle and Blackboard to allow us to roll
back any particular part that we got wrong. This saved us an immense amount
of time and money and is a good real world example of where this side of
ZFS has been extremely helpful. On the Oracle side this was infinitely
faster than having to roll back the database itself. BB had some very large
tables!

Of course to take maximum advantage of ZFS in full, then as everyone has
mentioned it is a good idea to let ZFS manage the underlying raw disks if
possible.
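For what it is worth, the mechanics of that checkpointing are no more than
something like the following (pool and dataset names are made up):

    # checkpoint both filesystems before an upgrade step
    zfs snapshot dbpool/oracle@pre-step3
    zfs snapshot bbpool/blackboard@pre-step3

    # if the step goes wrong, back both out in seconds
    zfs rollback dbpool/oracle@pre-step3
    zfs rollback bbpool/blackboard@pre-step3

    # or rehearse the next step against a writable clone first
    zfs clone bbpool/blackboard@pre-step3 bbpool/bb-test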
Wilkinson, Alex
2009-May-01 06:09 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:

> On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
>> I currently have a single 17TB MetaLUN that I am about to present to an
>> OpenSolaris initiator and it will obviously be ZFS. However, I am
>> constantly reading that presenting a JBOD and using ZFS to manage the
>> RAID is best practice? I'm not really sure why? And isn't that a waste
>> of a high performing RAID array (EMC)?
>
> The JBOD "advantage" is that then ZFS can schedule I/O for the disks
> and there is less chance of an unrecoverable pool since ZFS is assured
> to lay out redundant data on redundant hardware and ZFS uses more
> robust error detection than the firmware on any array. When using
> mirrors there is considerable advantage since writes and reads can be
> concurrent.
>
> That said, your EMC hardware likely offers much nicer interfaces for
> indicating and replacing bad disk drives. With the ZFS JBOD approach
> you have to back-track from what ZFS tells you (a Solaris device ID)
> and figure out which physical drive is not behaving correctly. EMC
> tech support may not be very helpful if ZFS says there is something
> wrong but the raid array says there is not. Sometimes there is value
> in taking advantage of what you paid for.

So, shall I forget ZFS and use UFS?

 -aW

IMPORTANT: This email remains the property of the Australian Defence
Organisation and is subject to the jurisdiction of section 70 of the CRIMES
ACT 1914. If you have received this email in error, you are requested to
contact the sender and delete the email.
Scott Lawson
2009-May-01 06:44 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Wilkinson, Alex wrote:
> On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:
>
> >On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
> >> I currently have a single 17TB MetaLUN that I am about to present to an
> >> OpenSolaris initiator and it will obviously be ZFS. However, I am
> >> constantly reading that presenting a JBOD and using ZFS to manage the
> >> RAID is best practice? I'm not really sure why? And isn't that a waste
> >> of a high performing RAID array (EMC)?
> >
> >The JBOD "advantage" is that then ZFS can schedule I/O for the disks
> >and there is less chance of an unrecoverable pool since ZFS is assured
> >to lay out redundant data on redundant hardware and ZFS uses more
> >robust error detection than the firmware on any array. When using
> >mirrors there is considerable advantage since writes and reads can be
> >concurrent.
> >
> >That said, your EMC hardware likely offers much nicer interfaces for
> >indicating and replacing bad disk drives. With the ZFS JBOD approach
> >you have to back-track from what ZFS tells you (a Solaris device ID)
> >and figure out which physical drive is not behaving correctly. EMC
> >tech support may not be very helpful if ZFS says there is something
> >wrong but the raid array says there is not. Sometimes there is value
> >in taking advantage of what you paid for.
>
> So, shall I forget ZFS and use UFS?

Can you share more of your system configuration / intended use?

UFS has a limitation of 16TB max for a single filesystem, and a filesystem
that large is limited to roughly ~1 million inodes per TB. So if you want
to store a lot of small files you may find you have a problem. I have
certainly run into this limitation on numerous occasions. (Smaller than
~1TB has a very high limit for inodes and generally isn't an issue.)

Beyond what I mentioned in my other post it is hard to recommend anything
else. ZFS does tend to have higher hardware requirements than UFS and
doesn't perform particularly well with low amounts of RAM. But without more
workload information it is pretty hard to advise the best path that you
should take.
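If small-file counts are a concern, it is easy to check how much inode
headroom an existing UFS filesystem has before committing to it, e.g.
(mount point made up):

    df -o i /export/data    # reports iused / ifree / %iused for a UFS filesystem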
Dale Ghent
2009-May-01 07:24 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:
>
> On Thu, Apr 30, 2009 at 11:11:55AM -0500, Bob Friesenhahn wrote:
>
>> On Thu, 30 Apr 2009, Wilkinson, Alex wrote:
>>> I currently have a single 17TB MetaLUN that I am about to present to
>>> an OpenSolaris initiator and it will obviously be ZFS. However, I am
>>> constantly reading that presenting a JBOD and using ZFS to manage the
>>> RAID is best practice? I'm not really sure why? And isn't that a waste
>>> of a high performing RAID array (EMC)?
>>
>> The JBOD "advantage" is that then ZFS can schedule I/O for the disks
>> and there is less chance of an unrecoverable pool since ZFS is assured
>> to lay out redundant data on redundant hardware and ZFS uses more
>> robust error detection than the firmware on any array. When using
>> mirrors there is considerable advantage since writes and reads can be
>> concurrent.
>>
>> That said, your EMC hardware likely offers much nicer interfaces for
>> indicating and replacing bad disk drives. With the ZFS JBOD approach
>> you have to back-track from what ZFS tells you (a Solaris device ID)
>> and figure out which physical drive is not behaving correctly. EMC
>> tech support may not be very helpful if ZFS says there is something
>> wrong but the raid array says there is not. Sometimes there is value
>> in taking advantage of what you paid for.
>
> So, shall I forget ZFS and use UFS?

Not at all. Just export lots of LUNs from your EMC to get the IO scheduling
win, not one giant one, and configure the zpool as a stripe.

/dale
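P.S. Concretely, something along these lines (device names made up):

    # several smaller LUNs from the array instead of one 17TB MetaLUN;
    # with no mirror/raidz keyword the pool is a plain stripe across them
    zpool create tank c4t0d0 c4t1d0 c4t2d0 c4t3d0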
Ian Collins
2009-May-01 08:01 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Dale Ghent wrote:
>
> On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:
>>
>> So, shall I forget ZFS and use UFS?
>
> Not at all. Just export lots of LUNs from your EMC to get the IO
> scheduling win, not one giant one, and configure the zpool as a stripe.

What, no redundancy?

--
Ian.
Dale Ghent
2009-May-01 13:52 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On May 1, 2009, at 4:01 AM, Ian Collins wrote:
> Dale Ghent wrote:
>>
>> On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:
>>>
>>> So, shall I forget ZFS and use UFS?
>>
>> Not at all. Just export lots of LUNs from your EMC to get the IO
>> scheduling win, not one giant one, and configure the zpool as a stripe.
>
> What, no redundancy?

Leave that up to the array he's getting the LUNs from.

EMC. It's where data lives.

/dale
Brian Hechinger
2009-May-01 14:04 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Fri, May 01, 2009 at 09:52:54AM -0400, Dale Ghent wrote:
>
> EMC. It's where data lives.

I thought it was, "EMC. It's where data goes to die." :-D

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta tell
them exactly what you want or you'll end up with a cupboard full of pop
tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
Darren J Moffat
2009-May-01 14:16 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Dale Ghent wrote:
>
> On May 1, 2009, at 4:01 AM, Ian Collins wrote:
>
>> Dale Ghent wrote:
>>>
>>> On May 1, 2009, at 2:09 AM, Wilkinson, Alex wrote:
>>>>
>>>> So, shall I forget ZFS and use UFS?
>>>
>>> Not at all. Just export lots of LUNs from your EMC to get the IO
>>> scheduling win, not one giant one, and configure the zpool as a stripe.
>>
>> What, no redundancy?
>
> Leave that up to the array he's getting the LUNs from.
>
> EMC. It's where data lives.

Not if you want ZFS to actually be able to recover from checksum-detected
failures. ZFS must be in control of the redundancy, i.e. a mirror, raidz or
raidz2. If ZFS is just given 1 or more LUNs in a stripe then it is unlikely
to be able to recover from data corruption; it might be able to recover
metadata, because metadata is always stored with at least copies=2, but
that is best effort.

--
Darren J Moffat
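P.S. The usual compromise if you want to keep the array's RAID but still
give ZFS something to repair from is to export two LUNs, ideally from
separate RAID groups or separate arrays, and mirror them with ZFS --
roughly (device names made up):

    # each LUN is itself RAID-protected on the array
    zpool create tank mirror c4t0d0 c5t0d0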
Richard Elling
2009-May-01 16:20 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Wilkinson, Alex wrote:
>
> So, shall I forget ZFS and use UFS?

I think the writing is on the wall, right next to "Romani ite domum" :-)

Today, laptops have 500 GByte drives, desktops have 1.5 TByte drives.
UFS really does not work well with SMI label and 1 TByte limitations.

-- richard
Miles Nordin
2009-May-01 18:01 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "sl" == Scott Lawson <Scott.Lawson at manukau.ac.nz> writes: >>>>> "wa" == Wilkinson, Alex <alex.wilkinson at dsto.defence.gov.au> writes: >>>>> "dg" == Dale Ghent <daleg at elemental.org> writes: >>>>> "djm" == Darren J Moffat <darrenm at opensolaris.org> writes:sl> Specifically I am talking of ZFS snapshots, rollbacks, sl> cloning, clone promotion, [...] sl> Of course to take maximum advantage of ZFS in full, then as sl> everyone has mentioned it is a good idea to let ZFS manage the sl> underlying raw disks if possible. okay, but these two feature groups are completely orthogonal. You can get the ZFS revision tree which helped you so much, and all the other features you mentioned, with a single-LUN zpool. wa> So, shall I forget ZFS and use UFS ? Naturally here you will find mostly people who have chosen to use ZFS, so I think you will have to think on your own rather than taking a poll of the ZFS list. Myself, I use ZFS. I would probably use it on a single-LUN SAN pool, but only if I had a backup system onto a second zpool, and iff I could do a restore/cutover really quickly if the primary zpool became corrupt. Some people have zpools that take days to restore, and in that case I would not do it---I''d want direct-attached storage, restore-by-cutover, or at the very least zpool-level redundancy. I''m using ZFS on a SAN right now, but my SAN is just Linux iSCSI targets, and it is exporting many JBOD LUN''s with zpool-level redundancy so I''m less at risk for the single-LUN lost pool problems than you''d be with single-lun EMC. And I have a full backup onto another zpool, on a machine capable enough to assume the role of the master, albeit not automatically. For a lighter filesystem I''m looking forward to the liberation of QFS, too. And in the future I think Solaris plans to offer redundancy options above the filesystem level, like pNFS and Lustre, which may end up being the ultimate win because of the way they can move the storage mesh onto a big network switch, rather than what we have with ZFS where it''s a couple bonded gigabit ethernet cards and a single PCIe backplane. Not all of ZFS''s features will remain useful in such a world. However I don''t think there is ANY situation in which you should run UFS over a zvol (which is one of the things you mentioned). That''s only interesting for debugging or performance comparison (meaning it should always perform worse, or else there''s a bug). If you read the replies you got more carefully you''ll find doing that addresses none of the concerns people raised. dg> Not at all. Just export lots of LUNs from your EMC to get the dg> IO scheduling win, not one giant one, and configure the zpool dg> as a stripe. I''ve never heard of using multiple-LUN stripes for storage QoS before. Have you actually measured some improvement in this configuration over a single LUN? If so that''s interesting. But it''s important to understand there''s no difference between multiple LUN stripes and a single big LUN w.r.t. reliability, as far as we know to date. The advice I''ve seen here to use multiple LUN''s over SAN vendor storage is, until now, not for QoS but for one of two reasons: * availability. a zpool mirror of LUNs on physically distant, or at least separate, storage vendor gear. * avoid the lost-zpool problem when there are SAN reboots or storage fabric disruptions without a host reboot. djm> Not if you want ZFS to actually be able to recover from djm> checksum detected failures. 
while we agree recovering from checksum failures is an advantage of zpool-level redundancy, I don''t think it predominates the actual failures observed by people using SAN''s. The lost-my-whole-zpool failure mode predominates, and in the two or three cases when it was examined enough to recover the zpool, it didn''t look like a checksum problem. It looked like either ZFS bugs or lost writes, or one leading to the other. And having zpool-level redundancy may happen to make this failure mode much less common, but it won''t eliminate it, especially since we still haven''t tracked down the root cause. Also we need to point out there *is* an availability advantage to letting the SAN manage a layer of redundancy, because SAN''s are much better at dealing with failing disks without crashing/slowing down than ZFS, so far. I''ve never heard of anyone actually exporting JBOD from EMC yet. Is someone actually doing this? So far I''ve heard of people burning huge $$$$$$ of disk by exporting two RAID LUN''s from the SAN and then mirroring them with zpool. djm> If ZFS is just given 1 or more LUNs in a stripe then it is djm> unlikely to be able to recover from data corruption, it might djm> be able to recover metadata because it is always stored with djm> at least copies=2 but that is best efforts. okay, fine, nice feature. But this failure is not actually happening, based on reports to the list. It''s redundancy in space, while reports we''ve seen from SAN''s show what''s really needed is redundancy in time, if that''s even possible. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090501/44f559ab/attachment.bin>
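(And to be concrete about what I mean by a backup onto a second zpool: just
scheduled send/receive to another box, something like the following, with
made-up host and pool names:

    zfs snapshot -r tank@2009-05-01
    zfs send -R tank@2009-05-01 | ssh backuphost zfs receive -d backup

    # later, send only the changes since the previous snapshot
    zfs snapshot -r tank@2009-05-02
    zfs send -R -i tank@2009-05-01 tank@2009-05-02 | \
        ssh backuphost zfs receive -d backup

so that if the primary pool is ever lost you cut over to the copy instead
of restoring for days.)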
Torrey McMahon
2009-May-01 18:06 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On 5/1/2009 2:01 PM, Miles Nordin wrote:
> I've never heard of using multiple-LUN stripes for storage QoS before.
> Have you actually measured some improvement in this configuration over
> a single LUN? If so that's interesting.

Because of the way queuing works in the OS and in most array controllers
you can get better performance in some workloads if you create more LUNs
from the underlying raid set.
Erik Trimble
2009-May-01 18:23 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Has the issue with "disappearing" single-LUN zpools causing corruption been
fixed? I'd have to look up the bug, but I got bitten by this last year
about this time:

Config: single LUN exported from array to host, attached via FC.

Scenario:

(1) array is turned off while the host is alive, but while the zpool is
    idle (no writes/reads occurring).
(2) host is shut down
(3) array is turned on
(4) host is turned on
(5) host reports the zpool is corrupted, refuses to import it, kernel
    panics, and goes into a reset loop.
(6) cannot import the zpool on another system; zpool completely hosed.

Now, IIRC, the perpetual panic-and-reboot thing got fixed, but not the
underlying cause, which was that ZFS expected to be able to periodically
write/read metadata from a zpool, and the disappearance of the single
underlying LUN caused the zpool to be declared corrupted and dead, even
though no data was actually bad.

The bad part of this is that the scenario is entirely likely to happen if a
bad HBA or switch causes the disappearance of the LUN, not the array itself
going bad.

I _still_ don't do single-LUN non-redundant zpools because of this.

Did it get fixed, or is this still an issue?

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Richard Elling
2009-May-06 21:45 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Miles Nordin wrote:
>>>>>> "djm" == Darren J Moffat <darrenm at opensolaris.org> writes:
>
>    djm> If you only present a single lun to ZFS it may not be able to
>    djm> repair any detected errors.
>
> And also the problems with pools becoming corrupt and unimportable,
> especially when the SAN reboots or loses connectivity and the host
> does not, that people like to keep forgetting. :(

We forget because it is no longer a problem ;-)

>    >> And isn't that a waste of a high performing RAID array (EMC)?
>
>    djm> That assumes it is actually faster - it might not be.
>
> IIRC in general people have found RAID5/6 delivers higher iops than
> raidz/raidz2 when both are in the same width.

Raidz will likely outperform RAID-5 on small, random writes. RAID-5 will
likely outperform raidz for small, random reads. If you want your cake, and
want to eat it, too, then you'll probably not look to RAID-5 or raidz.

> Also the EMC array will be more robust in terms of a disk failing
> without taking down the host than ZFS will be---in either case you'll
> not lose data, but ZFS is likely to freeze for minutes or crash if a
> disk fails, and might take longer than EMC to notice a disk which fails
> by becoming 100x slower (which is not strange).

I think it is disingenuous to compare an enterprise-class RAID array with
the random collection of hardware on which Solaris runs. There is a damn
good reason why an enterprise-class array vendor can offer such high data
availability, and it is the same reason why their products cost so much --
they can tightly control and integrate the components.

> And finally it's probably simpler to administer as a single LUN though
> of course one can argue pointlessly all day about what one thinks is
> clearly the best way to administer things.

+1
-- richard
Miles Nordin
2009-May-06 22:22 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:re> We forget because it is no longer a problem ;-) bug number? re> I think it is disingenuous to compare an enterprise-class RAID re> array with the random collection of hardware on which Solaris re> runs. compare with a Sun-integrated Solaris system, then. The availability problems still exist according to reports on the list. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090506/57a8dd93/attachment.bin>
Robert Milkowski
2009-May-07 08:36 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Wed, 6 May 2009, Miles Nordin wrote:

>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
> re> We forget because it is no longer a problem ;-)
>
> bug number?
>
> re> I think it is disingenuous to compare an enterprise-class RAID
> re> array with the random collection of hardware on which Solaris
> re> runs.
>
> compare with a Sun-integrated Solaris system, then. The availability
> problems still exist according to reports on the list.

With the 7000 series (aka Amber Road)?
Robert Milkowski
2009-May-07 08:42 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Thu, 7 May 2009, Robert Milkowski wrote:

> On Wed, 6 May 2009, Miles Nordin wrote:
>
>>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>>
>> re> We forget because it is no longer a problem ;-)
>>
>> bug number?
>>
>> re> I think it is disingenuous to compare an enterprise-class RAID
>> re> array with the random collection of hardware on which Solaris
>> re> runs.
>>
>> compare with a Sun-integrated Solaris system, then. The availability
>> problems still exist according to reports on the list.
>
> With the 7000 series (aka Amber Road)?

and I have had my issues with both the EMC Symmetrix and EMC Clariion
series, including unexpected downtimes: we couldn't access data on a
Symmetrix and it took some time for EMC engineers to "unstick" some IOs in
their firmware, then an endless fsck loop on a NAS version of Clariion,
then a data loss on a Clariion with SATA drives... and some other stability
issues with Celerra... All in all I like their products, they are really
good. But it doesn't mean they are bug free and don't have their issues -
they definitely do.

If reliability is your top priority you want nice end-to-end integration,
validation, etc. It has always been like that - storage or not. What ZFS
allows you to do is (carefully) take some relatively cheap HW and get
reliability sometimes even better than much more expensive solutions. But
it still doesn't mean you'll get there with whatever HW junk you put
together - you won't.
Richard Elling
2009-May-07 14:45 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Miles Nordin wrote:
>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
> re> We forget because it is no longer a problem ;-)
>
> bug number?

PSARC 2007/567

> re> I think it is disingenuous to compare an enterprise-class RAID
> re> array with the random collection of hardware on which Solaris
> re> runs.
>
> compare with a Sun-integrated Solaris system, then. The availability
> problems still exist according to reports on the list.

URL?
-- richard
Miles Nordin
2009-May-07 20:38 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:re> PSARC 2007/567 oh, failmode? We were not talking about panics. We''re talking about corrupted pools. Many of the systems in bugs related to this PSARC are not even using a SAN and are not reporting problems simliar to the one I described. Remember when I said the SAN corruption issue was not root-caused? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090507/0d39770f/attachment.bin>
Richard Elling
2009-May-08 17:38 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Miles Nordin wrote:
>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
> re> PSARC 2007/567
>
> oh, failmode? We were not talking about panics. We're talking about
> corrupted pools. Many of the systems in bugs related to this PSARC
> are not even using a SAN and are not reporting problems similar to
> the one I described.

The failmode property solved an event scenario where, if a SAN device
restarted during a ZFS write, ZFS would panic the host and the pool was
left in a failed state. With failmode, ZFS will patiently wait and continue
when the restart is completed.

> Remember when I said the SAN corruption issue was not root-caused?

If your SAN corrupts data, how can you blame ZFS?
-- richard
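P.S. For reference, the pool property that came out of that case (pool
name made up):

    zpool get failmode tank
    zpool set failmode=wait tank
    # wait     - block I/O and resume once the devices come back (default)
    # continue - return EIO to new write requests instead of blocking
    # panic    - panic the host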
Miles Nordin
2009-May-08 19:07 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:>> Remember when I said the SAN corruption issue was not >> root-caused? re> If your SAN corrupts data, how can you blame ZFS? (a) the fault has not been isolated to the SAN. Reading some pretty-printed message from ZFS saying ``it''s not my fault it''s his fault'''' is not the same as isolating the problem. especially since all the ZFS error messages say that. But, rather than ``blame'''' maybe I could say less-loadedly ``suggest opportunity for improvement in''''? (b) other filesystems have less problems with the same SAN''s. so, even if the fault is in the SAN, which is not known yet, the work for ZFS is not finished. We need one or probably both of the following: (1) to discover what is the actual problem, even if it turns out to be with the SAN. If the problem is not with the SAN, great, fix it. If it is, how can we test for it to qualify a SAN as not having the problem, other than waiting for lost pools which is a non-answer. This is called ``integration''''---I thought everyone was a fan of it! (2) either a fix or workaround so ZFS works better with the equipment we actually have available It''s not the first time I''ve made either point. surprised it''s still being denied since I thought James was working on (b)(2) but I think people who piped up (including me) were just curious if something else had been found, silently finished. You made it sound pretty unambigiously like ``yes'''' when you said the problem does not exist any more, but I think the real answer is ``no, it has not been silently finished'''' since James began his work long after failmode was finished. It''s frustrating to keep going in circles. Also I think advising people they no longer need to avoid single-LUN SAN pools is a bad idea. And blaming the SAN problems in silent bit-flips when it looks pretty clearly that they actually lie elsewhere is dishonest and contributes to a widening credibility gap. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090508/6ce4477d/attachment.bin>
Bob Friesenhahn
2009-May-08 19:20 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
On Fri, 8 May 2009, Miles Nordin wrote:
>
> It's frustrating to keep going in circles. Also I think advising
> people they no longer need to avoid single-LUN SAN pools is a bad
> idea. And blaming the SAN problems on silent bit-flips when it looks
> pretty clear that they actually lie elsewhere is dishonest and
> contributes to a widening credibility gap.

Miles,

Maybe I was not paying attention or maybe my SPAM filter is
over-aggressive, since I seem to have lost track of the discussion. Could
you remind us of the problem you are trying to solve? Has anyone else but
yourself encountered it?

Thanks,

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Richard Elling
2009-May-08 19:41 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Bob Friesenhahn wrote:
> On Fri, 8 May 2009, Miles Nordin wrote:
>>
>> It's frustrating to keep going in circles. Also I think advising
>> people they no longer need to avoid single-LUN SAN pools is a bad
>> idea. And blaming the SAN problems on silent bit-flips when it looks
>> pretty clear that they actually lie elsewhere is dishonest and
>> contributes to a widening credibility gap.
>
> Miles,
>
> Maybe I was not paying attention or maybe my SPAM filter is
> over-aggressive since I seem to have lost track of the discussion.
> Could you remind us of the problem you are trying to solve? Has
> anyone else but yourself encountered it?

If I may speak for Miles, he's pining for the forensics tool to replace the
current, manual method for attempting to recover a borked pool by using old
metadata. He's also concerned that people trust their SAN too much. I
agree, it is best if ZFS can manage data redundancy. You will find similar
recommendations in the appropriate docs.
-- richard
Erik Trimble
2009-May-08 20:14 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
I also think I re-started this thread. Mea culpa.

The original comment from me was that I wasn't certain that the bug I
tripped over last year about this time (a single-LUN zpool is declared
corrupt if the underlying LUN goes away, usually due to SAN issues) was
fixed. I did see that the host reset-cycle issue with this was fixed, but I
was wondering if we're still concerned with "phantom" unrecoverable zpool
corruption when a quiet single-LUN zpool loses its vdev.

Am I correct in hearing that we've fixed this issue? Or not?

-Erik

On Fri, 2009-05-08 at 12:41 -0700, Richard Elling wrote:
> Bob Friesenhahn wrote:
> > On Fri, 8 May 2009, Miles Nordin wrote:
> >>
> >> It's frustrating to keep going in circles. Also I think advising
> >> people they no longer need to avoid single-LUN SAN pools is a bad
> >> idea. And blaming the SAN problems on silent bit-flips when it looks
> >> pretty clear that they actually lie elsewhere is dishonest and
> >> contributes to a widening credibility gap.
> >
> > Miles,
> >
> > Maybe I was not paying attention or maybe my SPAM filter is
> > over-aggressive since I seem to have lost track of the discussion.
> > Could you remind us of the problem you are trying to solve? Has
> > anyone else but yourself encountered it?
>
> If I may speak for Miles, he's pining for the forensics tool to replace
> the current, manual method for attempting to recover a borked pool
> by using old metadata. He's also concerned that people trust their
> SAN too much. I agree, it is best if ZFS can manage data redundancy.
> You will find similar recommendations in the appropriate docs.
> -- richard

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Victor Latushkin
2009-May-09 14:17 UTC
[zfs-discuss] ZFS + EMC Cx310 Array (JBOD ? Or Singe MetaLUN ?)
Erik Trimble wrote:
> I also think I re-started this thread. Mea culpa.
>
> The original comment from me was that I wasn't certain that the bug I
> tripped over last year about this time (a single-LUN zpool is declared
> corrupt if the underlying LUN goes away, usually due to SAN issues) was
> fixed.

I do not recall such a bug. There had been a bunch of bugs related to
panics due to critical read and write failures, which were addressed with
the introduction of the 'failmode' property (and related fixes). Could you
please provide the exact bug number?

> I did see that the host reset-cycle issue with this was fixed, but I
> was wondering if we're still concerned with "phantom" unrecoverable
> zpool corruption when a quiet single-LUN zpool loses its vdev.
>
> Am I correct in hearing that we've fixed this issue? Or not?

Without an exact bug number it is impossible to answer your question.

Cheers,
Victor