Hi All,

It's been a while since I touched ZFS. Is the below still the case with ZFS and a hardware RAID array? Do we still need to provide two LUNs from the hardware RAID and then ZFS-mirror those two LUNs?

http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid

Thanks,
Shawn
Shawn Joy wrote:
> Hi All,
>
> It's been a while since I touched ZFS. Is the below still the case with ZFS
> and a hardware RAID array? Do we still need to provide two LUNs from the
> hardware RAID and then ZFS-mirror those two LUNs?
>
> http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid
>
Need, no. Should, yes.

The last two points on that page are key:

"Overall, ZFS functions as designed with SAN-attached devices, but if you expose simpler devices to ZFS, you can better leverage all available features.

In summary, if you use ZFS with SAN-attached devices, you can take advantage of the self-healing features of ZFS by configuring redundancy in your ZFS storage pools even though redundancy is available at a lower hardware level."

If you don't give ZFS any redundancy, you risk losing your pool if there is data corruption.

--
Ian.
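For illustration, a minimal sketch of the two configurations being discussed; the pool name "tank" and the device names (c2t0d0, c2t1d0) are placeholders, not devices from this thread:

    # Option 1: a single SAN LUN -- ZFS detects corruption via checksums,
    # but has no second copy to repair from.
    zpool create tank c2t0d0

    # Option 2 (what the FAQ recommends): two LUNs from the hardware RAID,
    # mirrored by ZFS, so checksum errors on one side can be healed from
    # the other.
    zpool create tank mirror c2t0d0 c2t1d0

    # Check layout and health.
    zpool status tank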
> If you don't give ZFS any redundancy, you risk losing your pool if
> there is data corruption.

Is this the same risk for data corruption as UFS on hardware-based LUNs?

If we present one LUN to ZFS and choose not to ZFS mirror or do a raidz pool of that LUN, is ZFS able to handle disk or RAID controller failures on the hardware array?

Does ZFS handle intermittent controller outages on the RAID controllers the same as UFS would?

Thanks,
Shawn

Ian Collins wrote:
> Shawn Joy wrote:
>> Hi All,
>> It's been a while since I touched ZFS. Is the below still the case
>> with ZFS and a hardware RAID array? Do we still need to provide two
>> LUNs from the hardware RAID and then ZFS-mirror those two LUNs?
>>
>> http://www.opensolaris.org/os/community/zfs/faq/#hardwareraid
>>
> Need, no. Should, yes.
>
> The last two points on that page are key:
>
> "Overall, ZFS functions as designed with SAN-attached devices, but if
> you expose simpler devices to ZFS, you can better leverage all
> available features.
>
> In summary, if you use ZFS with SAN-attached devices, you can take
> advantage of the self-healing features of ZFS by configuring
> redundancy in your ZFS storage pools even though redundancy is
> available at a lower hardware level."
>
> If you don't give ZFS any redundancy, you risk losing your pool if
> there is data corruption.
Shawn Joy wrote:
> Ian Collins wrote:
>> [...]
>> If you don't give ZFS any redundancy, you risk losing your pool if
>> there is data corruption.
>
> Is this the same risk for data corruption as UFS on hardware-based LUNs?
>
Not really. UFS wouldn't notice; ZFS would, and the single-device pool would enter a faulted state.

> If we present one LUN to ZFS and choose not to ZFS mirror or do a
> raidz pool of that LUN, is ZFS able to handle disk or RAID controller
> failures on the hardware array?
>
I guess the only answer is "it depends". A LUN is in effect just another drive, so if the failure is managed by the SAN, ZFS wouldn't know.

> Does ZFS handle intermittent controller outages on the RAID
> controllers the same as UFS would?
>
I haven't used ZFS with a SAN device, but pulling a drive causes ZFS to mark it unavailable and the pool degraded.

--
Ian.
ZFS no longer has the issue where loss of a single device (even intermittently) causes pool corruption. That's been fixed.

That is, there used to be an issue in this scenario:

(1) zpool constructed from a single LUN on a SAN device
(2) SAN experiences a temporary outage, while the ZFS host remains running.
(3) zpool is permanently corrupted, even if no I/O occurred during the outage

This is fixed (around b101, IIRC).

However, ZFS remains much more sensitive to loss of the underlying LUN than UFS, and has a tendency to mark such a LUN as defective during any such SAN outage. It's much more recoverable nowadays, though. Just to be clear, this occasionally occurs when something such as a SAN switch dies, or there is a temporary hiccup in the SAN infrastructure, causing some small (i.e. < 1 minute) loss of connectivity to the underlying LUN.

RAIDZ and mirrored zpools are still the preferred method of arranging things in ZFS, even with hardware RAID backing the underlying LUN (whether the LUN is from a SAN or a local HBA doesn't matter).

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Shawn Joy wrote:
>> If you don't give ZFS any redundancy, you risk losing your pool if
>> there is data corruption.
>
> Is this the same risk for data corruption as UFS on hardware-based LUNs?
>
It's a tradeoff. ZFS has more issues with loss of connectivity to the underlying LUN than UFS, while UFS has issues with an inability to detect silent data corruption due to faulty HW writing/reading. I can't quantify the actual occurrence of either to any specific number, so I can't say the risk is more or less.

> If we present one LUN to ZFS and choose not to ZFS mirror or do a
> raidz pool of that LUN, is ZFS able to handle disk or RAID controller
> failures on the hardware array?
>
No. But neither is /any/ other filesystem. At best, if you lose a (non-redundant) hardware RAID controller, you may be able to recover the volume upon replacement of the RAID controller. Maybe. It depends on the failure mode of the RAID controller. If the LUN itself is lost (i.e. a single-disk LUN where the disk goes bad, or a HW-RAID volume where failures exceed redundancy), no filesystem in the universe will help you. I don't see any real difference between UFS and ZFS in these cases.

> Does ZFS handle intermittent controller outages on the RAID
> controllers the same as UFS would?
>
No. This is an issue with ZFS, as I noted in a previous post. Intermittent outages on the SAN have a tendency to cause ZFS to mark the LUN as "failed" - remember that ZFS acts both as a file system and as a software volume manager. Right now, I'm not aware of ways to make ZFS more resilient to a "flaky" SAN infrastructure. Since UFS has no awareness of a LUN's reliability (that would be in SVM), it will happily try to use a device whose underlying LUN has gone away, eventually reporting an inability to complete the relevant transaction to the calling software.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Richard Elling
2009-Oct-10 16:27 UTC
[zfs-discuss] Does ZFS work with SAN-attached devices?
On Oct 10, 2009, at 8:19 AM, Erik Trimble wrote:
> [...]
>> Does ZFS handle intermittent controller outages on the RAID
>> controllers the same as UFS would?
>>
> No. This is an issue with ZFS, as I noted in a previous post.
> Intermittent outages on the SAN have a tendency to cause ZFS to mark
> the LUN as "failed" - remember that ZFS acts both as a file system
> and as a software volume manager. [...]

By default, ZFS won't see a SAN outage less than 3 minutes long. It will patiently wait for the [s]sd driver to time out... which takes 3 minutes. UFS is the same.

OTOH, if the "outage" is not a complete outage, then error messages will be handled according to the error received. If a sudden batch of I/O failures is received (IIRC the default threshold is 10 in 10 minutes), then the vdev can be marked as degraded. You can see this in the FMA logs.

NB: definitions of the pool states, including "degraded", are in the zpool(1m) man page.

-- richard
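For reference, a minimal sketch of where to look for this on a Solaris host; the pool name "tank" is a placeholder, and the commands are the standard FMA and ZFS tools:

    # Error telemetry received by FMA (I/O retries, timeouts, etc.).
    fmdump -eV

    # Faults actually diagnosed by FMA (e.g. a vdev taken out of service).
    fmadm faulty

    # Pool and vdev states as defined in zpool(1m): ONLINE, DEGRADED, FAULTED...
    zpool status -x tank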
On Oct 10, 2009, at 01:26, Erik Trimble wrote:
> That is, there used to be an issue in this scenario:
>
> (1) zpool constructed from a single LUN on a SAN device
> (2) SAN experiences a temporary outage, while the ZFS host remains running.
> (3) zpool is permanently corrupted, even if no I/O occurred during the outage
>
> This is fixed. (around b101, IIRC)

Was this fix ever back-ported to Solaris 10?
Victor Latushkin
2009-Oct-10 19:00 UTC
[zfs-discuss] Does ZFS work with SAN-attached devices?
Erik Trimble wrote:
> ZFS no longer has the issue where loss of a single device (even
> intermittently) causes pool corruption. That's been fixed.

Erik, it does not help at all when you talk about some issue being fixed and do not provide the corresponding CR number. It does not allow an interested observer to go and have a look at what exactly that issue was and how it's been fixed, and it does not allow tracking its presence or absence in other releases.

So could you please provide the CR number for the issue you are talking about?

> That is, there used to be an issue in this scenario:
>
> (1) zpool constructed from a single LUN on a SAN device
> (2) SAN experiences a temporary outage, while the ZFS host remains running.
> (3) zpool is permanently corrupted, even if no I/O occurred during the outage
>
> This is fixed. (around b101, IIRC)

You see - you cannot tell exactly when it was fixed yourself. Besides, in the scenario you describe above, a whole lot can be hidden in "SAN experiences temporary outage". It can be as simple as the wrong fiber cable being unplugged, or as complex as a storage array failing, rebooting, and losing its entire cache content as a result.

In the former case I do not see how it could badly affect a ZFS pool. It may cause a panic, if 'failmode' is set to panic (or the software release is too old and does not support this property), and it may require administrator intervention to do 'zpool clear'.

In the latter case the consequences can really be bad - the pool may be corrupted and unopenable. There are several examples of this in the archives, as well as success stories of recovery.

And there's a recovery project to provide support for pool recovery from these corruptions.

> However, ZFS remains much more sensitive to loss of the underlying
> LUN than UFS, and has a tendency to mark such a LUN as defective
> during any such SAN outage. It's much more recoverable nowadays,
> though. Just to be clear, this occasionally occurs when something such
> as a SAN switch dies, or there is a temporary hiccup in the SAN
> infrastructure, causing some small (i.e. < 1 minute) loss of
> connectivity to the underlying LUN.

Again, SANs are very complex structures, and a perceived small loss of connectivity may in reality be a very complex event with difficult-to-predict consequences.

With non-COW filesystems (like UFS) it is indeed less likely that you experience the consequences of a small outage immediately (though they can still manifest themselves much, much later).

ZFS tends to uncover the presence of the consequences much earlier (immediately?). But that does not immediately mean there's an issue with ZFS. There may be an issue somewhere within the SAN infrastructure which was only unavailable for less than a minute.

> RAIDZ and mirrored zpools are still the preferred method of arranging
> things in ZFS, even with hardware RAID backing the underlying LUN
> (whether the LUN is from a SAN or a local HBA doesn't matter).

Fully support this - without redundancy at the ZFS level there's no such benefit as self-healing...

regards,
victor
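For reference, the 'failmode' property mentioned above can be inspected and set per pool; a minimal sketch, assuming a pool named "tank":

    # Show the current failure-mode policy: wait, continue, or panic.
    zpool get failmode tank

    # "continue" returns EIO to new write requests instead of blocking or
    # panicking when the pool's devices become unavailable.
    zpool set failmode=continue tank

    # After connectivity returns, clear the device error state manually.
    zpool clear tank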
Victor Latushkin wrote:
> Erik Trimble wrote:
>> ZFS no longer has the issue where loss of a single device (even
>> intermittently) causes pool corruption. That's been fixed.
>
> Erik, it does not help at all when you talk about some issue being
> fixed and do not provide the corresponding CR number. It does not
> allow an interested observer to go and have a look at what exactly that
> issue was and how it's been fixed, and it does not allow tracking its
> presence or absence in other releases.
>
> So could you please provide the CR number for the issue you are talking about?
>
I went back and dug through some of my email, and the issue showed up as CR 6565042. That was fixed in b77 and s10 update 6.

I'm looking for the related issues of timeout failures and kernel panic before import for missing zpools.

> [...]
>
> ZFS tends to uncover the presence of the consequences much earlier
> (immediately?). But that does not immediately mean there's an issue
> with ZFS. There may be an issue somewhere within the SAN infrastructure
> which was only unavailable for less than a minute.
>
I'm not saying that it's ZFS's fault. I'm saying that ZFS is more sensitive to SAN issues than UFS.

As Richard pointed out earlier, it's unlikely that very small hiccups will have an impact - generally, timeout stops have to be hit in the various underlying drivers.

>> RAIDZ and mirrored zpools are still the preferred method of arranging
>> things in ZFS, even with hardware RAID backing the underlying LUN
>> (whether the LUN is from a SAN or a local HBA doesn't matter).
>
> Fully support this - without redundancy at the ZFS level there's no
> such benefit as self-healing...
>
> regards,
> victor

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
> I went back and dug through some of my email, and the issue showed up as
> CR 6565042.
>
> That was fixed in b77 and s10 update 6.

I looked at this CR; forgive me, but I am not a ZFS engineer. Can you explain, in simple terms, how ZFS now reacts to this? If it does not panic, how does it ensure data is safe?

Also, I just want to ensure everyone is on the same page here. There seem to be some mixed messages in this thread about how sensitive ZFS is to SAN issues.

Do we all agree that creating a zpool out of one device in a SAN environment is not recommended? One should always construct a ZFS mirror or raidz device out of SAN-attached devices, as posted in the ZFS FAQ?
Bob Friesenhahn
2009-Oct-12 00:42 UTC
[zfs-discuss] Does ZFS work with SAN-attached devices?
On Sun, 11 Oct 2009, Shawn Joy wrote:
>
> Do we all agree that creating a zpool out of one device in a SAN
> environment is not recommended? One should always construct a ZFS
> mirror or raidz device out of SAN-attached devices, as posted in the
> ZFS FAQ?

No, not everyone agrees. Not even the ZFS inventors. As with most things, it is not a black/white issue and there are plenty of valid reasons to put ZFS on a big-LUN SAN device. It does not necessarily end badly.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
> I went back and dug through some of my email, and the issue showed up as
> CR 6565042.
>
> That was fixed in b77 and s10 update 6.
>
> I looked at this CR; forgive me, but I am not a ZFS engineer. Can you
> explain, in simple terms, how ZFS now reacts to this? If it does not
> panic, how does it ensure data is safe?

I found some conflicting information.

Infodoc 211349, Solaris[TM] ZFS & Write Failure:

"ZFS will handle the drive failures gracefully as part of the BUG 6322646 fix in the case of non-redundant configurations by degrading the pool instead of initiating a system panic with the help of Solaris[TM] FMA framework."

From Richard's post above:

"NB: definitions of the pool states, including "degraded", are in the zpool(1m) man page.
-- richard"

From the zpool man page located below:

http://docs.sun.com/app/docs/doc/819-2240/zpool-1m?l=en&a=view&q=zpool

"Device Failure and Recovery

ZFS supports a rich set of mechanisms for handling device failure and data corruption. All metadata and data is checksummed, and ZFS automatically repairs bad data from a good copy when corruption is detected.

In order to take advantage of these features, a pool must make use of some form of redundancy, using either mirrored or raidz groups. While ZFS supports running in a non-redundant configuration, where each root vdev is simply a disk or file, this is strongly discouraged. A single case of bit corruption can render some or all of your data unavailable.

A pool's health status is described by one of three states: online, degraded, or faulted. An online pool has all devices operating normally. A degraded pool is one in which one or more devices have failed, but the data is still available due to a redundant configuration. A faulted pool has corrupted metadata, or one or more faulted devices, and insufficient replicas to continue functioning.

The health of the top-level vdev, such as mirror or raidz device, is potentially impacted by the state of its associated vdevs, or component devices. A top-level vdev or component device is in one of the following states:"

So from the zpool man page it seems that it is not possible to put a single-device zpool in a degraded state. Is this correct, or does the fix in bugs 6565042 and 6322646 change this behavior?

> Also, I just want to ensure everyone is on the same page here. There seem to be
> some mixed messages in this thread about how sensitive ZFS is to SAN issues.
>
> Do we all agree that creating a zpool out of one device in a SAN environment is
> not recommended? One should always construct a ZFS mirror or raidz device out
> of SAN-attached devices, as posted in the ZFS FAQ?

The zpool man page seems to agree with this. Is this correct?
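For reference, a quick way to see the health states the man page describes; the pool name "tank" is a placeholder:

    # One-line health per pool: ONLINE, DEGRADED, or FAULTED.
    zpool list -H -o name,health

    # Per-vdev detail, including READ/WRITE/CKSUM error counters.
    zpool status -v tank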
Bob Friesenhahn
2009-Oct-12 01:31 UTC
[zfs-discuss] Does ZFS work with SAN-attached devices?
On Sun, 11 Oct 2009, Shawn Joy wrote:
>
> So from the zpool man page it seems that it is not possible to put a
> single-device zpool in a degraded state. Is this correct, or does the
> fix in bugs 6565042 and 6322646 change this behavior?

It is true that it is not possible to use the pool if the device that it is based on is inaccessible or missing. If the device totally fails or scrambles its data, then the pool is permanently lost.

>> Also, I just want to ensure everyone is on the same page here. There
>> seem to be some mixed messages in this thread about how sensitive
>> ZFS is to SAN issues.
>>
>> Do we all agree that creating a zpool out of one device in a SAN
>> environment is not recommended? One should always construct a ZFS
>> mirror or raidz device out of SAN-attached devices, as posted in
>> the ZFS FAQ?
>
> The zpool man page seems to agree with this. Is this correct?

In life there are many things that we "should do" (but often don't). There are always trade-offs. If you need your pool to be able to operate with a device missing, then the pool needs to have sufficient redundancy to keep working. If you want your pool to survive if a disk gets crushed by a wayward fork lift, then you need to have redundant storage so that the data continues to be available.

If the devices are on a SAN and you want to be able to continue operating while there is a SAN failure, then you need to have redundant SAN switches, redundant paths, and redundant storage devices, preferably in a different chassis.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
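A sketch of the last point, assuming one LUN is presented from each of two separate arrays over separate controllers; the pool name and device names (c4t0d0 on array A, c5t0d0 on array B) are placeholders:

    # Mirror a LUN from array A against a LUN from array B, so the pool
    # can survive the loss of either array or either path.
    zpool create tank mirror c4t0d0 c5t0d0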
>>>>> "sj" == Shawn Joy <shawn.joy at sun.com> writes:

    sj> Can you explain, in simple terms, how ZFS now reacts
    sj> to this?

I can't. :)

I think Victor's long message made a lot of sense. The failure modes with a SAN are not simple. At least there is the difference of whether the target's write buffer was lost after a transient failure or not, and the current storage stack assumes it's never lost.

IMHO, SANs are in general broken by design because their software stacks don't deal predictably with common network failure modes (like the target rebooting but the initiator staying up). The standard that would qualify to me as "deal predictably" would be what NFS provides:

 * Writes are double-cached on client and server, so the client can
   replay them if the server crashes.

   To my limited knowledge, no SAN stack does this. Expensive SANs can
   limit the amount of data at risk with NVRAM, but it seems like there
   would always be a little bit of data in flight. A cost-conscious
   Solaris iSCSI target will put a quite large amount of data at risk
   between sync-cache commands. This is okay, just as it's okay for NFS
   servers, but only if all the initiators reboot whenever the target
   reboots.

   Doing the client-side part of the double-caching is a little tricky
   because I think you really want to do it pretty high in the storage
   stack, maybe in ZFS rather than in the initiator, or else you will be
   triple-caching a TXG (twice on the client, once on the server), which
   can be pretty big. This means introducing the idea that a sync-cache
   command can fail, and that when it does, none/some/all of the writes
   between the last sync-cache that succeeded and the current one that
   failed may have been silently lost, even if those write commands were
   ack'd as successful when they were issued.

 * The best current practice for NFS mounts is 'hard,intr', meaning:
   retry forever if there is a failure. If you want to stop retrying,
   whatever app was doing the writing gets killed. This rule means any
   database file that got "intr'd" will be crash-consistent.

   The SAN equivalent of 'intr' would be force-unmounting the filesystem
   (and force-unmounting implies either killing processes with open files
   or giving persistent errors to any open filehandles). I'm pretty sure
   no SAN stack does this intentionally whenever it's needed---rather it
   just sort of happens sometimes, depending on how errors percolate
   upwards through various nested cargo-cult timeouts.

   I guess it would be easy to add to a first order---just make SAN
   targets stay down forever after they bounce, until ZFS marks them
   offline. The tricky part is the complaints you get after: "how do I
   add this target back without rebooting?", "do I really have to
   resilver? It's happening daily so I'm basically always resilvering.",
   "we are going down twice a day because of harmless SAN glitches that
   we never noticed before---is this really necessary?" I think I
   remember some post that made it sound like people were afraid to touch
   any of the storage exception handling because no one knows what cases
   are really captured by the many levels of timeouts and retries.

In short, to me it sounds like the retry state machines of SAN initiators are broken by design, across the board. They make the same assumption they did for local storage: the only time data in a target's write buffer will get lost is during a crash-reboot. This is wrong not only for SANs but also for hot-pluggable drives, which can have power sags that get wrongly treated the same way as CRC errors on the data cable.

It's possible to get it right, like NFS is right, but instead the popular fix with most people is to leave the storage stack broken and make ZFS more resilient to this type of corruption, like other filesystems are, because resilience is good, and people are always twitchy and frightened and not expecting strictly consistent behavior around their SANs anyway, so the problem is rare.

So far SAN targets have been proprietary, so vendors are free to conceal this problem with protocol tweaks, expensive NVRAMs, and undefended or fuzzed advice given through their support channels to their paranoid, accepting sysadmins. Whatever free and open targets behaved differently were assumed to be "immature." Hopefully now that SANs are opening up, this SAN write hole will finally get plugged somehow, maybe with one of the two * points above; and if we were to pick the second *, then we'd probably need some notion of a "target boot cookie" so we only take the 'intr'-like force-unmount path in the cases where it's really needed.

    sj> Do we all agree that creating a zpool out of one device in a
    sj> SAN environment is not recommended.

This is still a good question. The stock response is "ZFS needs to manage at least one layer of <blah blah>", but this problem (SAN target reboots while the initiator does not) isn't unexplained storage chaos or cosmic bitflip gremlins.

Does anyone know whether how much zpool redundancy helps with this type of event has changed before/after b77?
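For comparison, the NFS behaviour described above corresponds to the standard Solaris mount options; the server name and paths below are placeholders:

    # "hard" retries forever across a server outage; "intr" allows the
    # blocked application to be killed, leaving its files crash-consistent.
    mount -F nfs -o hard,intr server:/export/data /mnt/data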
> In life there are many things that we "should do" (but often don't).
> There are always trade-offs. If you need your pool to be able to
> operate with a device missing, then the pool needs to have sufficient
> redundancy to keep working. If you want your pool to survive if a
> disk gets crushed by a wayward fork lift, then you need to have
> redundant storage so that the data continues to be available.
>
> If the devices are on a SAN and you want to be able to continue
> operating while there is a SAN failure, then you need to have
> redundant SAN switches, redundant paths, and redundant storage
> devices, preferably in a different chassis.

Yes, of course. This is part of normal SAN design.

The ZFS file system is what is different here. If an HBA, fibre cable, or redundant controller fails, or firmware issues occur on an array's redundant controller, then SSTM (MPxIO) will see the issue and try to fail things over to the other controller. Of course, this reaction at the SSTM level takes time. UFS simply allows this to happen. It is my understanding that ZFS can have issues with this, hence the reason why a ZFS mirror or raidz device is required.

I am still not clear how the above-mentioned bugs change the behavior of ZFS, and whether they change the recommendations of the zpool man page.

> Bob
> --
> Bob Friesenhahn
> bfriesen at simple dot dallas dot tx dot us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
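For reference, a minimal sketch of checking what MPxIO (scsi_vhci) currently sees for the SAN paths, using the standard Solaris multipath tools:

    # List multipathed logical units and their operational path counts.
    mpathadm list lu

    # Show the mapping between non-MPxIO and MPxIO (STMS) device names.
    stmsboot -L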
Bob Friesenhahn
2009-Oct-13 14:21 UTC
[zfs-discuss] Does ZFS work with SAN-attached devices?
On Tue, 13 Oct 2009, Shawn Joy wrote:
> The ZFS file system is what is different here. If an HBA, fibre
> cable, or redundant controller fails, or firmware issues occur on an
> array's redundant controller, then SSTM (MPxIO) will see the issue and
> try to fail things over to the other controller. Of course, this
> reaction at the SSTM level takes time. UFS simply allows this to
> happen. It is my understanding that ZFS can have issues with this,
> hence the reason why a ZFS mirror or raidz device is required.

ZFS does not seem so different from UFS when it comes to a SAN. ZFS depends on the underlying device drivers to detect and report problems. UFS does the same. MPxIO's response will also depend on the underlying device drivers.

My own reliability concerns regarding a "SAN" are due to the big-LUN that SAN hardware usually emulates and not due to communications in the "SAN". A big-LUN is comprised of multiple disk drives. If the SAN storage array has an error, then it is possible that the data on one of these disk drives will be incorrect, and it will be hidden somewhere in that big LUN. The data could be old data rather than just being "corrupted". Without redundancy, ZFS will detect this corruption but will be unable to repair it. The difference from UFS is that UFS might not even notice the corruption, or fsck will just paper it over. UFS filesystems are usually much smaller than ZFS pools.

There are also performance concerns when using a big-LUN, because ZFS won't be able to intelligently schedule I/O for multiple drives, so performance is reduced.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
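A sketch of how that detection surfaces in practice, assuming a pool named "tank" built on one big LUN:

    # Walk every block in the pool and verify checksums against the LUN.
    zpool scrub tank

    # On a non-redundant pool, corrupted blocks show up as CKSUM errors and
    # a list of affected files, but they cannot be repaired automatically.
    zpool status -v tank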
Andrew Gabriel
2009-Oct-13 14:49 UTC
[zfs-discuss] Does ZFS work with SAN-attached devices?
Bob Friesenhahn wrote:
> My own reliability concerns regarding a "SAN" are due to the big-LUN
> that SAN hardware usually emulates and not due to communications in the
> "SAN". A big-LUN is comprised of multiple disk drives. If the SAN
> storage array has an error, then it is possible that the data on one of
> these disk drives will be incorrect, and it will be hidden somewhere in
> that big LUN. The data could be old data rather than just being
> "corrupted". Without redundancy, ZFS will detect this corruption but
> will be unable to repair it. The difference from UFS is that UFS might
> not even notice the corruption, or fsck will just paper it over. UFS
> filesystems are usually much smaller than ZFS pools.
>
> There are also performance concerns when using a big-LUN, because ZFS
> won't be able to intelligently schedule I/O for multiple drives, so
> performance is reduced.

Also, ZFS does things like putting the ZIL data (when not on a dedicated device) at the outer edge of disks, that being faster. When you have a LUN which doesn't map onto the standard performance profile of a disk, this optimisation is lost.

I give talks on ZFS to Enterprise customers, and this area is something I cover. Where possible, give ZFS visibility of redundancy, and as many LUNs as you can. However, we have to recognise that this isn't always possible. In many enterprises, storage is managed by teams separate from the server teams (this is a legal requirement in some industry sectors in some countries, typically finance), often with very little cooperation between teams, indeed even rivalry. If we said ZFS _had_ to handle lots of LUNs and the data redundancy, it would never get through many data centre doors, so we do have to work in this environment.

Even where customers can't make use of some of the features such as self-healing of data corruption, I/O scheduling, etc., because of their company storage infrastructure limitations, there's still a ton of other goodness in there, with the ease of creating filesystems, snapshots, etc., and we will at least let them know when their multi-million dollar storage system silently drops a bit, which such systems tend to do far more often than most customers realise.

--
Andrew
> Also, ZFS does things like putting the ZIL data (when not on a dedicated
> device) at the outer edge of disks, that being faster.

No, ZFS does not do that. It will chain the intent log from blocks allocated from the same metaslabs that the pool is allocating from. This actually works out well, because there isn't a large seek back to the beginning of the device. When the pool gets near full there will be noticeable slowness - but then all filesystems' performance suffers when searching for space.

When the log is on a separate device it uses the same allocation scheme, but those blocks will tend to be allocated at the outer edge of the disk. They only exist for a short time before getting freed, so the same blocks get re-used.

Neil.
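For reference, a separate log device is added to an existing pool like this; the pool name "tank" and device c4t0d0 are placeholders (typically a low-latency device such as an SSD or NVRAM-backed LUN):

    # Dedicate a separate intent-log (slog) device to the pool.
    zpool add tank log c4t0d0

    # The log vdev appears under a "logs" section in the status output.
    zpool status tank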
Just to put closure to this discussion about how CRs 6565042 and 6322646 change how ZFS functions in the below scenario:

> ZFS no longer has the issue where loss of a single device (even
> intermittently) causes pool corruption. That's been fixed.
>
> That is, there used to be an issue in this scenario:
>
> (1) zpool constructed from a single LUN on a SAN device
> (2) SAN experiences a temporary outage, while the ZFS host remains running.
> (3) zpool is permanently corrupted, even if no I/O occurred during the outage
>
> This is fixed. (around b101, IIRC)
>
> I went back and dug through some of my email, and the issue showed up as
> CR 6565042.
>
> That was fixed in b77 and s10 update 6.

After doing further research, and speaking with the CR engineers, the CR changes seem to be included in an overall fix for ZFS panic situations. The zpool can still go into a degraded or faulted state, which will require manual intervention by the user.

This fix was discussed above in information from infodoc 211349, Solaris[TM] ZFS & Write Failure:

"ZFS will handle the drive failures gracefully as part of the BUG 6322646 fix in the case of non-redundant configurations by degrading the pool instead of initiating a system panic with the help of Solaris[TM] FMA framework."
>>>>> "sj" == Shawn Joy <shawn.joy at sun.com> writes:

    sj> "ZFS will handle the drive failures gracefully as part of the
    sj> BUG 6322646 fix in the case of non-redundant configurations by
    sj> degrading the pool instead of initiating a system panic with
    sj> the help of Solaris[TM] FMA

The problem was not system panics. It was lost pools.
Prior to this fix, ZFS would panic the system in order to avoid data corruption and loss of the zpool. Now the pool goes into a degraded or faulted state, and one can "try" the zpool clear command to correct the issue. If this does not succeed, a reboot is required.
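A minimal sketch of that manual intervention, assuming a pool named "tank":

    # After the SAN outage ends, see which pools were marked unhealthy.
    zpool status -x

    # Clear the device error counts and attempt to resume normal operation.
    zpool clear tank

    # If the pool is still FAULTED after the clear, a reboot may be
    # required, as noted above.
    zpool status tank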