Large sites that have centralized their data with a SAN typically have a
storage device export block-oriented storage to a server, with a fibre
channel or iSCSI connection between the two.  The server sees this as a
single virtual disk.  On the storage device, the blocks of data may be
spread across many physical disks.  The storage device looks after
redundancy and management of the physical disks.  It may even phone home
when a disk fails and needs to be replaced.  The storage device provides
reliability and integrity for the blocks of data that it serves, and does
this well.

On the server, a variety of filesystems can be created on this virtual
disk.  UFS is most common, but ZFS has a number of advantages over UFS.
Two of these are dynamic space management and snapshots.  There are also
a number of objections to employing ZFS in this manner.  "ZFS cannot
correct errors" and "you will lose all of your data" are two of the
alarming ones.  Isn't ZFS supposed to ensure that data written to the
disk are always correct?  What's the real problem here?

This is a split-responsibility configuration where the storage device is
responsible for integrity of the storage and ZFS is responsible for
integrity of the filesystem.  How can it be made to behave in a reliable
manner?  Can ZFS be better than UFS in this configuration?  Is a
different form of communication between the two components necessary in
this case?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Bob Friesenhahn
2008-Dec-10 19:08 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Wed, 10 Dec 2008, Gary Mills wrote:

> This is a split responsibility configuration where the storage device
> is responsible for integrity of the storage and ZFS is responsible for
> integrity of the filesystem.  How can it be made to behave in a
> reliable manner?  Can ZFS be better than UFS in this configuration?
> Is a different form of communication between the two components
> necessary in this case?

The issue is really that the SAN device's error detection and correction
is not as robust as what is used by ZFS.  The vast majority of SAN
devices do not do 100% data error detection.  ZFS is in a position to
detect errors that the SAN devices cannot detect.

I doubt that ZFS is any more likely to lose your data than UFS is, but
ZFS is vastly more likely to detect a problem with the data that your
SAN device is returning.

For my own situation, I configured my SAN array as a fibre channel
"JBOD" and ZFS handles all the data integrity issues associated with the
disks.  After 10 months I have yet to encounter any issue and
performance is excellent.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
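A minimal sketch of the fibre-channel-JBOD arrangement Bob describes,
assuming the array simply exports each physical disk as its own LUN; the
pool name and device names below are hypothetical:

   # Hypothetical LUNs, one per physical disk exported by the array.
   zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

   # ZFS now owns both the checksums and the redundancy, so a block that
   # fails verification can be rebuilt from parity and rewritten.
   zpool status tank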
On Wed, Dec 10, 2008 at 18:46, Gary Mills <mills at cc.umanitoba.ca> wrote:

> The storage device provides reliability and integrity for the blocks of
> data that it serves, and does this well.

But not well enough.  Even if the storage does a perfect job keeping its
bits correct on disk, there are a lot of steps between the array and the
CPU.  If one of those steps is faulty, data could be silently corrupted.
ZFS checks data all the way from the CPU to the disk and back to the
CPU, and the storage array fundamentally cannot do that.

> a number of objections to employing ZFS in this manner.
> "ZFS cannot correct errors" and "you will lose all of your data"
> are two of the alarming ones.  Isn't ZFS supposed to ensure that data
> written to the disk are always correct?  What's the real problem here?

These problems are caused by not letting ZFS handle a level of
redundancy.  If you export raid-0 (or raid-5, or single-disk) LUNs from
the array and mirror them on the host side, this will solve the problem.
It does mean that your array is doing extra work that's not getting
used; I don't see any way around that.  It also means that you need
twice the bandwidth to the storage array; I don't see any way around
that either.  ZFS really loves high bandwidth, which gives the advantage
to direct-connected storage: SAS arrays, and so forth.

> the storage device is responsible for integrity of the storage and ZFS
> is responsible for integrity of the filesystem.

Turning off checksumming on the ZFS side may "solve" the problem.

Will
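A sketch of the host-side mirroring Will suggests, assuming the array
exports two single-disk (or RAID-5) LUNs; device and pool names are
illustrative only:

   # Mirror two array LUNs on the host, giving ZFS a second copy it can
   # use to repair any block that fails its checksum.
   zpool create tank mirror c3t0d0 c3t1d0

   # Check the layout and the health of both sides of the mirror.
   zpool status tank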
Nicolas Williams
2008-Dec-10 19:30 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Wed, Dec 10, 2008 at 12:46:40PM -0600, Gary Mills wrote:

> On the server, a variety of filesystems can be created on this virtual
> disk.  UFS is most common, but ZFS has a number of advantages over
> UFS.  Two of these are dynamic space management and snapshots.  There
> are also a number of objections to employing ZFS in this manner.
> "ZFS cannot correct errors" and "you will lose all of your data"
> are two of the alarming ones.  Isn't ZFS supposed to ensure that data
> written to the disk are always correct?  What's the real problem here?

ZFS has very strong error detection built in, and for mirrored and
RAID-Z pools it can recover from errors automatically as long as there's
a mirror left, or enough disks left in RAID-Z, to complete the recovery.
ZFS can also store multiple copies of data and metadata even in
non-mirrored/non-RAID-Z pools.  ZFS always leaves the filesystem in a
consistent state, provided the drives aren't lying.

Whoever is making those objections is misinformed.

> This is a split responsibility configuration where the storage device
> is responsible for integrity of the storage and ZFS is responsible for
> integrity of the filesystem.  How can it be made to behave in a
> reliable manner?  Can ZFS be better than UFS in this configuration?

It does.  It is.

> Is a different form of communication between the two components
> necessary in this case?

No.  Note that you'll generally be better off using RAID-Z than HW
RAID-5.

Nico
-- 
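To illustrate the multiple-copies point: on a pool with no mirror and no
RAID-Z, the copies property asks ZFS to keep extra copies of each block,
which lets it repair isolated bad blocks (though not the loss of the
whole LUN).  The pool and filesystem names here are made up:

   # Store two copies of every data block in this filesystem; pool
   # metadata is already kept redundantly by default.
   zfs set copies=2 tank/home
   zfs get copies tank/home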
I agree completely with your assessment of the problems, Gary.  When ZFS
can't correct your data you do seem to be at high risk of losing it,
although some people have been able to recover with the help of a couple
of helpful souls on this forum.

I can think of one scenario where you might be able to turn this
configuration to your advantage, though.  If you have two SANs for
redundancy, it would be possible to link each to your ZFS server and
create a ZFS mirror across them.  That gives you the best of both worlds
(while potentially avoiding SAN remote-mirroring licences, which tend to
be expensive).  You could also potentially mirror to a local disk,
although I suspect that would have a noticeable impact on performance in
most situations.

Failing that, as others have suggested, export multiple LUNs from your
SAN and create a ZFS raid array or mirror.
-- 
This message posted from opensolaris.org
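A sketch of the two-SAN mirror described above, assuming each array
exports two LUNs and the controller numbers distinguish the arrays; all
names are hypothetical:

   # c4* LUNs come from one array, c5* LUNs from the other, so each
   # mirror vdev has one side on each SAN.
   zpool create tank mirror c4t0d0 c5t0d0 mirror c4t1d0 c5t1d0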
Nicolas Williams
2008-Dec-10 20:26 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Wed, Dec 10, 2008 at 01:30:30PM -0600, Nicolas Williams wrote:

> On Wed, Dec 10, 2008 at 12:46:40PM -0600, Gary Mills wrote:
> > On the server, a variety of filesystems can be created on this virtual
> > disk.  UFS is most common, but ZFS has a number of advantages over
> > UFS.  Two of these are dynamic space management and snapshots.  There
> > are also a number of objections to employing ZFS in this manner.
> > "ZFS cannot correct errors" and "you will lose all of your data"
> > are two of the alarming ones.  Isn't ZFS supposed to ensure that data
> > written to the disk are always correct?  What's the real problem here?
>
> ZFS has very strong error detection built in, and for mirrored and
> RAID-Z pools it can recover from errors automatically as long as there's
> a mirror left, or enough disks left in RAID-Z, to complete the recovery.

Oh, but I get it: all the redundancy here would be in the SAN, and the
ZFS pools would have no mirrors, no RAID-Z.

As I said:

> Note that you'll generally be better off using RAID-Z than HW RAID-5.

Precisely because ZFS can reconstruct the correct data if it's
responsible for redundancy.

But note that the setup you describe puts ZFS in no worse a situation
than any other filesystem.
Nicolas Williams wrote:

> On Wed, Dec 10, 2008 at 01:30:30PM -0600, Nicolas Williams wrote:
>
>> ZFS has very strong error detection built in, and for mirrored and
>> RAID-Z pools it can recover from errors automatically as long as there's
>> a mirror left, or enough disks left in RAID-Z, to complete the recovery.
>
> Oh, but I get it: all the redundancy here would be in the SAN, and the
> ZFS pools would have no mirrors, no RAID-Z.
>
> As I said:
>
>> Note that you'll generally be better off using RAID-Z than HW RAID-5.
>
> Precisely because ZFS can reconstruct the correct data if it's
> responsible for redundancy.
>
> But note that the setup you describe puts ZFS in no worse a situation
> than any other filesystem.

Well, actually, it does.  ZFS is susceptible to a class of failure modes
I classify as "kill the canary" types.  ZFS will detect errors and
complain about them, which results in people blaming ZFS (the canary).
If you follow this forum, you'll see a "kill the canary" post about
every month or so.

By default, ZFS implements the policy that uncorrectable but important
failures may cause it to do an armadillo impression (staying with the
animal theme ;-), whereas some other file systems, like UFS, will
blissfully ignore them -- putting data at risk.  Occasionally, arguments
arise over whether this is the best default policy, though most folks
seem to agree that data corruption is a bad thing.  Later versions of
ZFS, particularly that available in Solaris 10 10/08 and all OpenSolaris
releases, allow system admins to have better control over these
policies.
 -- richard
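One concrete example of the policy control Richard refers to is the
failmode pool property available from Solaris 10 10/08 onward, which
governs how a pool reacts to catastrophic I/O failure (the pool name
here is assumed):

   # 'wait' (the default) blocks I/O until the device returns; 'continue'
   # returns EIO to new write requests instead; 'panic' halts the host.
   zpool set failmode=continue tank
   zpool get failmode tank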
Nicolas Williams
2008-Dec-10 21:11 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Wed, Dec 10, 2008 at 12:58:48PM -0800, Richard Elling wrote:

> Nicolas Williams wrote:
> > But note that the setup you describe puts ZFS in no worse a situation
> > than any other filesystem.
>
> Well, actually, it does.  ZFS is susceptible to a class of failure modes
> I classify as "kill the canary" types.  ZFS will detect errors and
> complain about them, which results in people blaming ZFS (the canary).
> If you follow this forum, you'll see a "kill the canary" post about
> every month or so.
>
> By default, ZFS implements the policy that uncorrectable but important
> failures may cause it to do an armadillo impression (staying with the
> animal theme ;-), whereas some other file systems, like UFS, will
> blissfully ignore them -- putting data at risk.  Occasionally, arguments
> arise over whether this is the best default policy, though most folks
> seem to agree that data corruption is a bad thing.  Later versions of
> ZFS, particularly that available in Solaris 10 10/08 and all OpenSolaris
> releases, allow system admins to have better control over these
> policies.

I've seen many of those threads.  ZFS won't put your data at risk, but
users are accustomed to UFS (and others) doing so, and they tend to
prefer that to ZFS panics.  It's not that ZFS puts your data at risk in
this scenario; it puts your operations at risk, which for many is
actually much worse than risking their data.

Here's hoping for the end of HW RAID.

Nico
-- 
>>>>> "nw" == Nicolas Williams <Nicolas.Williams at sun.com> writes: >>>>> "wm" == Will Murnane <will.murnane at gmail.com> writes:nw> ZFS has very strong error detection built-in, nw> ZFS can also store multiple copies of data and metadata even nw> in non-mirrored/non-RAID-Z pools. nw> Whoever is making those objections is misinformed. The objection, to review, is that people are losing entire ZFS pools on SAN''s more often than UFS pools on the same SAN. This is experience. One might start trying to infer the reason, from the manual recovery workarounds that have worked: using an older ueberblock. wm> Turning off checksumming on the ZFS side may ``solve'''' the wm> problem. That wasn''t the successful answer for people who lost pools and then recovered them. Based on my limited understanding I don''t think it would help a pool that was recovered by using an older ueberblock. Also to pick a nit, AIUI certain checksums on the metadata can''t be disabled because they''re used in place of write-barriered commit sectors. I might be wrong though. nw> ZFS always leaves the filesystem in a consistent state, nw> provided the drives aren''t lying. ZFS needs to give similar reliability performance to competing filesystems while running on the drives and SANs that exist now. Alternatively, if you want to draw a line in the sand on the ``blame the device'''' position, the problems causing lost pools have to be actually tracked down and definitively blamed on misimplemented devices, and we need to develop a procedure to identify and disqualify the misimplemented devices. When we follow the qualification procedure before loading data into the pool, you''re no longer allowed to blame devices with hindsight after the pool''s lost by pointing at self-exhonerating error messages or telling stories about theoretical capabilities of the on-disk format. We also need to develop a list of broken devices so we can avoid buying them, and the list needs not to be a secret list rumored to contain ``drives from major vendors'''' for fear of vendors retaliating by repealing discounts or whatever. I kind of prefer this approach, but the sloppier approach of working around the problem (``working around'''' meaing automatically, safely, somewhat-quickly, hopefully not silently, recovering from often-seen kinds of corruption without rigorously identifying their root causes, just like fsck does on other filesystems) is probably easier to implement. Other filesystems like ext3 and XFS on Linux have gone through the same process of figuring out why corruption was happening and working around it through changing the way they write, sending drives STOP_UNIT commands before ACPI powerdown, the rumored ``about to lose power'''' interrupt on SGI that makes Irix cancel DMA, and mostly adding special cases to fsck, and so on. I think the obstructionist recitations of on-disk-format feature lists explaining why this ``shouldn''t be happening'''' reduce confidence in ZFS. They don''t improve it. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081210/9783e4a3/attachment.bin>
>>>>> "re" == Richard Elling <Richard.Elling at Sun.COM> writes:re> ZFS will detect errors and complain about them, which results re> in people blaming ZFS (the canary). this is some really sketchy spin. Sometimes you will say ZFS stores multiple copies of metadata, so even on an unredundant pool a few unreadable sectors usually affects only a few files, _and_ ZFS will tell you which files they are. This is the appropriate response to corruption: notice it, while saving as much uncorrupt data as possible. Both pieces are advertised as ZFS features but you stress the second by saying how redundant metdata is. Later you say ZFS is functioning as a canary when it notices corruption and informs you by throwing away your whole pool. Yes, I can pedantically see how sometimes it''d be better to lose a whole pool than have files within it silently corrupted, but if you''re comfortable living with that, it should be presented just-so to potential users, not hidden inside this canary spin. Let''s go with the canary analogy. Living with the current behavior is, at best (assuming you buy the device-blaming explanations which I don''t), more like loading up the whole mine with strategically-placed explosives and connecting them to poison-gas detectors. If there''s any posion gas, they destroy the entire mine. Sure, all the workers die, but (a) they MIGHT have died anyway from the poison gas, (b) it''s for the best because no one will mistakenly wander into the remaining pile of rubble and be harmed by the gas that was in there, (c) you need to have mine-collapse insurance anyway. I don''t think the SAN corruption problems are adequately explained, and even if the party line that they''re caused by mysterious bit-flipping gremlins in DRAM or over FC circuits, throwing out the whole pool isn''t an acceptable kind of warning. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081210/2862f33f/attachment.bin>
Bob Friesenhahn
2008-Dec-10 22:32 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Wed, 10 Dec 2008, Miles Nordin wrote:

> The objection, to review, is that people are losing entire ZFS pools
> on SANs more often than UFS filesystems on the same SANs.  This is
> experience.

It sounds like you have access to a source of information that the rest
of us don't have access to.  Perhaps it is a secret university study
which is not yet published.  Can you please share this source of
information so that we may all analyze it and draw our own conclusions?

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> It sounds like you have access to a source of information that the
> rest of us don't have access to.

I think if you read the archives of this mailing list, and compare it to
the discussions on the other Solaris mailing lists re UFS, it's a
reasonable conclusion.
-- 
This message posted from opensolaris.org
Bob Friesenhahn
2008-Dec-11 16:53 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Wed, 10 Dec 2008, Anton B. Rang wrote:

>> It sounds like you have access to a source of information that the
>> rest of us don't have access to.
>
> I think if you read the archives of this mailing list, and compare
> it to the discussions on the other Solaris mailing lists re UFS,
> it's a reasonable conclusion.

I don't think drawing conclusions based on observing the zfs nerve
center is a scientific approach, for these reasons:

 * UFS is expected to fail.
 * ZFS is expected to never fail.
 * UFS has a small maximum volume size.
 * ZFS allows building massive storage pools into the hundreds of
   terabytes and beyond.
 * UFS has only rudimentary error checking.
 * ZFS has exotic error checking.

So basically UFS volumes are small (most are 100GB or less), and when
they fail (and someone actually notices) it is not worth mentioning,
since they were expected to eventually fail and they can easily be
restored from backup.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Robert Milkowski
2008-Dec-11 17:28 UTC
[zfs-discuss] Split responsibility for data with ZFS
Hello Anton,

Thursday, December 11, 2008, 4:17:15 AM, you wrote:

>> It sounds like you have access to a source of information that the
>> rest of us don't have access to.

ABR> I think if you read the archives of this mailing list, and
ABR> compare it to the discussions on the other Solaris mailing lists
ABR> re UFS, it's a reasonable conclusion.

Well, by following that logic one might deduce that there are many more
ZFS installations than UFS, because people are talking more about ZFS
than UFS these days.  But hey, I doubt it is actually true.

ZFS is very active when it comes to development, and its development is
happening pretty much in public.  That's why you see many more problems
reported with ZFS, but I would argue that it is mostly perception, if
anything.

-- 
Best regards,
 Robert Milkowski                 mailto:milek at task.gda.pl
                                  http://milek.blogspot.com
On 11-Dec-08, at 12:28 PM, Robert Milkowski wrote:

> Hello Anton,
>
> Thursday, December 11, 2008, 4:17:15 AM, you wrote:
>
>>> It sounds like you have access to a source of information that the
>>> rest of us don't have access to.
>
> ABR> I think if you read the archives of this mailing list, and
> ABR> compare it to the discussions on the other Solaris mailing lists
> ABR> re UFS, it's a reasonable conclusion.
>
> Well, by following that logic one might deduce that there are many more
> ZFS installations than UFS, because people are talking more about ZFS
> than UFS these days.  But hey, I doubt it is actually true.
>
> ZFS is very active when it comes to development, and its development is
> happening pretty much in public.

And that perceived (or real) immaturity attracts blame (warranted or
not).

I think we have to assume Anton was joking -- otherwise his measure is
uselessly unscientific.

--Toby

> That's why you see many more problems reported with ZFS, but I would
> argue that it is mostly perception, if anything.
>
> -- 
> Best regards,
>  Robert Milkowski                 mailto:milek at task.gda.pl
>                                   http://milek.blogspot.com
On Wed, Dec 10, 2008 at 12:58:48PM -0800, Richard Elling wrote:

> Nicolas Williams wrote:
> > But note that the setup you describe puts ZFS in no worse a situation
> > than any other filesystem.
>
> Well, actually, it does.  ZFS is susceptible to a class of failure modes
> I classify as "kill the canary" types.  ZFS will detect errors and
> complain about them, which results in people blaming ZFS (the canary).
> If you follow this forum, you'll see a "kill the canary" post about
> every month or so.
>
> By default, ZFS implements the policy that uncorrectable but important
> failures may cause it to do an armadillo impression (staying with the
> animal theme ;-), whereas some other file systems, like UFS, will
> blissfully ignore them -- putting data at risk.  Occasionally, arguments
> arise over whether this is the best default policy, though most folks
> seem to agree that data corruption is a bad thing.  Later versions of
> ZFS, particularly that available in Solaris 10 10/08 and all OpenSolaris
> releases, allow system admins to have better control over these
> policies.

Yes, that's what I was getting at.  Without redundancy at the ZFS level,
ZFS can report errors but not correct them.  Of course, with a reliable
SAN and storage device, those errors will never happen.  Certainly,
vendors of these products will claim that they have extremely high
standards of data integrity.  Data corruption is the worst nightmare of
storage designers, after all.  It rarely happens, although I have seen
it on one occasion in a high-quality storage device.

The split responsibility model is quite appealing.  I'd like to see ZFS
address this model.  Is there not a way that ZFS could delegate
responsibility for both error detection and correction to the storage
device, at least one more sophisticated than a physical disk?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Gary Mills wrote:

> The split responsibility model is quite appealing.  I'd like to see ZFS
> address this model.  Is there not a way that ZFS could delegate
> responsibility for both error detection and correction to the storage
> device, at least one more sophisticated than a physical disk?

Surely that removes one of ZFS's greatest features: end-to-end
checksums.  All you'd end up with is yet another volume manager.

No matter how good your SAN is, it won't spot a flaky cable or bad RAM.

-- 
Ian.
Bob Friesenhahn
2008-Dec-12 04:41 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Thu, 11 Dec 2008, Gary Mills wrote:

> The split responsibility model is quite appealing.  I'd like to see ZFS
> address this model.  Is there not a way that ZFS could delegate
> responsibility for both error detection and correction to the storage
> device, at least one more sophisticated than a physical disk?

Why is split responsibility appealing?  In almost any complex system,
whether it be government or computing, split responsibility results in
indecision and confusion.  Hierarchical decision making based on common
rules is another matter entirely.  Unfortunately, SAN equipment is still
based on technology developed in the early '80s and simply tries to
behave like a more reliable disk drive, rather than as a participating
intelligent component in a system which may detect, tolerate, and
spontaneously correct any faults.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Gary Mills wrote:

> On Wed, Dec 10, 2008 at 12:58:48PM -0800, Richard Elling wrote:
>
>> By default, ZFS implements the policy that uncorrectable but important
>> failures may cause it to do an armadillo impression (staying with the
>> animal theme ;-), whereas some other file systems, like UFS, will
>> blissfully ignore them -- putting data at risk.  Later versions of
>> ZFS, particularly that available in Solaris 10 10/08 and all OpenSolaris
>> releases, allow system admins to have better control over these
>> policies.
>
> Yes, that's what I was getting at.  Without redundancy at the ZFS
> level, ZFS can report errors but not correct them.  Of course, with a
> reliable SAN and storage device, those errors will never happen.

"Those errors will never happen" are famous last words.  If you search
the archives here, you will find stories of bad cables, SAN switches
with downrev firmware, HBAs, and RAM problems which were detected by
ZFS.

> Certainly, vendors of these products will claim that they have
> extremely high standards of data integrity.  Data corruption is the
> worst nightmare of storage designers, after all.  It rarely happens,
> although I have seen it on one occasion in a high-quality storage
> device.
>
> The split responsibility model is quite appealing.  I'd like to see ZFS
> address this model.  Is there not a way that ZFS could delegate
> responsibility for both error detection and correction to the storage
> device, at least one more sophisticated than a physical disk?

I'm not really sure what you mean by "split responsibility model."  I
think you will find that previous designs have more (blind?) trust in
the underlying infrastructure.  ZFS is designed to trust, but verify.
 -- richard
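In the trust-but-verify spirit, one practical habit on such a
configuration (sketched with a hypothetical pool name) is a periodic
scrub, which reads and verifies every block and reports any corruption
the SAN path has introduced:

   # Walk the whole pool and verify every checksum; on an unredundant
   # pool this detects, but cannot repair, corruption.
   zpool scrub tank

   # Review the results, including any files with permanent errors.
   zpool status -v tank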
Nicolas Williams
2008-Dec-12 05:56 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Thu, Dec 11, 2008 at 09:54:36PM -0800, Richard Elling wrote:

> I'm not really sure what you mean by "split responsibility model."  I
> think you will find that previous designs have more (blind?) trust in
> the underlying infrastructure.  ZFS is designed to trust, but verify.

I think he means ZFS with HW RAID (even if it's SW RAID; if it's behind
the SAN, then it's as if it were HW RAID from ZFS's point of view).

Nico
-- 
On Thu, Dec 11, 2008 at 10:41:26PM -0600, Bob Friesenhahn wrote:

> On Thu, 11 Dec 2008, Gary Mills wrote:
> > The split responsibility model is quite appealing.  I'd like to see ZFS
> > address this model.  Is there not a way that ZFS could delegate
> > responsibility for both error detection and correction to the storage
> > device, at least one more sophisticated than a physical disk?
>
> Why is split responsibility appealing?  In almost any complex system,
> whether it be government or computing, split responsibility results in
> indecision and confusion.  Hierarchical decision making based on common
> rules is another matter entirely.

Now this becomes semantics.  There still has to be a hierarchy, but it's
split into areas of responsibility.  In the case of ZFS over SAN
storage, the area boundary now is the SAN cable.

> Unfortunately, SAN equipment is still based on technology developed in
> the early '80s and simply tries to behave like a more reliable disk
> drive, rather than as a participating intelligent component in a system
> which may detect, tolerate, and spontaneously correct any faults.

That's exactly what I'm asking.  How can ZFS and SAN equipment be
improved so that they cooperate to make the whole system more reliable?
Converting the SAN storage into a JBOD is not a valid solution.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
On Fri, Dec 12, 2008 at 04:30:51PM +1300, Ian Collins wrote:

> Gary Mills wrote:
> > The split responsibility model is quite appealing.  I'd like to see ZFS
> > address this model.  Is there not a way that ZFS could delegate
> > responsibility for both error detection and correction to the storage
> > device, at least one more sophisticated than a physical disk?
>
> Surely that removes one of ZFS's greatest features: end-to-end
> checksums.  All you'd end up with is yet another volume manager.
>
> No matter how good your SAN is, it won't spot a flaky cable or bad RAM.

Of course it will.  There's an error-checking protocol that runs over
the SAN cable.  Memory will detect errors as well.  There's error
checking, or checking and correction, every step of the way.  Better
integration of all of this error checking could be an improvement,
though.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
It really comes down to how much you trust the SAN and transport
technology.  If you're happy that you've got a good SAN, and you have a
transport that guarantees the integrity of the data, then there's no
reason ZFS shouldn't be reliable.

Personally I'd be happier once some of the recovery tools that have been
discussed here are made available, but then I don't often play with
high-end kit.  With good-quality kit I suspect your risk of data loss is
pretty low.
-- 
This message posted from opensolaris.org
Nicolas Williams
2008-Dec-12 20:09 UTC
[zfs-discuss] Split responsibility for data with ZFS
On Fri, Dec 12, 2008 at 01:52:54PM -0600, Gary Mills wrote:

> On Fri, Dec 12, 2008 at 04:30:51PM +1300, Ian Collins wrote:
> > No matter how good your SAN is, it won't spot a flaky cable or bad RAM.
>
> Of course it will.  There's an error-checking protocol that runs over
> the SAN cable.  Memory will detect errors as well.  There's error
> checking, or checking and correction, every step of the way.  Better
> integration of all of this error checking could be an improvement,
> though.

If you can fully trust the SAN, then there's no reason not to run ZFS on
top of it with no ZFS mirrors and no RAID-Z.  Yet at the same time we
see posters worried about ZFS failure modes in the face of corrupted
data.

Which is it: do you trust the SAN, yes or no?  If you do, then you're
saying that you trust your filesystems not to have any failure modes
upon SAN data corruption, because you trust SAN data corruption to be
impossible.

Nico
-- 
Gary Mills wrote:

> On Thu, Dec 11, 2008 at 10:41:26PM -0600, Bob Friesenhahn wrote:
>
>> Why is split responsibility appealing?  In almost any complex system,
>> whether it be government or computing, split responsibility results in
>> indecision and confusion.  Hierarchical decision making based on common
>> rules is another matter entirely.
>
> Now this becomes semantics.  There still has to be a hierarchy, but it's
> split into areas of responsibility.  In the case of ZFS over SAN
> storage, the area boundary now is the SAN cable.

I think I see where you are coming from.  Suppose we make an operational
definition that says a SAN is a transport for block-level data.  Then...

>> Unfortunately, SAN equipment is still based on technology developed in
>> the early '80s and simply tries to behave like a more reliable disk
>> drive, rather than as a participating intelligent component in a system
>> which may detect, tolerate, and spontaneously correct any faults.
>
> That's exactly what I'm asking.  How can ZFS and SAN equipment be
> improved so that they cooperate to make the whole system more reliable?
> Converting the SAN storage into a JBOD is not a valid solution.

ZFS only knows about block devices.  It really doesn't care if that
block device is an IDE disk, USB disk, or something on the SAN.  If you
want ZFS to be able to repair damage that it detects, then ZFS needs to
manage the data redundancy.  If you don't care that ZFS may not be able
to repair damage, then don't configure ZFS with redundancy.  It really
is that simple.

The stack looks something like:

   application
     ---- read(), write(), mmap(), etc. ----
   ZFS
     ---- read(), write(), ioctl(), etc. ----
   block device

Ideally, applications would manage their data integrity, but developers
tend to let file systems or block-level systems manage data integrity.

>> No matter how good your SAN is, it won't spot a flaky cable or bad RAM.
>
> Of course it will.  There's an error-checking protocol that runs over
> the SAN cable.  Memory will detect errors as well.  There's error
> checking, or checking and correction, every step of the way.  Better
> integration of all of this error checking could be an improvement,
> though.

However, there are a number of failure modes which cannot be detected by
such things.  By implementing more end-to-end checking, you can see when
your SAN switch firmware stuffs nulls into your data stream or your disk
reads the data from the wrong sector (for example).  No matter how much
reliability is built into each step of the way, you must trust the
subsystem at each step, and anecdotally, there are many subsystems which
cannot be trusted: disks, arrays, switches, HBAs, memory, etc.

You will find similar end-to-end design elsewhere, particularly in the
security field.
 -- richard