In the old days of UFS, on occasion one might create multiple file systems
(using multiple partitions) out of a large LUN if filesystem corruption was a
concern. It didn't happen often, but filesystem corruption has happened. So,
if filesystem X was corrupt, filesystem Y would be just fine.

With ZFS, does the same logic hold true for two filesystems coming from the
same pool?

Said slightly differently, I'm assuming that if the pool becomes mangled
somehow then all filesystems will be toast, but is it possible to have one
filesystem be corrupted while the other filesystems are fine?

Hmmm, does the answer depend on whether the filesystems are nested?

  ex 1: /my_fs_1  /my_fs_2
  ex 2: /home_dirs  /home_dirs/chris

TIA!
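For reference, here is a minimal sketch of how the two layouts above might be
created from a single pool. The pool name "tank" and the device name are
placeholders, and the default mountpoints would sit under /tank rather than at
the root as in the examples.

  # Create a pool (device name is hypothetical)
  zpool create tank c0t0d0

  # ex 1: two sibling filesystems from the same pool
  zfs create tank/my_fs_1
  zfs create tank/my_fs_2

  # ex 2: nested filesystems from the same pool
  zfs create tank/home_dirs
  zfs create tank/home_dirs/chris

  # Either way, every filesystem draws its blocks from the same
  # pool-wide storage
  zfs list -r tank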
Richard L. Hamilton
2007-Aug-11 12:23 UTC
[zfs-discuss] do zfs filesystems isolate corruption?
> With ZFS, does the same logic hold true for two filesystems coming from the
> same pool? [...] is it possible to have one filesystem be corrupted while
> the other filesystems are fine?

If they're always consistent on-disk, and the checksumming catches storage
subsystem errors out to almost 100% certainty, then the only corruption can
come from bugs in the code, or perhaps uncaught non-storage (i.e. CPU,
memory) errors.

So I suppose the answer would depend on where in the code things went astray,
but you probably could not expect any sort of isolation or even sanity at
that point; if privileged code is running amok, anything could happen, and
that would be true with two distinct UFS filesystems too, I would think.
Perhaps one might guess that corruption is more likely not to be isolated to
a single ZFS filesystem (given how lightweight a ZFS filesystem is). On the
other hand, since ZFS catches errors other filesystems don't, think of how
many UFS filesystems may well be corrupt for a very long time before causing
a panic and having that get discovered by fsck. Ideally, if the ZFS code
passes its test suites, you're safer with it than with most anything else,
even if it isn't perfect.

But I'm way out on a limb here; no doubt the experts will correct and amend
what I've said...
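A quick way to see the checksumming Richard refers to at work is to look at
the per-device error counters and force a verification pass; "tank" is a
placeholder pool name:

  # Report only pools with problems, then show per-vdev READ/WRITE/CKSUM
  # counters and any files with permanent errors
  zpool status -x
  zpool status -v tank

  # Walk every allocated block in the pool and verify its checksum
  zpool scrub tank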
Is it possible that a faulty disk controller could cause corruption to a
zpool? I think I had this experience recently when doing a 'zpool replace'
with both the old and new device attached to a controller that I discovered
was faulty (because I got data checksum errors, and had to dig for backups).

Blake

On 8/11/07, Richard L. Hamilton <rlhamil at smart.net> wrote:
> If they're always consistent on-disk, and the checksumming catches storage
> subsystem errors out to almost 100% certainty, then the only corruption can
> come from bugs in the code, or perhaps uncaught non-storage (i.e. CPU,
> memory) errors. [...]
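A hedged sketch of the kind of check Blake describes; pool and device names
are placeholders. Checksum errors that appear on several otherwise-healthy
disks at once tend to implicate shared hardware such as a controller or
cable:

  # Replace the old device with the new one; ZFS resilvers in the background
  zpool replace tank c2t0d0 c2t1d0

  # Watch resilver progress and the CKSUM column while the copy runs
  zpool status -v tank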
I did some tests with zfs-fuse where I created a pool with two vdevs (no
mirror or raid-z) and filled it up with files. Then I deliberately corrupted
bytes on one vdev and scrubbed the pool to see what happened. ZFS was able to
pinpoint exactly which files were corrupted and reported their full paths, so
you could in principle go recover them from a backup. Part of this robustness
is due to the use of ditto blocks for storing filesystem metadata, which
makes it very hard to destroy directory information.

I'm not sure if that answers the question you were asking, but generally I
found that damage to a zpool was very well confined.
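Stan's experiment can be reproduced on Solaris with file-backed vdevs; this
is a sketch with arbitrary sizes, paths, and offsets, not his exact
procedure:

  # Two file-backed vdevs striped together, no redundancy
  mkfile 128m /var/tmp/vdev1 /var/tmp/vdev2
  zpool create testpool /var/tmp/vdev1 /var/tmp/vdev2

  # ... copy some files into /testpool ...

  # Damage a few sectors in one vdev, well past the front vdev labels
  dd if=/dev/urandom of=/var/tmp/vdev1 bs=512 count=16 seek=10000 conv=notrunc

  # Scrub, then list the damaged files by full path; with no redundancy the
  # data cannot be repaired, but ditto blocks keep the metadata intact
  zpool scrub testpool
  zpool status -v testpool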
Chris,

> In the old days of UFS, on occasion one might create multiple file systems
> (using multiple partitions) out of a large LUN if filesystem corruption was
> a concern. It didn't happen often, but filesystem corruption has happened.
> So, if filesystem X was corrupt, filesystem Y would be just fine.
>
> With ZFS, does the same logic hold true for two filesystems coming from the
> same pool?

For the purposes of isolating corruption, the separation of two or more
filesystems coming from the same ZFS storage pool does not help. An entire
ZFS storage pool is the unit of I/O consistency, as all ZFS filesystems
created within a single storage pool share the same physical storage.

When configuring a ZFS storage pool, the [poor] decision to choose a
non-redundant layout (a single disk or a concatenation of disks) versus a
redundant one (mirror, raidz, raidz2) leaves ZFS no means to automatically
recover from some forms of corruption. Even when using a redundant storage
pool, there are scenarios in which this is not good enough. This is when the
filesystem's needs transition into availability, such as when the loss or
inaccessibility of two or more disks renders mirroring or raidz ineffective.

As of Solaris Express build 68, Availability Suite
[http://www.opensolaris.org/os/project/avs/] is part of base Solaris,
offering both local snapshots and remote mirrors, both of which work with
ZFS.

Locally, on a single Solaris host, snapshots of the entire ZFS storage pool
can be taken at intervals of one's choosing, and with multiple snapshots of a
single master, collections of snapshots, say at intervals of one hour, can be
retained. Options allow for 100% independent snapshots (much like your UFS
analogy above), dependent snapshots where only the copy-on-write data is
retained, or compact dependent snapshots where the snapshot's physical
storage is some percentage of the master's.

Remotely, between two or more Solaris hosts, remote mirrors of the entire ZFS
storage pool can be configured, where synchronous replication can offer zero
data loss and asynchronous replication can offer near-zero data loss, both
offering write-order, on-disk consistency. A key aspect of remote replication
with Availability Suite is that the replicated ZFS storage pool can be
quiesced on the remote node and accessed, or, in a disaster recovery
scenario, take over instantly where the primary left off. When the primary
site is restored, the MTTR (Mean Time To Recovery) is essentially zero, since
Availability Suite supports on-demand pull, so yet-to-be-replicated blocks
are retrieved synchronously, allowing the ZFS filesystem and applications to
be resumed without waiting for a potentially lengthy resynchronization.

Jim Dunham
Solaris, Storage Software Group
Sun Microsystems, Inc.
1617 Southwood Drive
Nashua, NH 03063
Email: James.Dunham at Sun.COM
http://blogs.sun.com/avs
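For the pool-layout point Jim makes, the difference between a configuration
ZFS can self-heal and one where it can only detect corruption is chosen at
pool creation. The commands below show alternative layouts for comparison;
device names are placeholders:

  # Non-redundant: ZFS detects bad blocks via checksums but has no second
  # copy to repair them from
  zpool create tank c0t0d0 c0t1d0

  # Redundant alternatives: a mirror or raidz pool lets ZFS rewrite a bad
  # block from a good copy when a checksum error is found
  zpool create tank mirror c0t0d0 c0t1d0
  zpool create tank raidz c0t0d0 c0t1d0 c0t2d0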
Thanks for the info folks. In addition to the 2 replies shown above I got the
following very knowledgeable reply from Jim Dunham (for some reason it has
not shown up here yet, so I'm going to paste it in).

----
[Jim Dunham's reply, reproduced in full above]
----

Thanks Jim!
On 8/11/07, Stan Seibert <volsung at mailsnare.net> wrote:
> I'm not sure if that answers the question you were asking, but generally I
> found that damage to a zpool was very well confined.

But you can't count on it. I currently have an open case where a zpool
became corrupt and put the system into a panic loop. As this case has
progressed, I found that the panic loop part of it is not present in any
released version of S10 tested (S10U3 + 118833-36, 125100-07, 125100-10) but
does exist in snv69. The test mechanism is whether "zpool import" (no pool
name) causes the system to panic or not. I'm going on the assumption that if
this causes a panic, having the appropriate zpool.cache in place will cause a
panic during every boot.

Oddly enough, I know I can't blame the storage subsystem on this - it is ZFS
as well. :) It goes like this:

  HDS 99xx
  T2000 primary LDom
    S10U3 with a file on ZFS presented as a block device for an LDom
  T2000 guest LDom
    zpool on slice 3 of the block device mentioned above

Depending on the OS running on the guest LDom, "zpool import" gives
different results:

  S10U3 118833-36 - 125100-10:  "zpool is corrupt", "restore from backups"
  S10u4 Beta, snv69, and I think snv59:  panic (the S10u4 backtrace is very
    different from snv*)

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
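A sketch of the diagnostic Mike describes, plus one common workaround for a
pool that panics the machine at every boot. The cache path is the standard
location; the .bad suffix is arbitrary:

  # Scan attached devices and list pools available for import, without
  # actually importing them; on the affected builds this alone triggered
  # the panic
  zpool import

  # If a pool recorded in the cache file panics the box on every boot,
  # moving the cache aside (from single-user mode or an alternate boot
  # environment) stops it from being auto-imported at boot
  mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad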