IIRC, uncorrectable bitrot detected by ZFS, even in a nonessential file, used to cause a kernel panic. Bug ID 4924238 was closed with the claim that bitrot-induced panics are not a bug, but the description did mention an open bug ID 4879357, which suggests that it's considered a bug after all.

Can somebody clarify the intended behavior? For example, if I'm running Solaris in a VM, then I shut down the VM, flip a bit in the file which hosts the disk for the VM such that a nonessential file on that disk is corrupted, and then power up the VM and try to read that file, so that ZFS detects the bitrot and there's no mirror available to correct it, then what is supposed to happen?
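For reference, a rough local equivalent of the test described above, using a file-backed, unreplicated test pool instead of a VM disk image (the pool name, paths, and offset here are only placeholders; a single 512-byte overwrite may or may not land on the file's data blocks, so it can take a few tries):

    # mkfile 100m /var/tmp/tank.img
    # zpool create testpool /var/tmp/tank.img
    # cp /usr/dict/words /testpool/victim
    # zpool export testpool
    # dd if=/dev/urandom of=/var/tmp/tank.img bs=512 count=1 \
          seek=100000 conv=notrunc
    # zpool import -d /var/tmp testpool
    # cat /testpool/victim > /dev/null
    # zpool scrub testpool
    # zpool status -v testpool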
On Sun, Apr 15, 2007 at 01:02:23AM -0700, Andrew wrote:
> IIRC, uncorrectable bitrot detected by ZFS, even in a nonessential file,
> used to cause a kernel panic. Bug ID 4924238 was closed with the claim
> that bitrot-induced panics are not a bug, but the description did
> mention an open bug ID 4879357, which suggests that it's considered a
> bug after all.
>
> Can somebody clarify the intended behavior? For example, if I'm
> running Solaris in a VM, then I shut down the VM, flip a bit in the
> file which hosts the disk for the VM such that a nonessential file on
> that disk is corrupted, and then power up the VM and try to read that
> file, so that ZFS detects the bitrot and there's no mirror available to
> correct it, then what is supposed to happen?

ZFS should be resilient to all forms of bit rot regardless of the underlying replication being used. Corruption in a plain file will result in 'zpool status -v' reporting the file which is corrupted. Corruption in metadata is protected by ditto blocks, which store multiple copies of the same block regardless of the underlying replication. If you manage to corrupt all three copies of data within the SPA metadata (the MOS, or 'meta objset'), then the pool will be unopenable, but the machine should not panic.

Unfortunately, there is one exception to this rule. ZFS currently does not handle write failure in an unreplicated pool. As part of writing out data, it is sometimes necessary to read in space map data. If this read fails, then we can panic due to write failure. This is a known bug and is being worked on. The only way this can occur is if all three copies of a particular space map block are corrupted and there are no available replicas.

Hope that helps,

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock
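To illustrate the reporting Eric describes: when an unrepairable block belonging to a plain file is found, 'zpool status -v' ends with an error summary roughly along these lines (output abbreviated and from memory; the pool and file names are just examples):

    # zpool status -v testpool
      ...
    errors: Permanent errors have been detected in the following files:

            /testpool/victim

If you want ditto-block-style redundancy for user data as well, not just metadata, the 'copies' property can be set on a dataset; note it only affects blocks written after the property is set:

    # zfs set copies=2 testpool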
eschrock wrote:
> Unfortunately, there is one exception to this rule. ZFS currently does
> not handle write failure in an unreplicated pool. As part of writing
> out data, it is sometimes necessary to read in space map data. If this
> read fails, then we can panic due to write failure. This is a known bug
> and is being worked on.

Do you know if there's a bug ID for this?
On Sun, Apr 22, 2007 at 01:57:50PM -0700, Andrew wrote:
> eschrock wrote:
> > Unfortunately, there is one exception to this rule. ZFS currently does
> > not handle write failure in an unreplicated pool. As part of writing
> > out data, it is sometimes necessary to read in space map data. If this
> > read fails, then we can panic due to write failure. This is a known bug
> > and is being worked on.
>
> Do you know if there's a bug ID for this?

It is a variant of:

    6413847 vdev label write failure should be handled more gracefully

Specifically:

    6393634 while syncing, some read i/o errors are not handled

The first bug needs to be fixed before the second can be addressed.

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock