Uwe Dippel
2009-Apr-15 14:32 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
My question is related to this:

# zpool status
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h46m with 0 errors on Tue Apr 14 00:19:34 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1d0s0    ONLINE       0     0     1

errors: No known data errors

Since it is a rather new drive and has no trouble with Ubuntu, I dared to clear it:

# zpool clear rpool

and then checked it for errors:

# zpool scrub rpool
# zpool status -v
  pool: rpool
 state: ONLINE
 scrub: scrub completed after 0h47m with 0 errors on Tue Apr 14 23:53:48 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1d0s0    ONLINE       0     0     0

errors: No known data errors

Now I wonder where that error came from. It was just a single checksum error. It wouldn't go away with an earlier scrub, and it seemingly left no traces of badness on the drive. Something serious? At least it looks a tad contradictory: "Applications are unaffected.", the error is "unrecoverable", and yet once cleared, there is no error left.

Curious,

Uwe
Cindy.Swearingen at Sun.COM
2009-Apr-15 15:05 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Hi Uwe,

You can use fmdump to help determine whether these disk errors are persistent. Running fmdump -ev will provide a lot of detail, but from it you can review how many disk errors have occurred and over what period. A brief description is provided here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Diagnosing_Potential_Problems

Cindy

Uwe Dippel wrote:
> My question is related to this:
> [...]
> Now I wonder where that error came from. It was just a single checksum
> error. It wouldn't go away with an earlier scrub, and seemingly left no
> traces of badness on the drive. Something serious? At least it looks a
> tad contradictory: "Applications are unaffected.", it is unrecoverable,
> and once cleared, there is no error left.
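For instance, the error log can be narrowed to just the ZFS checksum events. This is a sketch - fmdump does take a -c class filter, but the exact flags and output columns may vary by Solaris release:

# fmdump -e -c 'ereport.fs.zfs.checksum'
TIMESTAMP            CLASS
Mar 27 22:27:42.3147 ereport.fs.zfs.checksum
Apr 13 21:29:35.7397 ereport.fs.zfs.checksum

Repeated events against the same device over days or weeks suggest a drive on its way out; a one-off entry is more consistent with a transient fault somewhere in the data path.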
Richard Elling
2009-Apr-15 15:23 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Uwe Dippel wrote:
> [...]
> Now I wonder where that error came from. It was just a single checksum
> error. It wouldn't go away with an earlier scrub, and seemingly left
> no traces of badness on the drive. Something serious? At least it
> looks a tad contradictory: "Applications are unaffected.", it is
> unrecoverable, and once cleared, there is no error left.

Since there are "no known data errors," it was fixed, and the scrub should succeed without errors. You cannot conclude that the drive is completely free of faults using scrub; you can only test the areas of the drive which hold active data. Or, to look at it another way: defects on the disk which can be corrected at the file system level will be.

As Cindy notes, more detailed info is available in FMA. But know that ZFS can detect transient faults, as well as permanent faults, almost anywhere in the data path.
 -- richard
Bob Friesenhahn
2009-Apr-15 15:33 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
On Wed, 15 Apr 2009, Uwe Dippel wrote:
> Now I wonder where that error came from. It was just a single checksum
> error. It wouldn't go away with an earlier scrub, and seemingly left no
> traces of badness on the drive. Something serious? At least it looks a
> tad contradictory: "Applications are unaffected.", it is unrecoverable,
> and once cleared, there is no error left.

Since it was not reported that user data was impacted, it seems likely that there was a read failure (or bad checksum) for ZFS metadata, which is redundantly stored. It could just as well have been file data; you were lucky this time. If you are worried about your individual files, then it might be wise to set copies=2 so that file data is duplicated, but this will consume more space and reduce write performance. It is better to add a mirror disk if you can, since with a single disk the whole disk could fail.

Ubuntu Linux is unlikely to notice data problems unless the drive reports hard errors. ZFS is much better at checking for errors.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
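For the record, both options are one-liners; the dataset and device names below are hypothetical, copies=2 only protects data written after the property is set, and a root pool would also need boot blocks installed on the newly attached disk:

# zfs set copies=2 rpool/export/home     (duplicate future file data on the same disk)
# zpool attach rpool c1d0s0 c2d0s0       (turn the single-disk pool into a mirror)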
Uwe Dippel
2009-Apr-15 15:38 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Richard Elling wrote:
>> status: One or more devices has experienced an unrecoverable error.  An
>>         attempt was made to correct the error.  Applications are unaffected.
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         rpool       ONLINE       0     0     0
>>           c1d0s0    ONLINE       0     0     1
>> errors: No known data errors
>>
>> # zpool clear rpool
>> # zpool status -v
>>   pool: rpool
>>  state: ONLINE
>>  scrub: scrub completed after 0h47m with 0 errors on Tue Apr 14 23:53:48 2009
>> config:
>>         NAME        STATE     READ WRITE CKSUM
>>         rpool       ONLINE       0     0     0
>>           c1d0s0    ONLINE       0     0     0
>> errors: No known data errors
>>
>> Now I wonder where that error came from. [...]
>
> Since there are "no known data errors," it was fixed, and the scrub
> should succeed without errors. You cannot conclude that the drive
> is completely free of faults using scrub, you can only test the areas
> of the drive which have active data.

I didn't conclude that. I conclude, when an 'unrecoverable error' is found, that 'zpool clear' cannot recover it. Still, there was one CKSUM error before, and it wouldn't go away before the 'clear'; while after the 'clear' even that one disappeared.

> As Cindy notes, more detailed info is available in FMA. But know
> that ZFS can detect transient faults, as well as permanent faults,
> almost anywhere in the data path.

So this is the respective output:

Feb 16 2009 23:18:47.848442332 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.dev.uderr
        ena = 0xd0dd396561a00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci1565,3409@4,1/storage@4/disk@0,0
                devid = id1,sd@f00551e8c4980493b000551a00000
        (end detector)
        driver-assessment = fail
        op-code = 0x1a
        cdb = 0x1a 0x0 0x8 0x0 0x18 0x0
        pkt-reason = 0x0
        pkt-state = 0x1f
        pkt-stats = 0x0
        stat-code = 0x0
        un-decode-info = sd_get_write_cache_enabled: Mode Sense caching page code mismatch 0
        un-decode-value
        __ttl = 0x1
        __tod = 0x499983d7 0x329233dc

Mar 27 2009 22:27:42.314752029 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0xb393a3ba200001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xf6bd78c1d3b3c878
                vdev = 0x38287e797d1642bc
        (end detector)
        pool = rpool
        pool_guid = 0xf6bd78c1d3b3c878
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0x38287e797d1642bc
        vdev_type = disk
        vdev_path = /dev/dsk/c2d0s0
        vdev_devid = id1,cmdk@AWDC_WD6400AAKS-65A7B0=_____WD-WMASY4847131/a
        parent_guid = 0xf6bd78c1d3b3c878
        parent_type = root
        zio_err = 50
        zio_offset = 0x13a4c00000
        zio_size = 0x20000
        zio_objset = 0x13f
        zio_object = 0x20ff4
        zio_level = 0
        zio_blkid = 0xa
        __ttl = 0x1
        __tod = 0x49cce25e 0x12c2bc1d

Apr 13 2009 21:29:35.739718381 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0xb6afed32000001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xf6bd78c1d3b3c878
                vdev = 0x38287e797d1642bc
        (end detector)
        pool = rpool
        pool_guid = 0xf6bd78c1d3b3c878
        pool_context = 0
        pool_failmode = continue
        vdev_guid = 0x38287e797d1642bc
        vdev_type = disk
        vdev_path = /dev/dsk/c1d0s0
        vdev_devid = id1,cmdk@AWDC_WD6400AAKS-65A7B0=_____WD-WMASY4847131/a
        parent_guid = 0xf6bd78c1d3b3c878
        parent_type = root
        zio_err = 50
        zio_offset = 0x421660000
        zio_size = 0x20000
        zio_objset = 0x107
        zio_object = 0x38dbf
        zio_level = 0
        zio_blkid = 0x4
        __ttl = 0x1
        __tod = 0x49e33e3f 0x2c1734ed
#

So I had not that many errors in the last two months: three. I'm sorry, but my question remains unanswered: where did the unrecoverable error come from, and how could it go away?

Uwe
Uwe Dippel
2009-Apr-15 15:49 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Bob Friesenhahn wrote:
> Since it was not reported that user data was impacted, it seems likely
> that there was a read failure (or bad checksum) for ZFS metadata which
> is redundantly stored.

(Maybe I am too much of a linguist not to stumble over the wording here.) If it is 'redundant', it is 'recoverable', am I right? Why, if this is the case, does scrub not recover it? And why does scrub even fail to correct the CKSUM error as long as it is flagged 'unrecoverable', yet can do exactly that after the 'clear' command?

> Ubuntu Linux is unlikely to notice data problems unless the drive
> reports hard errors. ZFS is much better at checking for errors.

No doubt. But ext3 also seems to need much less attention and far fewer commands, which leaves it a viable alternative. I still hope that one day ZFS will be as simple to maintain as ext3 - or rather, will do all that maintenance on its own. :)

Uwe
On Wed, Apr 15, 2009 at 11:49 AM, Uwe Dippel <udippel at gmail.com> wrote:
> (Maybe I am too much of a linguist not to stumble over the wording here.)
> If it is 'redundant', it is 'recoverable', am I right? Why, if this is the
> case, does scrub not recover it, and scrub even fails to correct the CKSUM
> error as long as it is flagged 'unrecoverable', but can do exactly that
> after the 'clear' command?
> [...]
> No doubt. But ext3 also seems to need much less attention, very much fewer
> commands. Which leaves it as a viable alternative. I still hope that one
> day ZFS will be maintainable as simply as ext3; respectively do all that
> maintenance on its own. :)
>
> Uwe

You only need to decide what you want here. Yes, ext3 requires less maintenance, because it can't tell you if a block becomes corrupt (though fsck'ing when that *does* happen can take hours, compared to zfs replacing a bad block with a good one from the other half of your mirror).

ZFS can *fully* do its job only when it has several copies of blocks to choose from. Since you have only one disk here, ZFS can only say 'hey, your checksum for this block is bad - sorry'. ext3 might do the same thing, though only if you tried to use the block with an application that knew what the block was supposed to look like.

That said, I think your comments raise a valid point that ZFS could be a little easier for individuals to use. I totally understand why Sun doesn't focus on end-user management tools (not their market) - on the other hand, the code is out there, so if you see a problem, get some people together to write some management tools! :)
Fajar A. Nugraha
2009-Apr-15 17:05 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
On Wed, Apr 15, 2009 at 10:49 PM, Uwe Dippel <udippel at gmail.com> wrote:
> (Maybe I am too much of a linguist not to stumble over the wording here.)
> If it is 'redundant', it is 'recoverable', am I right?

Looking at the message link http://www.sun.com/msg/ZFS-8000-9P :

"Description: A device has experienced uncorrectable errors in a replicated configuration."

This means the particular block on that device had "uncorrectable errors" (a read failure or bad checksum, as Bob pointed out), possibly due to a bad sector. So "uncorrectable" refers to that particular data block or device.

In your case, since the error is most likely on zfs metadata (which is automatically stored redundantly), zfs is able to read the redundant copy and replace the bad metadata. The same thing would also happen if the error were on user data that has redundancy through either:
- a mirror or raidz vdev, or
- copies=2 (or more)

Had the error occurred on user data, on a non-mirrored, non-raidz pool, with copies=1 (the default), you would've got http://www.sun.com/msg/ZFS-8000-8A

> Why, if this is the case, does scrub not recover it

It does, since in this case the data is stored redundantly.

> , and scrub even fails to correct the CKSUM
> error as long as it is flagged 'unrecoverable', but can do exactly that
> after the 'clear' command?

"clear" simply resets the error counter back to 0. On your next run, the bad block is probably still unused. Since zfs scrub only checks used blocks, the bad block is not checked. That gives the impression that the error has magically gone away, when in fact you may re-experience it if the bad block is reused later.

> No doubt. But ext3 also seems to need much less attention, very much fewer
> commands. [...]

In a sense, zfs already "does all maintenance on its own" the same way ext3 does:
- both store metadata (the superblock, on ext3) redundantly
- both can recover cleanly from an unclean shutdown (power failure, etc.)
- on both filesystems, if a bad sector occurs on non-redundant data, you'll simply be unable to access it.

You can mimic the "ignore errors" behavior of ext3 somewhat by setting checksum=off, as sketched below. Not recommended, but a usable setting if you already have redundancy at a lower level (e.g. hardware RAID) and you trust it completely.

Regards,

Fajar
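That last knob is an ordinary dataset property; the dataset name here is hypothetical:

# zfs set checksum=off tank/scratch
# zfs get checksum tank/scratch
NAME          PROPERTY  VALUE     SOURCE
tank/scratch  checksum  off       local

With checksums off, zfs behaves like ext3 in this respect: reads return whatever the lower layers supply, with no end-to-end verification and no self-healing.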
Richard Elling
2009-Apr-15 17:42 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Uwe Dippel wrote:
> Richard Elling wrote:
>> Since there are "no known data errors," it was fixed, and the scrub
>> should succeed without errors. You cannot conclude that the drive
>> is completely free of faults using scrub, you can only test the areas
>> of the drive which have active data.
>
> I didn't conclude that.

Could you propose alternate wording?

> I conclude, when an 'unrecoverable error' is found, that 'zpool clear'
> cannot recover it.

ZFS did recover, which is why it says "no known data errors." If the data were not recoverable, then it would show you which file was affected. Perhaps the confusion is about which layer is reporting the bad data?

In the fmdump output, there is a ZFS checksum mismatch detected. It is unclear why there is a mismatch, because there was no corresponding error event logged by the disk driver. What ZFS knows is that the data it read did not match the data it wrote. So ZFS repaired the data. Since ZFS is a COW architecture, the repair would involve writing the corrected data elsewhere.

> Still, there was one CKSUM error before, and it wouldn't go away
> before the 'clear'; while after the 'clear' even that one disappeared.

Clear just resets the counters.
 -- richard
Jens Elkner
2009-Apr-16 03:14 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
On Wed, Apr 15, 2009 at 10:32:13PM +0800, Uwe Dippel wrote:
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
...
> errors: No known data errors
>
> Now I wonder where that error came from. It was just a single checksum

Hmmm, I had a similarly curious thing ~2 weeks ago with a StorEdge 3510 (2x2Gbps FC MP, 1 controller, 2x6 HDDs mirrored and exported as a single device, no ZIL etc. tricks) connected to an X4600: since grill party season has started, the 3510 decided, at a room temperature of 33°C, to go "offline" and take part in the party ;-). The result was that during the offline time, everything which tried to access a ZFS on that pool blocked (i.e. got no timeout or error) - from that point of view, more or less expected. After the 3510 came back, 'zpool status ...' showed something like this:

        NAME                                     STATE     READ WRITE CKSUM
        pool2                                    FAULTED    289K 4.03M     0
          c4t600C0FF000000000099C790E0144EC00d0  FAULTED    289K 4.03M     0  too many errors

errors: Permanent errors have been detected in the following files:
        pool2/home/stud/inf/foobar:<0x0>

Still, everything was blocking. After a 'zpool clear', all ZFS filesystems (~2300 on that pool) except the listed one were accessible, but the status message stayed unchanged. Curious - I thought that blocking/waiting for the device to come back, plus the ZFS transaction machinery, was made exactly for a situation like this, i.e. to "re-commit" un-ACKed actions ...

Anyway, finally scrubbing the pool brought it back to the normal ONLINE state without any errors. To be sure, I compared the ZFS in question with the backup from some hours earlier - no difference. So, the same question as in the subject.

BTW: Some days later we had an even bigger grill party (~38°C) - this time the X4xxx machines in that room decided to go offline and take part as well (the v4xx's kept running ;-)). So first the 3510, and some time later the X4600. This time the pool came back online in DEGRADED state, with some more errors like the above one, plus:

        <metadata>:<0x103>
        <metadata>:<0x4007>
        ...

Clearing and scrubbing again brought it back to the normal ONLINE state without any errors. Spot checks on the files noted as having errors showed no damage ... Everything nice (wrt. data loss), but curious ...

Regards,
jel.
--
Otto-von-Guericke University    http://www.cs.uni-magdeburg.de/
Department of Computer Science  Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany        Tel: +49 391 67 12768
Uwe Dippel
2009-Apr-16 09:38 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
On Thu, Apr 16, 2009 at 1:05 AM, Fajar A. Nugraha <fajar at fajar.net> wrote:

[...]

Thanks, Fajar, et al.

What this thread actually shows, alas, is that ZFS is rocket science. In 2009, one would expect a file system to 'just work'. Why would anyone want to have to 'status' it regularly; 'scrub' it when needed; and, if scrub doesn't do the trick (still not knowing how serious the 'unrecoverable error' is, as in this case), 'clear' it, 'scrub' again, follow up with another 'status', or even run the more advanced fmdump -eV to see all the hex values in there (leaving it to interpretation what those actually mean), and hope the system will still make it - only to get, in the end, the suggestion to 'add another disk for RAID'?

Seriously, guys and girls, I am pretty glad that I still run my servers on OpenBSD (despite all temptations to change to OpenSolaris), where I can 'boot and forget' about them until a patch requires my action. If I can't trust the metadata of a pool (which might disappear completely, or not, as we had to learn in here), and have to do all the tasks above manually, or write a script to do them for me (and how shall I do that, if even in here an unrecoverable error can seemingly be recovered and no real explanation is forthcoming), then by all means this is a dead-born project - with all due respect that I, as an engineer of 30 years, have for you guys.

I do guess and believe that ZFS is so much better a filesystem than any other, honestly. But the history of engineering has seen the best products fail because their advanced features completely bypassed the marketplace and its psychology. Even I, as an avid and responsible system administrator, am not sure I want to read 30+ pages of commands, explanations of ZFS messages, and comments:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

In the end, I don't feel like reading kernel code either. Both the kernel and the file system simply need to do the job. And if they tend to fall over for lack of maintenance (that is, manual control and configuration), they are useless in the real world. Yes, some will reiterate that with ZFS I can be sure to have 100% consistent data. That's all hunky dory. But we here simply cannot afford the huge effort that is seemingly required for it. And in 99%+ of the cases, a very standard and easily handled FFS/UFS with RAID and backup will just do - as much as I personally feel how great a step ZFS is in principle.

Uwe
Mattias Pantzare
2009-Apr-16 10:53 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
On Thu, Apr 16, 2009 at 11:38, Uwe Dippel <udippel at gmail.com> wrote:
> What this thread actually shows, alas, is that ZFS is rocket science.
> In 2009, one would expect a file system to 'just work'. Why would
> anyone want to have to 'status' it regularly, in case 'scrub' it, and
> if scrub doesn't do the trick (and still not knowing how serious the
> 'unrecoverable error' is - like in this case), 'clear' it, 'scrub'

You do not have to status it regularly if you don't want to, just as with any other file system. The difference is that you can - just as you can, and should, on the RAID system that you use with any other file system.

If you do not have any problems, ZFS will just work. If you have problems, ZFS will show them to you much better than EXT3, FFS, UFS or other traditional filesystems - and often fix them for you. In many cases you would get corrupted data, or have to run fsck, for the same error on FFS/UFS.

Scrub is much nicer than fsck; it is not easy to know the best answers to the questions that fsck will ask you when you have a serious metadata problem on FFS/UFS. And yes, you can get into trouble even on OpenBSD.

You also have to count the complexity of your volume manager, as ZFS is both a filesystem and a volume manager in one.
Casper.Dik at Sun.COM
2009-Apr-16 10:59 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
> If you do not have any problems, ZFS will just work. If you have
> problems, ZFS will show them to you much better than EXT3, FFS, UFS or
> other traditional filesystems - and often fix them for you. In many
> cases you would get corrupted data, or have to run fsck, for the same
> error on FFS/UFS.

As most data is "file data", none of the other filesystems would detect such an error. But the file would still be corrupted.

> Scrub is much nicer than fsck; it is not easy to know the best answers
> to the questions that fsck will ask you when you have a serious metadata
> problem on FFS/UFS. And yes, you can get into trouble even on OpenBSD.

Of course, if your memory is bad, you could see a transient error during a scrub.

Casper
Bob Friesenhahn
2009-Apr-16 17:26 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
On Thu, 16 Apr 2009, Uwe Dippel wrote:
> What this thread actually shows, alas, is that ZFS is rocket science.
> In 2009, one would expect a file system to 'just work'. Why would
> anyone want to have to 'status' it regularly, in case 'scrub' it, and

For common uses, ZFS is not any more complicated than your ephemeral gmail.com email account, but it seems that you have figured that out just fine. Good for you.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Thu, Apr 16, 2009 at 12:26 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> For common uses, ZFS is not any more complicated than your ephemeral
> gmail.com email account, but it seems that you have figured that out
> just fine. Good for you.

I can't say I've ever had to translate binary to recover an email from the trash bin with Gmail... which is for "common users". Unless, of course, you're suggesting "common users" will never want to recover a file after zfs alerts them it's corrupted.

He's got a very valid point, and the responses are disheartening at best. Just because other file systems don't detect the corruption, or require lots of work to recover, does not make it OK for zfs to do the same. Excuses are just that, excuses. He isn't asking for an excuse, he's asking for an answer.

--Tim
Richard Elling
2009-Apr-16 19:41 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Tim wrote:
> I can't say I've ever had to translate binary to recover an email from
> the trash bin with Gmail... which is for "common users". Unless of
> course you're suggesting "common users" will never want to recover a
> file after zfs alerts them it's corrupted.
>
> He's got a very valid point, and the responses are disheartening at
> best. Just because other file systems don't detect the corruption, or
> require lots of work to recover, does not make it OK for zfs to do the
> same. Excuses are just that, excuses. He isn't asking for an excuse,
> he's asking for an answer.

Excuses? I did sense an issue with terminology and messaging, but there are no excuses here. ZFS detected a problem. The problem did not affect his data, as it was recovered.

I'd like to reiterate here that if you can think of a better way to communicate with people, then please file a bug. Changes in messages and docs tend to be much easier than changes in logic.

P.S. don't shoot the canary!
 -- richard
Florian Ermisch
2009-Apr-16 21:27 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Uwe Dippel schrieb:
> No doubt. But ext3 also seems to need much less attention, very much
> fewer commands. Which leaves it as a viable alternative. I still hope
> that one day ZFS will be maintainable as simply as ext3; respectively
> do all that maintenance on its own. :)

Ext3 has no (optional) redundancy across more than one disc, and no volume management. You need the Device Mapper for redundancy (Multiple Devices, or the Linux Volume Management) and LVM again for volume management. If you want such features on Linux, ext3 sits on top of at least 2, probably 3, layers of storage management; see the sketch below. Should I add NFS, CIFS and iSCSI exports, or the needlessness of resizing volumes?

You're comparing a single tool with a whole production line. Sorry for the flaming, but yesterday I spent 4 additional hours at work recovering a Xen server where a single error somewhere in its LVM caused the virtual servers to freeze.

Kind Regards, Florian
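To make the layering concrete, here is a sketch of what a mirrored filesystem costs on each stack. The device names are hypothetical, and the Linux commands are illustrative of the era rather than a recipe:

# Linux circa 2009: three layers - md for the mirror, LVM for volumes, ext3 on top
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 100G -n data vg0
mkfs.ext3 /dev/vg0/data

# ZFS: pool and filesystem in one tool
zpool create tank mirror c1d0 c2d0
zfs create tank/data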
Drew Balfour
2009-Apr-16 22:15 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
>>> Now I wonder where that error came from. It was just a single
>>> checksum error. It couldn't go away with an earlier scrub, and
>>> seemingly left no traces of badness on the drive. Something serious?
>>> At least it looks a tad contradictory: "Applications are
>>> unaffected.", it is unrecoverable, and once cleared, there is no
>>> error left.

What happens if you rescrub the pool after clearing the errors? If zfs has reused whatever was causing the issue, then it shouldn't be surprising if the error shows up again.

> Could you propose alternate wording?

My $.02, but the wording in the error message is rather obtuse. "Unrecoverable error" indicates to me that something was lost; technically this is true, but zfs was able to replicate the data from another source. This is not at all clear from the message:

status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.

This doesn't indicate whether the attempt was successful or not. We all know it was, because if it wasn't, we'd (a) see another error instead and/or (b) see something other than "errors: No known data errors". But, unless you know zfs well enough to make that leap, you're left wondering what actually happened.

Granted, the 'verbose' error page (http://www.sun.com/msg/ZFS-8000-9P) does a much better job of explaining. However, confusing terse error messages are never good, and asking the user to go look stuff up in order to understand isn't good either. Also, the verbose error page doesn't explain that, despite there not being a replicated configuration, metadata is replicated, and so errors can be recovered from a seemingly 'unrecoverable' state.

Does anyone know why it's "applications" and not "data"?

Perhaps something like:

status: One or more devices has experienced an error. A successful attempt to
        correct the error was made using a replicated copy of the data.
        Data on the pool is unaffected.

-Drew
Toby Thain
2009-Apr-16 23:13 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
On 16-Apr-09, at 5:27 PM, Florian Ermisch wrote:
> Ext3 has no (optional) redundancy across more than one disc, and no
> volume management. You need the Device Mapper for redundancy (Multiple
> Devices, or the Linux Volume Management) and LVM again for volume
> management.

And you'll still be lacking checksumming and self-healing.

--Toby

> If you want such features on Linux, ext3 sits on top of at least 2,
> probably 3, layers of storage management.
> Should I add NFS, CIFS and iSCSI exports, or the needlessness of
> resizing volumes?
>
> You're comparing a single tool with a whole production line.
> Sorry for the flaming, but yesterday I spent 4 additional hours at work
> recovering a Xen server where a single error somewhere in its LVM
> caused the virtual servers to freeze.
Uwe Dippel
2009-Apr-16 23:39 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Drew Balfour wrote:
> Does anyone know why it's "applications" and not "data"?
>
> Perhaps something like:
>
> status: One or more devices has experienced an error. A successful attempt to
>         correct the error was made using a replicated copy of the data.
>         Data on the pool is unaffected.

If it was (successful), that would have been something. It wasn't. 'status' brought up the 'unrecoverable error', whatever number of 'scrub's I did. Toby: 'self-healing' is fine, but that message simply sounds scary, and worse: it proposes no further course of action, nor its consequences.

"Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'."

This does sound scary, at least to me. How do I 'determine if the device needs to be replaced'? Should I 'clear' or 'replace'? In the end, it needed a 'clear', and that one CKSUM error went away - as it seems, without further consequences and with a fully sane disk. Don't call that 'self-healing'. This is an arcane method demanding plenty of user activity, interaction, reading-up, etc.

Yes, Richard, you are correct, linguistically. There was an unrecoverable error in a layer not affecting the layer containing the data. Telling ZFS to replace some metadata with correct metadata resolved the - probably - non-existent problem. This reminds me of vfat, with its mirror FAT. Would I want to read about an 'unrecoverable error' when the mirror is needed? Probably not. And even then, I wouldn't want to have to type 'clear'. And surely I wouldn't want to wait until I typed 'status' before being made aware of the existence of an unrecoverable error, would I!

It seems most in here don't run production servers. A term like 'unrecoverable' sends me into a state of frenzy. It sounds like my systems are dying any minute. From what I read, it is harmless: some redundant metadata could not be retrieved. If this was the case, Toby, I wouldn't want to have to type anything. I'd rather have the system detect the situation of its own accord, try the redundant metadata (we do have snapshots, don't we!), and scrub on its very own. At the end, a mail to root would be in order, informing me that an error has been corrected and no data compromised at all. Thank you, ZFS! That's what I'd call 'self-healing' and 21st century.

Uwe
Bob Friesenhahn
2009-Apr-17 00:38 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
On Fri, 17 Apr 2009, Uwe Dippel wrote:
> It seems most in here don't run production servers. A term like
> 'unrecoverable' sends me into a state of frenzy. It sounds like my
> systems are dying any minute. From what I read, it is harmless: some
> redundant

While your system is still running and user data has not been compromised, the issue is not necessarily harmless, since your hard drive may be on a path to failure. Continuing data loss usually indicates a failing hard drive.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Robert Milkowski
2009-Apr-17 00:54 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Hello Uwe,

Thursday, April 16, 2009, 10:38:00 AM, you wrote:

UD> What this thread actually shows, alas, is that ZFS is rocket science.
UD> In 2009, one would expect a file system to 'just work'. Why would
UD> anyone want to have to 'status' it regularly, in case 'scrub' it, and
UD> [...]
UD> is a dead-born project; with all due respect that I as an engineer of

With all due respect, you don't understand how zfs works. With your ext3, or whatever you use on OpenBSD, if your system ends up with some corrupt data being returned from one of the disks in a mirror, you will get:

- some of your data silently corrupted, and/or
- a file system that requires fsck, which won't fix user data if it is affected, and/or
- an OS panic, and/or
- the loss of some or all of the data in a file system

With zfs in such a case, everything will work fine: all applications will get the *PROPER* data, and the corrupted block will be automatically fixed. That's what happened to you. You don't have to do anything, and it will just work.

Now, zfs not only returned proper data to your applications and fixed a corrupted block, it also reported this to you via the zpool status output. You can do 'zpool clear' in order to acknowledge that the above has happened, or you can leave it as it is; other than being a record of the above case, it doesn't require you to do anything.

In summary - if you want to put it live and forget about it entirely, fine: do so, and it will work as expected; cases of bad data being returned from one disk in a mirror will be automatically fixed, with proper data returned. On your OpenBSD, by contrast, there would be serious consequences if one of the disks returned bad data.

I don't understand why you're complaining about zfs reporting to you that you might have an issue - you do not need to read the report or do anything if you don't want to; or, if you really value your data, you can investigate what's going on before it is too late, while in the meantime zfs provides your applications with correct data.

--
Best regards,
Robert Milkowski
http://milek.blogspot.com
Robert Milkowski
2009-Apr-17 00:58 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Hello Richard,

Thursday, April 16, 2009, 8:41:53 PM, you wrote:

RE> Excuses? I did sense an issue with terminology and messaging, but
RE> there are no excuses here. ZFS detected a problem. The problem did
RE> not affect his data, as it was recovered.
RE> [...]
RE> P.S. don't shoot the canary!

I suspect that Uwe thought that unless he did a 'zpool clear' there was something wrong, and that clearing was required. Actually, no - the counter is only information that corruption happened, but that, thanks to redundancy and checksums, applications got *correct* data and the corrupted data was fixed. zpool clear only "resets" the statistics of such errors, nothing more, and one doesn't even have to bother checking them if one doesn't care about being pro-active against possible future failure.

--
Best regards,
Robert Milkowski
http://milek.blogspot.com
Robert Milkowski
2009-Apr-17 01:08 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Hello Blake,

Wednesday, April 15, 2009, 5:18:19 PM, you wrote:

B> You only need to decide what you want here. Yes, ext3 requires less
B> maintenance, because it can't tell you if a block becomes corrupt
B> (though fsck'ing when that *does* happen can take hours, compared to
B> zfs replacing with a good block from the other half of your mirror).

I can't agree that ext3 requires less maintenance; actually, it is quite the opposite. If everything is fine and there is no data corruption, then you don't have to do anything on either file system. But when corruption happens on one side of a mirror, you still don't have to do anything in the zfs case: the data returned to applications will still be correct, while the corrupted data on disk will be automatically repaired. Now, if you really value your data, you probably want to monitor whether such zfs-correctable events happen and investigate further to prevent an eventual failure - but you don't have to.

In the ext3 case, if one side of a mirror returns corrupted data, you will end up with applications getting BAD data, and/or will have to fsck the filesystem, and/or will lose some or all data, and/or the OS will panic, etc.

Then, if you do want to investigate: on the OpenSolaris platform, thanks to zfs, FMA and other tools, you've actually got some chance to nail down the underlying issue, while on Linux with ext3 you end up blaming unidentified bugs in your file system (well, one might argue that the lack of data consistency checking and repair in a fs is itself a bug...); at the least, your toolset for finding out what's going on is limited compared to what OpenSolaris has to offer.

--
Best regards,
Robert Milkowski
http://milek.blogspot.com
Robert Milkowski
2009-Apr-17 01:15 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Hello Uwe,

Friday, April 17, 2009, 12:39:13 AM, you wrote:

UD> Drew Balfour wrote:
>> Perhaps something like:
>>
>> status: One or more devices has experienced an error. A successful attempt to
>>         correct the error was made using a replicated copy of the data.
>>         Data on the pool is unaffected.

UD> If it was (successful), that would have been something. It wasn't.
UD> 'status' brought up the 'unrecoverable error', whatever number of
UD> 'scrub's I did.

And it *was* successful - it did recover. When you read the message carefully, you will see that it says "Applications are unaffected" and that you don't have to do anything. You can investigate if you want to, but you don't have to.

Now, zpool scrub will read all used data and verify it against the checksums, correct it if required, and report new error statistics if needed. It won't clear the error statistics. If you want to clear them, use 'zpool clear', as you did.

--
Best regards,
Robert Milkowski
http://milek.blogspot.com
Drew Balfour
2009-Apr-17 02:57 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Uwe Dippel wrote:
> If it was (successful), that would have been something. It wasn't.

It was; zfs successfully repaired the data, as is evidenced by the lack of errors in the status output:

errors: No known data errors

> 'status' brought up the 'unrecoverable error', whatever number of
> 'scrub's I did.

Hence the misunderstanding. The scrub is telling you, rather confusingly, that the device has an error, but that zfs has managed to work around this error and maintain data integrity. The scrub will not 'fix' the error, as zfs can't fix, say, a bad block on your disk drive. It will, however, maintain data integrity if possible. See below for an example of what I'm trying to convey.

> "Determine if the device needs to be replaced, and clear the errors
> using 'zpool clear' or replace the device with 'zpool replace'."
> This does sound scary, at least to me. How do I 'determine if the device
> needs to be replaced'? Should I 'clear' or 'replace'?

It depends on what caused the error. For example, if I have a mirrored pool and accidentally overwrite one side of the mirror, zpool status will show you the errors and leave it up to you:

# zpool create swim mirror c4t1d0s0 c4t1d0s1
# zpool status
  pool: swim
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        swim          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t1d0s0  ONLINE       0     0     0
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
50+0 records in
50+0 records out

!!oh no, I just zero'd out half of one of my mirror devices!!

# zpool scrub swim
# zpool status
  pool: swim
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009
config:

        NAME          STATE     READ WRITE CKSUM
        swim          DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            c4t1d0s0  DEGRADED     0     0    87  too many errors
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

!!Since I didn't actually have any data on the pool, the only errors were
!!metadata checksum errors.

The confusion here is that in the above output, "error" has different meanings depending on its context.

"One or more devices has experienced an unrecoverable error."

In this context, "error" refers to zfs reading data off the disk and finding that the checksum doesn't match (or, in this case, exist at all). zfs has no idea why the checksum doesn't match; it could be a drive error, a driver error, a user-caused error, bad bits on the bus, whatever. zfs cannot correct these errors, any more than any software can fix any hardware error. We do know that whatever the error was, we didn't get an associated I/O error from the drive, as that column is zero. So the drive doesn't even know there's an error!

"An attempt was made to correct the error."

In this context, "error" refers to the actual bad checksum. zfs can fix this: in this case, by reading either from the other side of the mirror or from the replicated metadata. It should be noted that this attempt was successful, as zfs was able to maintain data integrity. This is implied in the message, confusingly.
"scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009" "errors: No known data errors" In this context, "error" refers to uncorrectable, unrecoverable data corruption. There is a problem with your data, and zfs was unable to fix it. In this case, there were none of these, which is a good thing. Now, as to whether to replace or clear... In this particular case, I know what caused the error. Me. I know the disk is fine. I can simply: # zpool clear swim # zpool status pool: swim state: ONLINE scrub: scrub completed after 0h0m with 0 errors on Thu Apr 16 18:52:28 2009 config: NAME STATE READ WRITE CKSUM swim ONLINE 0 0 0 mirror ONLINE 0 0 0 c4t1d0s0 ONLINE 0 0 0 c4t1d0s1 ONLINE 0 0 0 errors: No known data errors zfs clear simply zeros the device error counters. I know there was nothing wrong with the device, so I can forget about those errors. If I didn''t know the cause of the error, and suspected a bad disk, I''d probably choose to replace the device.> In the end, it needed a ''clear'' and that one CKSUM error went away. As > it seems without further consequences and a fully sane disk. > Don''t call that ''self-healing''. This is an arcane method demanding > plenty of user activity, interaction, reading-up, etc.zfs clear will _always_ clear _all_ errors. It''s a sysadmin''s choice to clear the error counters. You don''t have to clear the errors; if you''d rather keep track of all of the errors over the lifetime of the pool, go right ahead. # zpool status | egrep "errors: |c4t1d0s0" c4t1d0s0 ONLINE 0 0 0 errors: No known data errors # dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50 # zpool scrub swim # zpool status | egrep "errors: |c4t1d0s0" c4t1d0s0 DEGRADED 0 0 652 too many errors errors: No known data errors # dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50 # zpool scrub swim # zpool status | egrep "errors: |c4t1d0s0" c4t1d0s0 DEGRADED 0 0 1.27K too many errors errors: No known data errors You can zpool clear at any time, or you can never do it. Of course, if you don''t know the cause of the errors, clearing probably isn''t the best course of action, if you value your data. Replacing the device will also reset the counters, obviously, as the old device is removed and the new device (hopefully) has no problems: # zpool status| grep c4t1d0s0 c4t1d0s0 DEGRADED 0 0 84 too many errors # zpool replace swim c4t1d0s0 c4t1d0s3 # zpool status | grep c4t1d0s3 c4t1d0s3 ONLINE 0 0 0 83.5K resilvered> It seems most in here don''t run production servers. A term like > ''unrecoverable'' sends me into a state of frenzy.Personally, I agree. I think the wording of the current message is confusing at best, and panic inducing at worst.> If this was the case, Toby, I wouldn''t want to have to type anything. I''d rather > have the system > detecting the situation on its own accord, trying the redundant metadata > (we do have snapshots, don''t we!), and scrub on its very own. At the > end, a mail to root would be in order, informing me that an error has > been corrected and no data compromised at all.That''s actually exactly what happened, minus the email. In your case, and in all the examples above, the "zpool scrub" is entirely unnecessary. I ran it in the examples to force zfs to examine the pool and find the errors. If I''d left it alone, and done things to the file system, it would have found the errors and dealt with them as the data was accessed. 
In other words, I could have done:

!!put some data on the pool:
# dd if=/dev/urandom of=/swim/a bs=1024x1024 count=60
60+0 records in
60+0 records out

!!do something foolish
# dd if=/dev/zero of=/dev/dsk/c4t1d0s0 bs=1024x1024 skip=5 count=50
50+0 records in
50+0 records out

!!use the data on the pool
# dd if=/swim/a of=/b bs=1024x1024
60+0 records in
60+0 records out

# zpool status
  pool: swim
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        swim          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c4t1d0s0  ONLINE       0     0    14
            c4t1d0s1  ONLINE       0     0     0

errors: No known data errors

Now, if you'd like zfs to email you when it finds errors, that's easy enough to do, since zfs helpfully logs failures with the fma daemon. By default that dumps to /var/adm/messages, but sending an email to root, or paging you, would be trivial to implement:

Apr 16 19:28:53 pcandle3 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, SEVERITY: Major
Apr 16 19:28:53 pcandle3 EVENT-TIME: Thu Apr 16 19:28:53 PDT 2009
Apr 16 19:28:53 pcandle3 PLATFORM: Sun Fire X4200 M2, CSN: 0718BD03B4 , HOSTNAME: pcandle3
Apr 16 19:28:53 pcandle3 SOURCE: zfs-diagnosis, REV: 1.0
Apr 16 19:28:53 pcandle3 EVENT-ID: cd6fe5bc-9137-c32a-c811-ba98dac5dbe9
Apr 16 19:28:53 pcandle3 DESC: The number of checksum errors associated with a ZFS device
Apr 16 19:28:53 pcandle3 exceeded acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more information.
Apr 16 19:28:53 pcandle3 AUTO-RESPONSE: The device has been marked as degraded.  An attempt
Apr 16 19:28:53 pcandle3 will be made to activate a hot spare if available.
Apr 16 19:28:53 pcandle3 IMPACT: Fault tolerance of the pool may be compromised.
Apr 16 19:28:53 pcandle3 REC-ACTION: Run 'zpool status -x' and replace the bad device.

However, I think we can all agree that _not_ telling you that there were problems is not a good idea. I think the argument against automatically scrubbing the entire pool is that scrubs are very I/O intensive, and that could negatively impact performance. Assuming the pool is redundantly configured, there's no danger of losing data, and any bad data or checksums will be corrected on the fly. Of course, if it were my system and I got random, unexplained checksum errors, I'd probably scrub the pool, performance be damned.

-Drew
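For anyone who wants that email without watching syslog by hand, a minimal cron-job sketch would do. This is an illustration only - the paths, schedule and mailx invocation are assumptions, not a supported tool:

#!/bin/sh
# Hypothetical watchdog: mail root whenever any pool is unhealthy.
# 'zpool status -x' prints "all pools are healthy" when there is
# nothing to report, so any other output is worth a message.
STATUS=`/usr/sbin/zpool status -x`
if [ "$STATUS" != "all pools are healthy" ]; then
        echo "$STATUS" | /usr/bin/mailx -s "zpool alert on `hostname`" root
fi

Run it from root's crontab, e.g. every 15 minutes:

0,15,30,45 * * * * /usr/local/bin/zpool-watch.sh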
Richard Elling
2009-Apr-17 17:25 UTC
[zfs-discuss] How recoverable is an 'unrecoverable error'?
Drew Balfour wrote:
> What happens if you rescrub the pool after clearing the errors? If zfs
> has reused whatever was causing the issue, then it shouldn't be
> surprising if the error shows up again.

Are you assuming that bad disk blocks are returned to the free pool? This is more of a problem for file systems with pre-allocated metadata, such as UFS. In UFS, if a sector in a superblock copy goes bad, it will still be reused. In ZFS, metadata is COW and redundant, so there is no forced re-use of disk blocks (except for the uberblocks, which are 4x redundant and use 128-slot circular queues).

> Perhaps something like:
>
> status: One or more devices has experienced an error. A successful attempt to
>         correct the error was made using a replicated copy of the data.
>         Data on the pool is unaffected.

I think this is on the right track. But the repair method, "replicated copy of the data," should be more vague, because there are other ways to repair data.

Does anyone else have better wording?
 -- richard
On Fri, Apr 17, 2009 at 12:25 PM, Richard Elling
<richard.elling at gmail.com> wrote:
> Drew Balfour wrote:
>> Perhaps something like:
>>
>> status: One or more devices has experienced an error. A successful
>>         attempt to correct the error was made using a replicated copy
>>         of the data. Data on the pool is unaffected.
>
> I think this is on the right track. But the repair method, "replicated
> copy of the data," should be more vague, because there are other ways
> to repair data.
>
> Does anyone else have better wording?

Unless you want to have a different response for each of the repair
methods, I''d just drop that part:

status: One or more devices has experienced an error. The error has been
        automatically corrected by zfs. Data on the pool is unaffected.

I suppose you could add a "for more information please contact Sun" or
something along those lines as well?

--Tim (my reply-to-all skills have been suffering lately, sorry Richard.)
Carson Gaspar
2009-Apr-17 17:45 UTC
[zfs-discuss] How recoverable is an ''unrecoverable error''?
Tim wrote (although it wasn''t his error originally):

> Unless you want to have a different response for each of the repair
> methods, I''d just drop that part:
>
> status: One or more devices has experienced an error. The error has been
>         automatically corrected by zfs. Data on the pool is unaffected.

"Data on the pool are unaffected." Data is plural.

Aside from the grammar police work, I also agree that this is a better
error message to present to the user. (If we''re going to change it, I''d
appreciate the new version being one that doesn''t have a detrimental
effect on my dental work due to teeth grinding... ;-)

--
Carson
Drew Balfour
2009-Apr-17 17:52 UTC
[zfs-discuss] How recoverable is an ''unrecoverable error''?
> Are you assuming that bad disk blocks are returned to the free pool?

Hrm. I was assuming that zfs was unaware of the source of the error, and
therefore unable to avoid running into it again. If it was a bad sector,
and the disk knows about it, then you probably wouldn''t see it again.
But if the disk thinks the sector is good, but it''s flipping bits, will
zfs prevent the disk from reusing that sector?

-Drew
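P.S. For what it''s worth, you can at least ask the drive whether it has
quietly remapped anything. If you have smartmontools installed (it isn''t
part of a stock install, so consider this a sketch; the device path is
just the one from my earlier example):

# smartctl -A /dev/rdsk/c4t1d0s0

Reallocated_Sector_Ct shows how many sectors the drive has already
remapped, and a nonzero Current_Pending_Sector means the disk itself
suspects a sector but hasn''t remapped it yet.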
Carson Gaspar wrote:
> Tim wrote (although it wasn''t his error originally):
>
>> Unless you want to have a different response for each of the repair
>> methods, I''d just drop that part:
>>
>> status: One or more devices has experienced an error. The error has been
>>         automatically corrected by zfs. Data on the pool is unaffected.
>
> "Data on the pool are unaffected." Data is plural.

Not to nitpick, but I think most people would prefer the singular ''data''
when referring to the storage of data. The plural ''data'' in this case is
very awkward.
Bob Friesenhahn
2009-Apr-17 18:17 UTC
[zfs-discuss] How recoverable is an ''unrecoverable error''?
On Fri, 17 Apr 2009, Dave wrote:
>
> Not to nitpick, but I think most people would prefer the singular
> ''data'' when referring to the storage of data. The plural ''data'' in
> this case is very awkward.

Assuming that what is stored can be classified as data!

http://en.wikipedia.org/wiki/Data

Why do we call these collections of bits "data"?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, Apr 17, 2009 at 1:17 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> On Fri, 17 Apr 2009, Dave wrote:
>>
>> Not to nitpick, but I think most people would prefer the singular
>> ''data'' when referring to the storage of data. The plural ''data'' in
>> this case is very awkward.
>
> Assuming that what is stored can be classified as data!
>
> http://en.wikipedia.org/wiki/Data
>
> Why do we call these collections of bits "data"?

Because the CxO wouldn''t have a frigging clue what you were talking
about if you started referencing collections of bits? "We need to buy
this $50,000 storage array to store our collection of bits" would likely
get you escorted out of his/her office.

--Tim