thr3ads.net - zfs discuss - [zfs-discuss] My first 'unrecoverable error', what to do? [Jul 2008]

Sam

2008-Jul-30 02:23 UTC

[zfs-discuss] My first 'unrecoverable error', what to do?

I've had my 10x500 ZFS+ running for probably 6 months now and had
thought it was scrubbing occasionally (wrong), so I started a scrub this morning.
It's almost done now and I got this:

errors: No known data errors
# zpool status
  pool: pile
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 97.93% done, 0h5m to go
config:

        NAME        STATE     READ WRITE CKSUM
        pile        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     1
            c5t5d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     1
            c5t7d0  ONLINE       0     0     0
            c3d0    ONLINE       0     0     1
            c4d0    ONLINE       0     0     0


So it says it's a minor error but still one to be concerned about. I thought
resilvering takes care of checksum errors, does it not? Should I be running to
buy 3 new 500GB drives?

Thanks,
Sam
 
 
This message posted from opensolaris.org

Sam

2008-Jul-30 02:40 UTC

Could this somehow be related to the rather large (100GB) difference between
what 'zfs list' and 'zpool list' report:

# zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
pile  4.53T  4.31T   223G    95%  ONLINE  -
# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
pile  3.44T   120G  3.44T  /pile

I know there should be a 1TB difference in SIZE but the difference in AVAIL
makes no sense.
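A rough worked check of these figures, as a minimal Python sketch. It assumes that for a raidz2 vdev 'zpool list' reports raw capacity including parity, while 'zfs list' reports usable space after parity; it ignores metadata and any other space ZFS may hold back. The 8/10 ratio is simply the ideal data fraction of a 10-disk raidz2 stripe.

```python
# Rough check of the 'zpool list' vs 'zfs list' numbers from the thread.
# Assumption: 'zpool list' shows raw space (data + parity) for raidz2,
# 'zfs list' shows space after parity. Metadata and other overheads ignored.

DISKS = 10        # 10 x 500GB drives in a single raidz2 vdev
PARITY = 2        # raidz2 stores two parity columns per stripe
TIB = 2 ** 40


def usable_fraction(disks: int = DISKS, parity: int = PARITY) -> float:
    """Ideal fraction of raw space that holds data in a full raidz stripe."""
    return (disks - parity) / disks


def to_usable(raw_tib: float) -> float:
    """Convert a raw (parity-inclusive) figure to expected usable space."""
    return raw_tib * usable_fraction()


raw_capacity_tib = DISKS * 500e9 / TIB   # 10 x 500 GB expressed in TiB

# Figures reported by 'zpool list' in the thread, in TiB.
zpool_figures = {"SIZE": 4.53, "USED": 4.31, "AVAIL": 223 / 1024}

if __name__ == "__main__":
    print(f"raw disk capacity: {raw_capacity_tib:.2f}T")
    for name, raw in zpool_figures.items():
        print(f"{name}: zpool {raw:.2f}T -> expected zfs ~{to_usable(raw):.2f}T")
```

Under these assumptions the raw capacity works out to about 4.55T, close to the 4.53T SIZE, and USED comes out at about 3.45T, close to the 3.44T that 'zfs list' shows. AVAIL, however, comes out at roughly 178G against the 120G reported: parity explains about 45G of the ~100GB gap, and the remaining ~58G is not accounted for by this simple model.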
 
 

Arne Schwabe

2008-Jul-30 02:57 UTC

Sam wrote:
> So it says it's a minor error but still one to be concerned about. I thought
> resilvering takes care of checksum errors, does it not? Should I be running to
> buy 3 new 500GB drives?

Failures can have different causes. Maybe a cable is defective. Also,
occasional defective sectors are "normal" and are handled quite well by
the drive's own defect management. You can use 'zpool clear' to reset
the counters to 0.
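Arne's suggestion can be sketched as a small Python helper that resets the counters with 'zpool clear' and then starts a fresh scrub, so you can see whether errors come back on the same devices. This is a minimal sketch, not a tested admin tool: the helper name clear_and_rescrub and its dry_run flag are hypothetical, and it relies only on the standard 'zpool clear', 'zpool scrub' and 'zpool status -v' subcommands.

```python
import subprocess
from typing import List


def clear_and_rescrub(pool: str, dry_run: bool = True) -> List[List[str]]:
    """Reset error counters on `pool`, start a new scrub, then show status.

    Hypothetical helper. With dry_run=True (the default) the commands are
    only returned, not executed, so it is safe on a machine without ZFS.
    """
    commands = [
        ["zpool", "clear", pool],          # reset READ/WRITE/CKSUM counters to 0
        ["zpool", "scrub", pool],          # start a scrub; it runs in the background
        ["zpool", "status", "-v", pool],   # show progress; -v lists files with errors
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands


if __name__ == "__main__":
    for cmd in clear_and_rescrub("pile"):
        print(" ".join(cmd))
```

Defaulting to a dry run keeps the pool untouched unless you ask otherwise. Because the scrub runs in the background, the final status call only shows progress; check 'zpool status' again once the scrub finishes to see whether the same devices pick up new errors.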

Arne

Bob Friesenhahn

2008-Jul-30 05:01 UTC

On Tue, 29 Jul 2008, Sam wrote:
> So it says it's a minor error but still one to be concerned about, I
> thought resilvering takes care of checksum errors, does it not?
> Should I be running to buy 3 new 500GB drives?

Presumably these are SATA drives.  Studies show that typical SATA
drives tend to produce recurring data errors during their lifetime, so
a few data errors are likely nothing to be alarmed about.  If you see
many tens or hundreds, then there would be cause for concern.
Enterprise SCSI drives produce very few such errors, and evidence
suggests that on them data errors may portend doom.
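Bob's rule of thumb (a few errors are normal on SATA drives, tens or hundreds are worrying) can be turned into a quick check over the device table from 'zpool status'. A minimal Python sketch, assuming the counters are plain integers (zpool may abbreviate large counts, which this parser simply skips) and using a hypothetical threshold of 10 errors per device; the function names and the cut-off are illustrative, not part of ZFS.

```python
from typing import Dict, Tuple

# Device table from the 'zpool status' output in this thread.
SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        pile        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     1
            c5t5d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     1
            c5t7d0  ONLINE       0     0     0
            c3d0    ONLINE       0     0     1
            c4d0    ONLINE       0     0     0
"""

Counters = Tuple[int, int, int]  # (READ, WRITE, CKSUM)


def parse_counters(status_text: str) -> Dict[str, Counters]:
    """Extract per-row READ/WRITE/CKSUM counters from 'zpool status' text."""
    counters: Dict[str, Counters] = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True          # header row: device rows follow
            continue
        if not in_table or len(fields) < 5:
            continue                 # blank lines or text outside the table
        name, _state, read, write, cksum = fields[:5]
        try:
            counters[name] = (int(read), int(write), int(cksum))
        except ValueError:
            continue                 # non-numeric counts are skipped
    return counters


def classify(counters: Dict[str, Counters], threshold: int = 10) -> Dict[str, str]:
    """Label each row 'clean', 'watch' or 'investigate' by total error count."""
    labels = {}
    for name, (read, write, cksum) in counters.items():
        total = read + write + cksum
        if total == 0:
            labels[name] = "clean"
        elif total < threshold:
            labels[name] = "watch"
        else:
            labels[name] = "investigate"
    return labels


if __name__ == "__main__":
    for name, label in classify(parse_counters(SAMPLE)).items():
        if label != "clean":
            print(f"{name}: {label}")
```

On the output from this thread, the three drives with a single checksum error each land in "watch", which matches Bob's reading that a handful of errors is not yet alarming.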

I have yet to see an error here.  Knock on wood!

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Robert Milkowski

2008-Jul-30 10:07 UTC

Hello Sam,

Wednesday, July 30, 2008, 3:23:55 AM, you wrote:

S> So it says it's a minor error but still one to be concerned about,
S> I thought resilvering takes care of checksum errors, does it not? 
S> Should I be running to buy 3 new 500GB drives?

ZFS only reported to you that there were some checksum errors; all of
them were corrected and no bad data was returned to applications.
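Applications never saw bad data because ZFS verifies each block against a checksum kept in its parent block pointer and, on a mismatch, rebuilds the block from redundancy until the checksum matches. The toy Python sketch below shows that idea with a single XOR parity column and SHA-256. It is a simplification, not ZFS's code: raidz2 keeps two parity columns, and ZFS's own checksum and parity math differ. The function names are hypothetical.

```python
import hashlib
from functools import reduce
from typing import List, Optional


def xor_bytes(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def checksum(data: bytes) -> bytes:
    # Stand-in for the checksum ZFS stores in the parent block pointer.
    return hashlib.sha256(data).digest()


def self_healing_read(columns: List[bytes], parity: bytes,
                      expected: bytes) -> Optional[bytes]:
    """Return verified data, rebuilding at most one bad column from parity."""
    data = b"".join(columns)
    if checksum(data) == expected:
        return data                      # fast path: data is intact
    # Mismatch: try treating each column in turn as the corrupt one.
    for bad in range(len(columns)):
        others = [c for i, c in enumerate(columns) if i != bad]
        rebuilt = xor_bytes(others + [parity])
        candidate = b"".join(columns[:bad] + [rebuilt] + columns[bad + 1:])
        if checksum(candidate) == expected:
            return candidate             # a real system would also rewrite the bad column
    return None                          # more damage than one parity column can fix


if __name__ == "__main__":
    stripe = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_bytes(stripe)
    expected = checksum(b"".join(stripe))
    damaged = [stripe[0], b"eXgh", stripe[2]]   # one silently corrupted column
    print(self_healing_read(damaged, parity, expected))
```

The checksum is what tells the reader which reconstruction is correct; parity alone could not. raidz2 extends the same search with a second parity column, so it can repair up to two bad columns in a stripe.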


-- 
Best regards,
 Robert Milkowski                           mailto:milek at task.gda.pl
                                       http://milek.blogspot.com
