bash-3.00# zpool status -v nfs-s5-p1
  pool: nfs-s5-p1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        nfs-s5-p1                                ONLINE       0     0     2
          c4t600C0FF00000000009258F7A4F1BC601d0  ONLINE       0     0     2

errors: No known data errors
bash-3.00#

As you can see, there is no ZFS-level redundancy in this pool.
Does that mean those two checksum errors were in metadata and were
corrected thanks to ditto blocks?  (I assume the application received
correct data and the filesystem is fine.)

btw: I'm really surprised at how unreliable SATA disks are.  I put a
dozen TBs of data on ZFS recently, and after just a few days I got a few
hundred checksum errors (raid-z was used there).  And these are 500GB
disks in a 3511 array.  Well, that would explain some of the fscks, etc.
we saw before.


This message posted from opensolaris.org
On Fri, Jun 09, 2006 at 06:16:53AM -0700, Robert Milkowski wrote:
> bash-3.00# zpool status -v nfs-s5-p1
>   pool: nfs-s5-p1
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME                                     STATE     READ WRITE CKSUM
>         nfs-s5-p1                                ONLINE       0     0     2
>           c4t600C0FF00000000009258F7A4F1BC601d0  ONLINE       0     0     2
>
> errors: No known data errors
> bash-3.00#
>
> As you can see, there is no ZFS-level redundancy in this pool.
> Does that mean those two checksum errors were in metadata and were
> corrected thanks to ditto blocks?  (I assume the application received
> correct data and the filesystem is fine.)

Hmm, I'm not sure.  There are no persistent data errors (as shown by the
'errors:' line), so you should be fine.  If you want to send your
/var/fm/fmd/errlog, or 'fmdump -eV' output, we can take a look at the
details of the error.  If this is the case, then it's a bug that the
checksum error is reported for the pool for a recovered ditto block.

You may want to try 'zpool clear nfs-s5-p1; zpool scrub nfs-s5-p1' and
see if it turns up anything.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
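Spelled out, the suggested sequence is (a minimal sketch, using the pool
name from this thread; re-run the status check once the scrub completes):

bash-3.00# zpool clear nfs-s5-p1        # reset the per-vdev error counters
bash-3.00# zpool scrub nfs-s5-p1        # re-read and verify every block in the pool
bash-3.00# zpool status -v nfs-s5-p1    # see whether the CKSUM counts come back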
Robert Milkowski wrote:
> btw: I'm really surprised at how unreliable SATA disks are.  I put a
> dozen TBs of data on ZFS recently, and after just a few days I got a few
> hundred checksum errors (raid-z was used there).  And these are 500GB
> disks in a 3511 array.  Well, that would explain some of the fscks, etc.
> we saw before.

It is more likely due to the density than the interface.  In general,
high-density disks suffer from superparamagnetic effects more than
lower-density disks.  There are several ways to combat this, but the
consumer market values space over reliability.  And since there is no
checksumming to detect problems, consumers don't think they have
problems -- the insidious effects of cancer.
 -- richard
Richard Elling wrote:
> Robert Milkowski wrote:
>> btw: I'm really surprised at how unreliable SATA disks are.  I put a
>> dozen TBs of data on ZFS recently, and after just a few days I got a
>> few hundred checksum errors (raid-z was used there).  And these are
>> 500GB disks in a 3511 array.  Well, that would explain some of the
>> fscks, etc. we saw before.
>
> It is more likely due to the density than the interface.  In general,
> high-density disks suffer from superparamagnetic effects more than
> lower-density disks.  There are several ways to combat this, but the
> consumer market values space over reliability.

I'm not actually convinced the consumer market wants the space; it's
more that we don't have a choice, because bigger and bigger drives are
all we can buy.  Personally I have very little need for a 500G disk at
home (mainly because I don't do video, and my photos are jpg, not raw ;-)).

> And since there is no checksumming to detect problems, consumers don't
> think they have problems -- the insidious effects of cancer.

Or most of the data is stored in file formats that don't get impacted
too much by the odd bit flip here and there (eg MPEG streams).

--
Darren J Moffat
> btw: I'm really surprised at how unreliable SATA disks are.  I put a
> dozen TBs of data on ZFS recently, and after just a few days I got a few
> hundred checksum errors (raid-z was used there).  And these are 500GB
> disks in a 3511 array.  Well, that would explain some of the fscks, etc.
> we saw before.

I suspect you've got a bad disk or controller.  A normal SATA drive
just won't behave this badly.  Cool that RAID-Z survives it, though.

Jeff
Jeff Bonwick wrote:
>> btw: I'm really surprised at how unreliable SATA disks are.  I put a
>> dozen TBs of data on ZFS recently, and after just a few days I got a
>> few hundred checksum errors (raid-z was used there).  And these are
>> 500GB disks in a 3511 array.  Well, that would explain some of the
>> fscks, etc. we saw before.
>
> I suspect you've got a bad disk or controller.  A normal SATA drive
> just won't behave this badly.  Cool that RAID-Z survives it, though.

I had a power supply go bad a few months ago (cheap PC-junk power supply)
and it trashed a bunch of my SATA and IDE disks [*] (though, happily, not
the IDE disk I scavenged from a Sun V100 :-).  The symptoms were thousands
of non-recoverable reads, which were remapped until the disks ran out of
spare blocks.  Since I didn't believe this, I got a new, more expensive,
and presumably more reliable power supply.  The IDE disks fared better,
but I had to do a low-level format on the SATA drive.  All is well now,
and zfs hasn't shown any errors since.  But thunderstorm season is
approaching next month...

I am also trying to collect field data which shows such failure modes,
specifically looking for clusters of errors.  However, I can't promise
anything, and may not get much time to do an in-depth study anytime soon.

[*] my theory is that disks are about the only devices still using 12VDC
power.  Some disk vendors specify the quality of the 12VDC supply (eg.
ripple) for specific drives.  In my case, the 12VDC was the only
common-mode failure in the system which could have trashed most of the
drives in this manner.
 -- richard
Hello Eric,

Friday, June 9, 2006, 5:16:29 PM, you wrote:

ES> On Fri, Jun 09, 2006 at 06:16:53AM -0700, Robert Milkowski wrote:
>> bash-3.00# zpool status -v nfs-s5-p1
>>   pool: nfs-s5-p1
>>  state: ONLINE
>> status: One or more devices has experienced an unrecoverable error.  An
>>         attempt was made to correct the error.  Applications are
>>         unaffected.
>> action: Determine if the device needs to be replaced, and clear the
>>         errors using 'zpool clear' or replace the device with
>>         'zpool replace'.
>>    see: http://www.sun.com/msg/ZFS-8000-9P
>>  scrub: none requested
>> config:
>>
>>         NAME                                     STATE     READ WRITE CKSUM
>>         nfs-s5-p1                                ONLINE       0     0     2
>>           c4t600C0FF00000000009258F7A4F1BC601d0  ONLINE       0     0     2
>>
>> errors: No known data errors
>> bash-3.00#
>>
>> As you can see, there is no ZFS-level redundancy in this pool.
>> Does that mean those two checksum errors were in metadata and were
>> corrected thanks to ditto blocks?  (I assume the application received
>> correct data and the filesystem is fine.)

ES> Hmm, I'm not sure.  There are no persistent data errors (as shown by
ES> the 'errors:' line), so you should be fine.  If you want to send your
ES> /var/fm/fmd/errlog, or 'fmdump -eV' output, we can take a look at the
ES> details of the error.  If this is the case, then it's a bug that the
ES> checksum error is reported for the pool for a recovered ditto block.

ES> You may want to try 'zpool clear nfs-s5-p1; zpool scrub nfs-s5-p1'
ES> and see if it turns up anything.

Well, I just ran 'fmdump -eV' and the last entry is from May 31st and is
related to pools which have already been destroyed.

I can see another checksum error in that pool (I did zpool clear last
time) and it's NOT reported by fmdump.  This one occurred after May 31st.

I hope these are ditto blocks and nothing else (read: bad).

System is b39 SPARC.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
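Incidentally, fmdump can be bounded by time, which makes it easy to check
just the recent telemetry (a sketch; see fmdump(1M) for the exact date
formats it accepts):

bash-3.00# fmdump -e -t 31May06     # one-line summary of error events since May 31
bash-3.00# fmdump -eV -t 31May06    # full ereport detail for the same window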
Hello Jeff,

Saturday, June 10, 2006, 2:32:49 AM, you wrote:

>> btw: I'm really surprised at how unreliable SATA disks are.  I put a
>> dozen TBs of data on ZFS recently, and after just a few days I got a
>> few hundred checksum errors (raid-z was used there).  And these are
>> 500GB disks in a 3511 array.  Well, that would explain some of the
>> fscks, etc. we saw before.

JB> I suspect you've got a bad disk or controller.  A normal SATA drive
JB> just won't behave this badly.  Cool that RAID-Z survives it, though.

It's not that bad right now.  It was back then, when the array (3511)
reported 'Drive NOTIFY: Media Error Encountered - 163A981 (311)' several
times, and then I got all of those CKSUM errors.  Once things stabilized
(the drive finally failed and was replaced by a hot spare), I've seen no
CKSUM errors for a few days.

Looks like the drive was failing, etc.  But I'm still surprised that the
array returned bad data (raid-5 on the array).  We see such messages once
in a while on several 3511s.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
On Mon, Jun 12, 2006 at 10:49:49AM +0200, Robert Milkowski wrote:
>
> Well, I just ran 'fmdump -eV' and the last entry is from May 31st and
> is related to pools which have already been destroyed.
>
> I can see another checksum error in that pool (I did zpool clear
> last time) and it's NOT reported by fmdump.  This one occurred after
> May 31st.
>
> I hope these are ditto blocks and nothing else (read: bad).
>
> System is b39 SPARC.

Yes, that does sound like ditto blocks.  I'll poke around with Bill and
figure out why the checksum errors would be percolating up to the pool
level.  They should be reported only for the leaf device.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
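If the recovered reads had produced FMA telemetry, they would show up as
ZFS checksum ereports.  A quick way to look for those specifically (the
class name below is from the ZFS/FMA integration; a sketch, not a full
diagnosis):

bash-3.00# fmdump -e | grep ereport.fs.zfs.checksum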
I reproduced this pretty easily on a lab machine.  I've filed:

6437568 ditto block repair is incorrectly propagated to root vdev

to track this issue.  Keep in mind that you do have a flaky
controller/lun/something.  If this had been a user data block, your data
would be gone.

- Eric

On Mon, Jun 12, 2006 at 08:05:03AM -0700, Eric Schrock wrote:
> On Mon, Jun 12, 2006 at 10:49:49AM +0200, Robert Milkowski wrote:
> >
> > Well, I just ran 'fmdump -eV' and the last entry is from May 31st and
> > is related to pools which have already been destroyed.
> >
> > I can see another checksum error in that pool (I did zpool clear
> > last time) and it's NOT reported by fmdump.  This one occurred after
> > May 31st.
> >
> > I hope these are ditto blocks and nothing else (read: bad).
> >
> > System is b39 SPARC.
>
> Yes, that does sound like ditto blocks.  I'll poke around with Bill and
> figure out why the checksum errors would be percolating up to the pool
> level.  They should be reported only for the leaf device.
>
> - Eric
>
> --
> Eric Schrock, Solaris Kernel Development     http://blogs.sun.com/eschrock
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
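For anyone who wants to reproduce this class of recovery safely, a
file-backed scratch pool works well (everything here -- path, sizes,
offsets -- is made up for illustration; never do this to a pool you care
about):

bash-3.00# mkfile 256m /var/tmp/ditto-test
bash-3.00# zpool create dittopool /var/tmp/ditto-test
bash-3.00# cp -r /usr/share/man /dittopool       # seed some data and metadata
bash-3.00# dd if=/dev/urandom of=/var/tmp/ditto-test bs=512 \
           seek=100000 count=200 conv=notrunc    # clobber a region past the front labels
bash-3.00# zpool scrub dittopool
bash-3.00# zpool status -v dittopool             # CKSUM counters show what was hit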
Hello Eric,

Monday, June 12, 2006, 11:21:24 PM, you wrote:

ES> I reproduced this pretty easily on a lab machine.  I've filed:

ES> 6437568 ditto block repair is incorrectly propagated to root vdev

Good, thank you.

ES> to track this issue.  Keep in mind that you do have a flaky
ES> controller/lun/something.  If this had been a user data block, your
ES> data would be gone.

Well, probably something is wrong.  But it surprises me that every time
I get a CKSUM error in this config, it relates to metadata... well,
quite unlikely, isn't it?

btw: if it were a data block, then the app reading that block would get
a proper error and that's it -- right?

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hello Eric,

Monday, June 12, 2006, 11:21:24 PM, you wrote:

ES> I reproduced this pretty easily on a lab machine.  I've filed:

ES> 6437568 ditto block repair is incorrectly propagated to root vdev

ES> to track this issue.  Keep in mind that you do have a flaky
ES> controller/lun/something.  If this had been a user data block, your
ES> data would be gone.

I believe that something else is also happening here.

I can see CKSUM errors on two different servers (v240 and T2000), all on
non-redundant zpools, and every time it looks like a ditto block helped --
hey, that's just improbable.

And while on the T2000 'fmdump -eV' gives me:

Jul 05 19:59:43.8786 ereport.io.fire.pec.btp       0x14e4b8015f612002
Jul 05 20:05:28.9165 ereport.io.fire.pec.re        0x14e5f951ce12b002
Jul 05 20:05:58.5381 ereport.io.fire.pec.re        0x14e614e78f4c9002
Jul 05 20:05:58.5389 ereport.io.fire.pec.btp       0x14e614e7b6ddf002
Jul 05 23:34:11.1960 ereport.io.fire.pec.re        0x1513869a6f7a6002
Jul 05 23:34:11.1967 ereport.io.fire.pec.btp       0x1513869a95196002
Jul 06 00:09:17.1845 ereport.io.fire.pec.re        0x151b2fca4c988002
Jul 06 00:09:17.1852 ereport.io.fire.pec.btp       0x151b2fca72e6b002

on the v240, fmdump shows nothing for over a month, and I'm sure I did
zpool clear on that server later.

v240:

bash-3.00# zpool status nfs-s5-s7
  pool: nfs-s5-s7
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        nfs-s5-s7                                ONLINE       0     0   167
          c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0   167

errors: No known data errors
bash-3.00#
bash-3.00# zpool clear nfs-s5-s7
bash-3.00# zpool status nfs-s5-s7
  pool: nfs-s5-s7
 state: ONLINE
 scrub: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        nfs-s5-s7                                ONLINE       0     0     0
          c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0     0

errors: No known data errors
bash-3.00#
bash-3.00# zpool scrub nfs-s5-s7
bash-3.00# zpool status nfs-s5-s7
  pool: nfs-s5-s7
 state: ONLINE
 scrub: scrub in progress, 0.01% done, 269h24m to go
config:

        NAME                                     STATE     READ WRITE CKSUM
        nfs-s5-s7                                ONLINE       0     0     0
          c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0     0

errors: No known data errors
bash-3.00#

We'll see the result -- I hope I won't have to stop it in the morning.
Anyway, I have a feeling that nothing will be reported.

ps. I've got several similar pools on those two servers, and I see CKSUM
errors on all of them with the same result -- it's almost impossible.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
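To put a rough number on "almost impossible": if metadata made up, say,
1% of the blocks being read (a made-up fraction, purely for illustration),
the chance of 167 independent corruptions all landing on ditto-protected
metadata would be 0.01^167, i.e. about 1 in 10^334:

bash-3.00# echo '167 * l(0.01) / l(10)' | bc -l    # log10 of 0.01^167; prints ~ -334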
Hello Robert,

Thursday, July 6, 2006, 1:49:34 AM, you wrote:

RM> Hello Eric,

RM> Monday, June 12, 2006, 11:21:24 PM, you wrote:

ES>> I reproduced this pretty easily on a lab machine.  I've filed:

ES>> 6437568 ditto block repair is incorrectly propagated to root vdev

ES>> to track this issue.  Keep in mind that you do have a flaky
ES>> controller/lun/something.  If this had been a user data block, your
ES>> data would be gone.

RM> I believe that something else is also happening here.

RM> I can see CKSUM errors on two different servers (v240 and T2000), all
RM> on non-redundant zpools, and every time it looks like a ditto block
RM> helped -- hey, that's just improbable.

RM> And while on the T2000 'fmdump -eV' gives me:

RM> Jul 05 19:59:43.8786 ereport.io.fire.pec.btp   0x14e4b8015f612002
RM> Jul 05 20:05:28.9165 ereport.io.fire.pec.re    0x14e5f951ce12b002
RM> Jul 05 20:05:58.5381 ereport.io.fire.pec.re    0x14e614e78f4c9002
RM> Jul 05 20:05:58.5389 ereport.io.fire.pec.btp   0x14e614e7b6ddf002
RM> Jul 05 23:34:11.1960 ereport.io.fire.pec.re    0x1513869a6f7a6002
RM> Jul 05 23:34:11.1967 ereport.io.fire.pec.btp   0x1513869a95196002
RM> Jul 06 00:09:17.1845 ereport.io.fire.pec.re    0x151b2fca4c988002
RM> Jul 06 00:09:17.1852 ereport.io.fire.pec.btp   0x151b2fca72e6b002

RM> on the v240, fmdump shows nothing for over a month, and I'm sure I
RM> did zpool clear on that server later.

RM> v240:

RM> bash-3.00# zpool status nfs-s5-s7
RM>   pool: nfs-s5-s7
RM>  state: ONLINE
RM> status: One or more devices has experienced an unrecoverable error.
RM>         An attempt was made to correct the error.  Applications are
RM>         unaffected.
RM> action: Determine if the device needs to be replaced, and clear the
RM>         errors using 'zpool clear' or replace the device with
RM>         'zpool replace'.
RM>    see: http://www.sun.com/msg/ZFS-8000-9P
RM>  scrub: none requested
RM> config:

RM>         NAME                                     STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                                ONLINE       0     0   167
RM>           c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0   167

RM> errors: No known data errors
RM> bash-3.00#
RM> bash-3.00# zpool clear nfs-s5-s7
RM> bash-3.00# zpool status nfs-s5-s7
RM>   pool: nfs-s5-s7
RM>  state: ONLINE
RM>  scrub: none requested
RM> config:

RM>         NAME                                     STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                                ONLINE       0     0     0
RM>           c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0     0

RM> errors: No known data errors
RM> bash-3.00#
RM> bash-3.00# zpool scrub nfs-s5-s7
RM> bash-3.00# zpool status nfs-s5-s7
RM>   pool: nfs-s5-s7
RM>  state: ONLINE
RM>  scrub: scrub in progress, 0.01% done, 269h24m to go
RM> config:

RM>         NAME                                     STATE     READ WRITE CKSUM
RM>         nfs-s5-s7                                ONLINE       0     0     0
RM>           c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0     0

RM> errors: No known data errors
RM> bash-3.00#

RM> We'll see the result -- I hope I won't have to stop it in the morning.
RM> Anyway, I have a feeling that nothing will be reported.

RM> ps. I've got several similar pools on those two servers, and I see
RM> CKSUM errors on all of them with the same result -- it's almost
RM> impossible.

ok, it actually took several days for the scrub to complete.

During the scrub I saw some CKSUM errors already, and now again there
are many of them; however, the scrub itself reported no errors at all.

bash-3.00# zpool status nfs-s5-s7
  pool: nfs-s5-s7
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Sun Jul  9 02:56:19 2006
config:

        NAME                                     STATE     READ WRITE CKSUM
        nfs-s5-s7                                ONLINE       0     0    18
          c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0    18

errors: No known data errors
bash-3.00#

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
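One crude way to catch when those counters tick up is to poll the status
output while the pool is under load (a throwaway sketch; the interval and
grep pattern are arbitrary):

bash-3.00# while true; do
>   date
>   zpool status nfs-s5-s7 | grep c4t600C0FF00000000009258F28706F5201d0
>   sleep 300
> done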
Robert Milkowski, 2006-Jul-17 20:44 UTC
Subject: Fwd: Re[3]: [zfs-discuss] zpool status and CKSUM errors
Hi.

Sorry for the forward, but maybe this will be more visible this way.
I really think something strange is going on here; it's virtually
impossible that I have a hardware problem and yet get CKSUM errors
(many of them) only on ditto blocks.

This is a forwarded message
From: Robert Milkowski <rmilkowski at task.gda.pl>
To: Robert Milkowski <rmilkowski at task.gda.pl>
Date: Sunday, July 9, 2006, 8:44:16 PM
Subject: [zfs-discuss] zpool status and CKSUM errors

===8<==============Original message text==============
[the July 9 message reproduced verbatim; see above]
===8<===========End of original message text===========

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com