In a recent message, I detailed the excessive checksum errors that
occurred after replacing a disk. It seems that after a resilver
completes, it leaves a large number of blocks in the pool which fail
to checksum properly. Afterward, it is necessary to scrub the pool in
order to correct these errors.

After some testing, it seems that this only occurs with RAID-Z. The
same behavior can be observed on both snv_59 and snv_60, though I do
not have any other installs to test at the moment.

The following commands should reproduce this result in a small test pool.

Chris

mkdir /tmp/test
mkfile 64m /tmp/test/0 /tmp/test/1
zpool create test raidz /tmp/test/0 /tmp/test/1
mkfile 16m /test/file
zpool export test
rm /tmp/test/0
zpool import -d /tmp/test test
mkfile 64m /tmp/test/0
zpool replace test /tmp/test/0

# wait for the resilver to complete, and observe that it completes
# successfully
zpool status test

# scrub the pool
zpool scrub test

# watch the checksum errors accumulate as the scrub progresses
zpool status test
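(A sketch of an obvious follow-up, not part of the recipe above: once the
corrective scrub has finished, clearing the error counters and scrubbing a
second time should confirm that the damaged blocks really were repaired;
the CKSUM column ought to stay at zero on the second pass.)

# clear the error counters accumulated by the first scrub
zpool clear test

# scrub again; expect no new checksum errors this time
zpool scrub test
zpool status test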
On Sat, Apr 07, 2007 at 05:05:18PM -0500, in a galaxy far far away, Chris Csanady said:
> In a recent message, I detailed the excessive checksum errors that
> occurred after replacing a disk. It seems that after a resilver
> completes, it leaves a large number of blocks in the pool which fail
> to checksum properly. Afterward, it is necessary to scrub the pool in
> order to correct these errors.
>
> After some testing, it seems that this only occurs with RAID-Z. The
> same behavior can be observed on both snv_59 and snv_60, though I do
> not have any other installs to test at the moment.

A colleague at work and I have followed the same steps, including running a
digest on /test/file, on an SXCE 61 build today and can confirm the exact
same (and disturbing) result. My colleague mentioned to me that he has
witnessed the same resilver behavior on builds 57 and 60.

The box these steps were performed on was 'luupgrade'd from SXCE 60 to 61
using the SUNWlu* packages from 61!

# cat /etc/release
                        Solaris Nevada snv_61 X86
           Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 26 March 2007

# mkdir /tmp/test
# mkfile 64m /tmp/test/0 /tmp/test/1
# zpool create test raidz /tmp/test/0 /tmp/test/1
# mkfile 16m /test/file
# digest -v -a sha1 /test/file
sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
#
# zpool export test
# rm /tmp/test/0
# zpool import -d /tmp/test test
# mkfile 64m /tmp/test/0
# zpool replace test /tmp/test/0
# digest -v -a sha1 /test/file
sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
# zpool status test
  pool: test
 state: ONLINE
 scrub: resilver completed with 0 errors on Wed Apr 11 15:19:15 2007
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            /tmp/test/0  ONLINE       0     0     0
            /tmp/test/1  ONLINE       0     0     0

errors: No known data errors
# zpool scrub test
#
# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Wed Apr 11 15:22:30 2007
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            /tmp/test/0  ONLINE       0     0    17
            /tmp/test/1  ONLINE       0     0     0

errors: No known data errors

I don't think these checksum errors are a good sign. The sha1 digest of the
file *does* come out the same, so the question arises: is the resilver
process truly broken (even though in this test case the file does appear to
be unchanged, based on the sha1 digest)?

Marco

--
# make mistake
make: don't know how to make mistake.  Stop
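(For completeness, a sketch only of the follow-up that the status output
itself suggests, plus the teardown of the throwaway pool; it assumes nothing
else is using the 'test' pool or the backing files under /tmp/test.)

# reset the per-device error counters reported in the status output above
zpool clear test

# tear down the throwaway pool and its backing files (assumes they are unused)
zpool destroy test
rm -rf /tmp/test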
Ugh, thanks for exploring this and isolating the problem. We will look into
what is going on (wrong) here. I have filed a bug to track this problem:

    6545015 RAID-Z resilver broken

-Mark

Marco van Lienen wrote:
> On Sat, Apr 07, 2007 at 05:05:18PM -0500, in a galaxy far far away, Chris Csanady said:
>> In a recent message, I detailed the excessive checksum errors that
>> occurred after replacing a disk. It seems that after a resilver
>> completes, it leaves a large number of blocks in the pool which fail
>> to checksum properly. Afterward, it is necessary to scrub the pool in
>> order to correct these errors.
>>
>> After some testing, it seems that this only occurs with RAID-Z. The
>> same behavior can be observed on both snv_59 and snv_60, though I do
>> not have any other installs to test at the moment.
>
> A colleague at work and I have followed the same steps, including running a
> digest on /test/file, on an SXCE 61 build today and can confirm the exact
> same (and disturbing) result. My colleague mentioned to me that he has
> witnessed the same resilver behavior on builds 57 and 60.
>
> The box these steps were performed on was 'luupgrade'd from SXCE 60 to 61
> using the SUNWlu* packages from 61!
>
> # cat /etc/release
>                         Solaris Nevada snv_61 X86
>            Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
>                         Use is subject to license terms.
>                             Assembled 26 March 2007
>
> # mkdir /tmp/test
> # mkfile 64m /tmp/test/0 /tmp/test/1
> # zpool create test raidz /tmp/test/0 /tmp/test/1
> # mkfile 16m /test/file
> # digest -v -a sha1 /test/file
> sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
> #
> # zpool export test
> # rm /tmp/test/0
> # zpool import -d /tmp/test test
> # mkfile 64m /tmp/test/0
> # zpool replace test /tmp/test/0
> # digest -v -a sha1 /test/file
> sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
> # zpool status test
>   pool: test
>  state: ONLINE
>  scrub: resilver completed with 0 errors on Wed Apr 11 15:19:15 2007
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         test             ONLINE       0     0     0
>           raidz1         ONLINE       0     0     0
>             /tmp/test/0  ONLINE       0     0     0
>             /tmp/test/1  ONLINE       0     0     0
>
> errors: No known data errors
> # zpool scrub test
> #
> # zpool status test
>   pool: test
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub completed with 0 errors on Wed Apr 11 15:22:30 2007
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         test             ONLINE       0     0     0
>           raidz1         ONLINE       0     0     0
>             /tmp/test/0  ONLINE       0     0    17
>             /tmp/test/1  ONLINE       0     0     0
>
> errors: No known data errors
>
> I don't think these checksum errors are a good sign. The sha1 digest of the
> file *does* come out the same, so the question arises: is the resilver
> process truly broken (even though in this test case the file does appear to
> be unchanged, based on the sha1 digest)?
>
> Marco
On 4/11/07, Marco van Lienen <marco+zfs-discuss at lordsith.net> wrote:
>
> A colleague at work and I have followed the same steps, including
> running a digest on /test/file, on an SXCE 61 build today and can
> confirm the exact same (and disturbing) result. My colleague
> mentioned to me that he has witnessed the same resilver behavior on
> builds 57 and 60.

Thank you for taking the time to confirm this. Just as long as people
are aware of it, it shouldn't really cause much trouble. Still, it
gave me quite a scare after replacing a bad disk.

> I don't think these checksum errors are a good sign. The sha1
> digest of the file *does* come out the same, so the question arises:
> is the resilver process truly broken (even though in this test case
> the file does appear to be unchanged, based on the sha1 digest)?

ZFS still has good data, so this is not unexpected. It is interesting,
though, that it managed to read all of the data without finding any bad
blocks. I just tried this with a more complex directory structure, and
other variations, with the same result. It is bizarre, but ZFS only
manages to use the good data in normal operation.

To see exactly what is damaged, though, try the following instead. After
the resilver completes, 'zpool offline' a known good device of the
RAID-Z. Then do a scrub, or try to read the data. Afterward, 'zpool
status -v' will display a list of the damaged files, which is very nice.

Chris
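(For the small test pool above, that procedure would look roughly like the
following. This is only a sketch: it assumes /tmp/test/1 is the known good
device, i.e. the one that was not replaced; on a real pool, substitute the
appropriate device name.)

# take the known good device offline, leaving only the freshly resilvered
# one to serve reads (assumes /tmp/test/1 is the good device)
zpool offline test /tmp/test/1

# force every block to be read back from the resilvered device
zpool scrub test

# list the damaged files, as described above
zpool status -v test

# bring the good device back online when finished
zpool online test /tmp/test/1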