In a recent message, I detailed the excessive checksum errors that
occurred after replacing a disk. It seems that after a resilver
completes, it leaves a large number of blocks in the pool which fail
to checksum properly. Afterward, it is necessary to scrub the pool in
order to correct these errors.

After some testing, it seems that this only occurs with RAID-Z. The
same behavior can be observed on both snv_59 and snv_60, though I do
not have any other installs to test at the moment.

The following commands should reproduce this result in a small test pool.

Chris

mkdir /tmp/test
mkfile 64m /tmp/test/0 /tmp/test/1
zpool create test raidz /tmp/test/0 /tmp/test/1
mkfile 16m /test/file
zpool export test
rm /tmp/test/0
zpool import -d /tmp/test test
mkfile 64m /tmp/test/0
zpool replace test /tmp/test/0

# wait for the resilver to complete, and observe that it completes
# successfully
zpool status test

# scrub the pool
zpool scrub test

# watch the checksum errors accumulate as the scrub progresses
zpool status test
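(A sketch of an obvious follow-up, not part of the recipe above: once the
corrective scrub has finished, clearing the error counters and scrubbing a
second time should confirm that the damaged blocks really were repaired;
the CKSUM column ought to stay at zero on the second pass.)

# clear the error counters accumulated by the first scrub
zpool clear test

# scrub again; expect no new checksum errors this time
zpool scrub test
zpool status test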
On Sat, Apr 07, 2007 at 05:05:18PM -0500, in a galaxy far far away, Chris Csanady said:
> In a recent message, I detailed the excessive checksum errors that
> occurred after replacing a disk. It seems that after a resilver
> completes, it leaves a large number of blocks in the pool which fail
> to checksum properly. Afterward, it is necessary to scrub the pool in
> order to correct these errors.
>
> After some testing, it seems that this only occurs with RAID-Z. The
> same behavior can be observed on both snv_59 and snv_60, though I do
> not have any other installs to test at the moment.

A colleague at work and I have followed the same steps, including running a
digest on /test/file, on an SXCE 61 build today and can confirm the exact
same (and disturbing) result. My colleague mentioned to me that he has
witnessed the same resilver behavior on builds 57 and 60.

The box these steps were performed on was 'luupgrade'd from SXCE 60 to 61
using the SUNWlu* packages from 61!

# cat /etc/release
                        Solaris Nevada snv_61 X86
           Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 26 March 2007

# mkdir /tmp/test
# mkfile 64m /tmp/test/0 /tmp/test/1
# zpool create test raidz /tmp/test/0 /tmp/test/1
# mkfile 16m /test/file
# digest -v -a sha1 /test/file
sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
#
# zpool export test
# rm /tmp/test/0
# zpool import -d /tmp/test test
# mkfile 64m /tmp/test/0
# zpool replace test /tmp/test/0
# digest -v -a sha1 /test/file
sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
# zpool status test
  pool: test
 state: ONLINE
 scrub: resilver completed with 0 errors on Wed Apr 11 15:19:15 2007
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            /tmp/test/0  ONLINE       0     0     0
            /tmp/test/1  ONLINE       0     0     0

errors: No known data errors
# zpool scrub test
#
# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed with 0 errors on Wed Apr 11 15:22:30 2007
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            /tmp/test/0  ONLINE       0     0    17
            /tmp/test/1  ONLINE       0     0     0

errors: No known data errors

I don't think these checksum errors are a good sign. The sha1 digest of the
file *does* come out the same, so the question arises: is the resilver
process truly broken (even though in this test case the file does appear to
be unchanged, based on the sha1 digest)?

Marco

--
# make mistake
make: don't know how to make mistake.  Stop
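(For completeness, a sketch only of the follow-up that the status output
itself suggests, plus the teardown of the throwaway pool; it assumes nothing
else is using the 'test' pool or the backing files under /tmp/test.)

# reset the per-device error counters reported in the status output above
zpool clear test

# tear down the throwaway pool and its backing files (assumes they are unused)
zpool destroy test
rm -rf /tmp/test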
Ugh, thanks for exploring this and isolating the problem. We will look into
what is going on (wrong) here. I have filed a bug to track this problem:

    6545015 RAID-Z resilver broken

-Mark

Marco van Lienen wrote:
> On Sat, Apr 07, 2007 at 05:05:18PM -0500, in a galaxy far far away, Chris Csanady said:
>> In a recent message, I detailed the excessive checksum errors that
>> occurred after replacing a disk. It seems that after a resilver
>> completes, it leaves a large number of blocks in the pool which fail
>> to checksum properly. Afterward, it is necessary to scrub the pool in
>> order to correct these errors.
>>
>> After some testing, it seems that this only occurs with RAID-Z. The
>> same behavior can be observed on both snv_59 and snv_60, though I do
>> not have any other installs to test at the moment.
>
> A colleague at work and I have followed the same steps, including running a
> digest on /test/file, on an SXCE 61 build today and can confirm the exact
> same (and disturbing) result. My colleague mentioned to me that he has
> witnessed the same resilver behavior on builds 57 and 60.
>
> The box these steps were performed on was 'luupgrade'd from SXCE 60 to 61
> using the SUNWlu* packages from 61!
>
> # cat /etc/release
>                         Solaris Nevada snv_61 X86
>            Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
>                         Use is subject to license terms.
>                             Assembled 26 March 2007
>
> # mkdir /tmp/test
> # mkfile 64m /tmp/test/0 /tmp/test/1
> # zpool create test raidz /tmp/test/0 /tmp/test/1
> # mkfile 16m /test/file
> # digest -v -a sha1 /test/file
> sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
> #
> # zpool export test
> # rm /tmp/test/0
> # zpool import -d /tmp/test test
> # mkfile 64m /tmp/test/0
> # zpool replace test /tmp/test/0
> # digest -v -a sha1 /test/file
> sha1 (/test/file) = 3b4417fc421cee30a9ad0fd9319220a8dae32da2
> # zpool status test
>   pool: test
>  state: ONLINE
>  scrub: resilver completed with 0 errors on Wed Apr 11 15:19:15 2007
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         test             ONLINE       0     0     0
>           raidz1         ONLINE       0     0     0
>             /tmp/test/0  ONLINE       0     0     0
>             /tmp/test/1  ONLINE       0     0     0
>
> errors: No known data errors
> # zpool scrub test
> #
> # zpool status test
>   pool: test
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub completed with 0 errors on Wed Apr 11 15:22:30 2007
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         test             ONLINE       0     0     0
>           raidz1         ONLINE       0     0     0
>             /tmp/test/0  ONLINE       0     0    17
>             /tmp/test/1  ONLINE       0     0     0
>
> errors: No known data errors
>
> I don't think these checksum errors are a good sign. The sha1 digest of the
> file *does* come out the same, so the question arises: is the resilver
> process truly broken (even though in this test case the file does appear to
> be unchanged, based on the sha1 digest)?
>
> Marco
On 4/11/07, Marco van Lienen <marco+zfs-discuss at lordsith.net> wrote:
>
> A colleague at work and I have followed the same steps, including
> running a digest on /test/file, on an SXCE 61 build today and can
> confirm the exact same (and disturbing) result. My colleague
> mentioned to me that he has witnessed the same resilver behavior on
> builds 57 and 60.

Thank you for taking the time to confirm this. Just as long as people
are aware of it, it shouldn't really cause much trouble. Still, it
gave me quite a scare after replacing a bad disk.

> I don't think these checksum errors are a good sign. The sha1
> digest of the file *does* come out the same, so the question arises:
> is the resilver process truly broken (even though in this test case
> the file does appear to be unchanged, based on the sha1 digest)?

ZFS still has good data, so this is not unexpected. It is interesting,
though, that it managed to read all of the data without finding any bad
blocks. I just tried this with a more complex directory structure, and
other variations, with the same result. It is bizarre, but ZFS only
manages to use the good data in normal operation.

To see exactly what is damaged, though, try the following instead. After
the resilver completes, 'zpool offline' a known good device of the
RAID-Z. Then do a scrub, or try to read the data. Afterward, 'zpool
status -v' will display a list of the damaged files, which is very nice.

Chris
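(For the small test pool above, that procedure would look roughly like the
following. This is only a sketch: it assumes /tmp/test/1 is the known good
device, i.e. the one that was not replaced; on a real pool, substitute the
appropriate device name.)

# take the known good device offline, leaving only the freshly resilvered
# one to serve reads (assumes /tmp/test/1 is the good device)
zpool offline test /tmp/test/1

# force every block to be read back from the resilvered device
zpool scrub test

# list the damaged files, as described above
zpool status -v test

# bring the good device back online when finished
zpool online test /tmp/test/1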