Kevin
2007-Jul-26 04:44 UTC
[zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
After a scrub of a pool with 3 raidz2 vdevs (each with 5 disks in them) I see
the following status output. Notice that the raidz2 vdev has 2 checksum
errors, but only one disk inside the raidz2 vdev has a checksum error. How is
this possible? I thought that you would have to have 3 errors in the same
'stripe' within a raidz2 vdev in order for the error to become unrecoverable.
And I have not reset any errors with zpool clear. Comments will be
appreciated. Thanks.

$ zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 1 errors on Mon Jul 23 19:59:07 2007
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     2
          raidz2     ONLINE       0     0     2
            c2t0d0   ONLINE       0     0     1
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     1
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
        spares
          c2t15d0    AVAIL

errors: The following persistent errors have been detected:

          DATASET  OBJECT   RANGE
          5        5fe9784  lvl=0 blkid=40299
Kevin
2007-Jul-26 04:50 UTC
[zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
Here's some additional output from the zpool and zfs tools:

$ zpool list
NAME    SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
tank   10.2T  8.58T   1.64T  83%  ONLINE  -

$ zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  5.11T   901G  5.11T  /tank

Record size is 128K, checksums are on, compression is off, atime is off.
This is the only zpool/filesystem in the system.

Thanks.
Matthew Ahrens
2007-Jul-27 23:29 UTC
[zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
Kevin wrote:
> After a scrub of a pool with 3 raidz2 vdevs (each with 5 disks in them) I
> see the following status output. Notice that the raidz2 vdev has 2 checksum
> errors, but only one disk inside the raidz2 vdev has a checksum error. How
> is this possible? I thought that you would have to have 3 errors in the
> same 'stripe' within a raidz2 vdev in order for the error to become
> unrecoverable.

A checksum error on a disk indicates that we know for sure that this disk
gave us wrong data. With raidz[2], if we are unable to reconstruct the block
successfully but no disk admitted that it failed, then we have no way of
knowing which disk(s) are actually incorrect.

So the errors on the raidz2 vdev indeed indicate that at least 3 disks below
it gave the wrong data for those 2 blocks; we just couldn't tell which 3+
disks they were. It's as if I know that A+B==3, but A is 1 and B is 3. I
can't tell whether A is wrong or B is wrong (or both!).

The checksum errors on the cXtXdX vdevs didn't result in data loss, because
we reconstructed the data from the other disks in the raidz group.

--matt
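To make the A+B analogy concrete, here is a tiny Python toy. It is purely
illustrative and not ZFS code (real raidz2 parity is XOR plus a
Reed-Solomon-style column over GF(2^8)); the values and names are made up.
One parity equation over two unknowns is enough to detect a mismatch, but
every single-value repair looks equally valid, so the error cannot be
located.

# Toy version of the A+B analogy above -- not ZFS code.
A, B = 1, 3          # values read back from two "disks"
PARITY = 3           # we know A + B should equal 3

if A + B != PARITY:
    print("corruption detected")                             # detection works
    print("candidate fix if B is trusted: A =", PARITY - B)  # A = 0
    print("candidate fix if A is trusted: B =", PARITY - A)  # B = 2
    # Both candidates satisfy A + B == 3.  Without more information (more
    # parity, or a disk admitting it returned an error) we cannot choose.

Raidz2 has two such equations per stripe, which is why two silently wrong
columns can still be pinned down, but three cannot.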
Marc Bevand
2007-Jul-28 03:34 UTC
[zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
Matthew Ahrens <Matthew.Ahrens <at> sun.com> writes:
> So the errors on the raidz2 vdev indeed indicate that at least 3 disks below
> it gave the wrong data for those 2 blocks; we just couldn't tell which 3+
> disks they were.

Something must be seriously wrong with this server. This is the first time I
have seen an uncorrectable checksum error in a raidz2 vdev. I would suggest
that Kevin run memtest86 or similar. It is more likely that bad data was
written to the disks in the first place (due to flaky RAM/CPU/mobo/cables)
than that 3+ disks corrupted data in the same stripe!

-marc
Tomas Ögren
2007-Jul-29 22:15 UTC
[zfs-discuss] Mysterious corruption with raidz2 vdev (1 checksum err on disk, 2 on vdev?)
On 28 July, 2007 - Marc Bevand sent me these 0,7K bytes:

> Matthew Ahrens <Matthew.Ahrens <at> sun.com> writes:
> > So the errors on the raidz2 vdev indeed indicate that at least 3 disks below
> > it gave the wrong data for those 2 blocks; we just couldn't tell which 3+
> > disks they were.
>
> Something must be seriously wrong with this server. This is the first time I
> have seen an uncorrectable checksum error in a raidz2 vdev. I would suggest
> that Kevin run memtest86 or similar. It is more likely that bad data was
> written to the disks in the first place (due to flaky RAM/CPU/mobo/cables)
> than that 3+ disks corrupted data in the same stripe!

They are all connected to the same controller, which might have had a bad
day, but memory corruption sounds like a plausible problem too. My
workstation suddenly started having trouble compiling hello world; memtest
to the rescue, and the next day I found 340 errors.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
We'll try running all of the diagnostic tests to rule out any other issues.

But my question is: wouldn't I need to see at least 3 checksum errors on the
individual devices in order for there to be a visible error on the top-level
vdev? There don't appear to be enough raw checksum errors on the disks for
there to have been 3 errors in the same vdev block. Or am I not
understanding the checksum count correctly?
Kevin wrote:
> We'll try running all of the diagnostic tests to rule out any other issues.

Does the server have ECC memory? Many x86 systems do not :-(

> But my question is: wouldn't I need to see at least 3 checksum errors on the
> individual devices in order for there to be a visible error on the top-level
> vdev? There don't appear to be enough raw checksum errors on the disks for
> there to have been 3 errors in the same vdev block. Or am I not
> understanding the checksum count correctly?

You are assuming the corruption occurred at the disk. That does not seem to
be the case. ZFS checksums provide end-to-end data integrity: they detect
hard or transient errors along the entire data path.

-- richard
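For what "end-to-end" means in practice, here is a minimal Python sketch. It
is not the real ZFS code path; the function names and payloads are invented
for illustration. The checksum is computed over the data the application
handed in and is stored apart from the data itself (in ZFS, in the parent
block pointer), so a bit flipped anywhere between write and read -- RAM, HBA,
cable, or platter -- shows up as a mismatch at read time.

import hashlib

def write_block(payload):
    # what goes to "disk", plus the checksum kept with the parent pointer
    return payload, hashlib.sha256(payload).digest()

def read_block(payload_from_disk, stored_checksum):
    # verify on every read; a mismatch means corruption somewhere in the path
    if hashlib.sha256(payload_from_disk).digest() != stored_checksum:
        raise IOError("checksum mismatch: corruption between write and read")
    return payload_from_disk

data, cksum = write_block(b"application data")
read_block(data, cksum)                       # verifies cleanly
try:
    read_block(b"applicatiom data", cksum)    # one flipped byte is caught
except IOError as e:
    print(e)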
I suspect this is a bug in raidz error reporting.

With a mirror, each copy either checksums correctly or it doesn't, so we know
which drives gave us bad data. With RAID-Z, we have to infer which drives
have damage. If the number of drives returning bad data is less than or equal
to the number of parity drives, we can both detect and correct the error. But
if, say, three drives in a RAID-Z2 stripe return corrupt data, we have no way
to know which drives are at fault -- there's just not enough information, and
I mean that in the mathematical sense (fewer equations than unknowns).

That said, we should enhance 'zpool status' to indicate the number of
detected-but-undiagnosable errors on each RAID-Z vdev.

Jeff

Kevin wrote:
> We'll try running all of the diagnostic tests to rule out any other issues.
>
> But my question is: wouldn't I need to see at least 3 checksum errors on the
> individual devices in order for there to be a visible error on the top-level
> vdev? There don't appear to be enough raw checksum errors on the disks for
> there to have been 3 errors in the same vdev block. Or am I not
> understanding the checksum count correctly?
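To put Jeff's "fewer equations than unknowns" point in runnable form, here is
a deliberately simplified combinatorial-reconstruction sketch in Python. It
is not the raidz code: it uses plain integer parities in place of XOR and
GF(2^8) Reed-Solomon, and every name and value is invented. The shape of the
argument is what matters: with two parity equations you can try each "assume
these <= 2 columns are bad" combination and let the block checksum arbitrate,
but three silently bad columns leave no combination that verifies, so the
error can only be charged to the raidz vdev as a whole.

import hashlib
from itertools import combinations

def checksum(data):
    # stand-in for the block checksum stored in the parent block pointer
    return hashlib.sha256(repr(data).encode()).hexdigest()

def parities(data):
    # toy "P" and "Q" parities; real raidz2 uses XOR and Reed-Solomon instead
    return sum(data), sum((i + 1) * d for i, d in enumerate(data))

def reconstruct(read, p, q, want):
    n = len(read)
    if checksum(read) == want:
        return read, []                           # data is fine as read
    for j in range(n):                            # assume one column is bad
        cand = list(read)
        cand[j] = p - sum(read[i] for i in range(n) if i != j)
        if checksum(cand) == want:
            return cand, [j]
    for j, k in combinations(range(n), 2):        # assume two columns are bad
        s = sum(read[i] for i in range(n) if i not in (j, k))
        t = sum((i + 1) * read[i] for i in range(n) if i not in (j, k))
        r1, r2 = p - s, q - t                     # two equations in d[j], d[k]
        num = r2 - (j + 1) * r1
        if num % (k - j):
            continue                              # no consistent solution here
        dk = num // (k - j)
        cand = list(read)
        cand[j], cand[k] = r1 - dk, dk
        if checksum(cand) == want:
            return cand, [j, k]
    return None, None                             # detected but undiagnosable

good = [7, 2, 9, 4, 1]                            # one stripe's data columns
p, q = parities(good)
want = checksum(good)

print(reconstruct([7, 2, 0, 0, 1], p, q, want))   # 2 bad columns: recovered, culprits named
print(reconstruct([0, 2, 0, 0, 1], p, q, want))   # 3 bad columns: (None, None)

In the second call no candidate passes the checksum. That is the situation in
this thread: the scrub knows the block is bad, but the checksum error can
only be counted against the raidz2 vdev, not against any particular disk --
which is also why Jeff's proposed 'zpool status' enhancement would help.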