Hello all,

After playing around a bit with the disks (powering down, pulling one disk
out, powering down, putting the disk back in and pulling out another one,
repeat), zpool status reports permanent data corruption:

# uname -a
SunOS bhelliom 5.11 snv_55b i86pc i386 i86pc

# zpool status -v
  pool: famine
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        famine      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c4d1    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          6d       0       lvl=4 blkid=0
          73       0       lvl=0 blkid=0
          10b1     0       lvl=6 blkid=0

The corruption is somewhat understandable. It's my home fileserver and I
do the most horrible things to it now and then just to find out what
happens. The point of this exercise was to go through the disks, label
them, and locate c2d1, since it had been experiencing lockups that
required a cold reset to get the disk online again, and I was too lazy to
do that without fully starting the OS and thus mounting the raidz each
time. During one of the restarts both the disk I had pulled out and c2d1
went missing while starting the filesystem.

According to the zdb dump, object 0 seems to be the DMU node on each
file system. My understanding of this part of ZFS is very shallow, but
why does it allow the filesystems to be mounted rw with damaged DMU
nodes? Doesn't that result in a risk of more permanent damage to the
structure of those filesystems? Or are there redundant DMU nodes it's
now using, and in that case, why doesn't it automatically fix the
damaged ones?

I'm currently doing a complete scrub, but according to the latest
estimate from zpool status it will be 63h before I know how that went...

-- 
Peter Bortas
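For the curious, the zdb poking referred to above amounts to roughly the
following (a sketch, not a verbatim transcript; exact zdb flags and output
vary between builds, and famine/<fs> is only a placeholder for whichever
filesystem a DATASET id resolves to):

# zdb -d famine              # list the datasets in the pool with their object ids
# zdb -dddd famine/<fs> 0    # dump the damaged object (object 0) in that dataset

The DATASET column in the error report is the dataset's object id in hex,
so matching it against the zdb -d listing should tell you which
filesystems are affected.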
Peter Bortas wrote:
> According to the zdb dump, object 0 seems to be the DMU node on each
> file system. My understanding of this part of ZFS is very shallow, but
> why does it allow the filesystems to be mounted rw with damaged DMU
> nodes? Doesn't that result in a risk of more permanent damage to the
> structure of those filesystems? Or are there redundant DMU nodes it's
> now using, and in that case, why doesn't it automatically fix the
> damaged ones?

Object 0 is basically the object that describes the other objects. So the
end result will be that some range of (up to 32) objects in each of those
filesystems will be inaccessible. There is no risk of additional damage by
running in read/write mode, because ZFS is always able to detect what data
is good and what is bad by using checksums.

That said, blkid 0 of object 0 always happens to contain some critical
objects (the ZPL "master node" and root directory). So if you are able to
mount these filesystems at all, then it probably means that ZFS was able
to find another redundant copy, or the failure was actually transient.
(E.g., one disk was temporarily offline and some pieces of another disk
are damaged, so raidz1 couldn't reconstruct.)

FYI, in a later build, 'zpool status -v' actually tells you the names of
the damaged filesystems & files, so you don't have to muck around with zdb.

--matt
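To put a number on the "up to 32": object 0 is the array of dnodes (the
meta-dnode), each dnode is 512 bytes, and the meta-dnode uses 16K data
blocks, so a single bad level-0 block covers at most 16K / 512 = 32
dnodes. If you want to see whether the critical objects in blkid 0
survived, something along these lines should show them (a sketch; object 1
should be the ZPL master node, whose ROOT entry points at the root
directory object):

# zdb -dddd famine/<fs> 0    # the dnode array itself
# zdb -dddd famine/<fs> 1    # the ZPL master node; look for the ROOT entry

If zdb can print those and the filesystem mounts, a good copy was
evidently still reachable somewhere in the raidz.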
On 6/30/07, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> Peter Bortas wrote:
> > According to the zdb dump, object 0 seems to be the DMU node on each
> > file system. My understanding of this part of ZFS is very shallow, but
> > why does it allow the filesystems to be mounted rw with damaged DMU
> > nodes? Doesn't that result in a risk of more permanent damage to the
> > structure of those filesystems? Or are there redundant DMU nodes it's
> > now using, and in that case, why doesn't it automatically fix the
> > damaged ones?
>
> Object 0 is basically the object that describes the other objects. So the
> end result will be that some range of (up to 32) objects in each of those
> filesystems will be inaccessible. There is no risk of additional damage by
> running in read/write mode, because ZFS is always able to detect what data
> is good and what is bad by using checksums.
>
> That said, blkid 0 of object 0 always happens to contain some critical
> objects (the ZPL "master node" and root directory). So if you are able to
> mount these filesystems at all, then it probably means that ZFS was able
> to find another redundant copy, or the failure was actually transient.
> (E.g., one disk was temporarily offline and some pieces of another disk
> are damaged, so raidz1 couldn't reconstruct.)

The question is: why didn't it clear those errors during the resilver if
it found redundant copies? Before the resilver there were actually four of
those errors. This one:

          37       0       lvl=2 blkid=0

was removed by the resilver.

> FYI, in a later build, 'zpool status -v' actually tells you the names of
> the damaged filesystems & files, so you don't have to muck around with zdb.

Yes, that is a feature that has been tempting me to upgrade for a while.
Unfortunately I won't have time to do it this weekend.

-- 
Peter Bortas
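For what it's worth, my (possibly naive) understanding is that a resilver
only rebuilds the blocks the re-attached disk is missing, while a scrub
re-reads and verifies every block in the pool, which would be why only the
full scrub can be expected to clean out the remaining entries. The
commands involved are simply these (a sketch; note that zpool clear only
resets the per-device READ/WRITE/CKSUM counters, it does not repair
anything):

# zpool scrub famine         # re-read every block and repair from raidz parity
# zpool status -v famine     # watch progress and the persistent error list
# zpool clear famine         # optionally reset the device error counters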
On 6/30/07, Peter Bortas <bortas at gmail.com> wrote:
> I'm currently doing a complete scrub, but according to the latest
> estimate from zpool status it will be 63h before I know how that went...

The scrub has now completed with 0 errors, and there are no longer any
corruption errors reported.

-- 
Peter Bortas