Hello all,
After playing around a bit with the disks (powering down, pulling one
disk out, powering down, putting the disk back in and pulling out
another one, repeat), zpool status reports permanent data corruption:
# uname -a
SunOS bhelliom 5.11 snv_55b i86pc i386 i86pc
# zpool status -v
pool: famine
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        famine      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c4d1    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          6d       0       lvl=4 blkid=0
          73       0       lvl=0 blkid=0
          10b1     0       lvl=6 blkid=0
The corruption is somewhat understandable. It's my home fileserver and
I do the most horrible things to it now and then just to find out what
happens. The point of this exercise was to go through the disks, label
them, and locate c2d1, since it had been experiencing lockups that
required a cold reset to get the disk online again, and I was too lazy
to do it without fully starting the OS and thus mounting the raidz
each time. During one of the restarts, both the disk I had pulled out
and c2d1 went missing while the filesystem was starting.
According to the zdb dump, object 0 seems to be the DMU node on each
file system. My understanding of this part of ZFS is very shallow, but
why does it allow the filesystems to be mounted rw with damaged DMU
nodes? Doesn't that result in a risk of more permanent damage to the
structure of those filesystems? Or are there redundant DMU nodes it's
now using, and in that case, why doesn't it automatically fix the
damaged ones?
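(For anyone who wants to poke at this themselves: the zdb dump referred
to above was along these lines; "somefs" is just a placeholder for one
of the filesystems in the pool, not its real name.)

# zdb -d famine
  (lists the datasets in the pool with their objset IDs; the hex
   DATASET numbers in the error report above appear to be these IDs)
# zdb -dddd famine/somefs 0
  (dumps object 0 of that filesystem in detail)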
I'm currently doing a complete scrub, but according to zpool status'
latest estimate it will be 63h before I know how that went...
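(For reference, the scrub itself is nothing exotic; it was kicked off
and is being watched with the standard commands, along the lines of:)

# zpool scrub famine
# zpool status -v famine
  (the scrub line in the status output shows how far along it is and
   the estimated time remaining)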
--
Peter Bortas
Peter Bortas wrote:
> According to the zdb dump, object 0 seems to be the DMU node on each
> file system. My understanding of this part of ZFS is very shallow, but
> why does it allow the filesystems to be mounted rw with damaged DMU
> nodes? Doesn't that result in a risk of more permanent damage to the
> structure of those filesystems? Or are there redundant DMU nodes it's
> now using, and in that case, why doesn't it automatically fix the
> damaged ones?

Object 0 is basically the object that describes the other objects. So the
end result will be that some range of (up to 32) objects in each of those
filesystems will be inaccessible. There is no risk of additional damage by
running in read/write mode, because ZFS is always able to detect what data
is good and what is bad by using checksums.

That said, blkid 0 of object 0 always happens to contain some critical
objects (the ZPL "master node" and root directory). So if you are able to
mount these filesystems at all, then it probably means that ZFS was able to
find another redundant copy, or the failure was actually transient. (Eg,
because one disk was temporarily offline, and some pieces of another disk
are damaged, so raidz1 couldn't reconstruct.)

FYI, in a later build, 'zpool status -v' actually tells you the names of the
damaged filesystem & files, so you don't have to muck around with zdb.

--matt
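(As a back-of-the-envelope check of the "up to 32" figure, assuming the
usual on-disk sizes: object 0 is stored in 16 KB blocks and each dnode
entry is 512 bytes, so a single damaged level-0 block covers
16384 / 512 = 32 dnodes, i.e. up to 32 objects.)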
On 6/30/07, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> Peter Bortas wrote:
> > According to the zdb dump, object 0 seems to be the DMU node on each
> > file system. My understanding of this part of ZFS is very shallow, but
> > why does it allow the filesystems to be mounted rw with damaged DMU
> > nodes? Doesn't that result in a risk of more permanent damage to the
> > structure of those filesystems? Or are there redundant DMU nodes it's
> > now using, and in that case, why doesn't it automatically fix the
> > damaged ones?
>
> Object 0 is basically the object that describes the other objects. So the
> end result will be that some range of (up to 32) objects in each of those
> filesystems will be inaccessible. There is no risk of additional damage by
> running in read/write mode, because ZFS is always able to detect what data
> is good and what is bad by using checksums.
>
> That said, blkid 0 of object 0 always happens to contain some critical
> objects (the ZPL "master node" and root directory). So if you are able to
> mount these filesystems at all, then it probably means that ZFS was able to
> find another redundant copy, or the failure was actually transient. (Eg,
> because one disk was temporarily offline, and some pieces of another disk
> are damaged, so raidz1 couldn't reconstruct.)

The question is why it didn't clear those errors when resilvering, if it
found redundant copies. Before the resilvering there were actually four of
those errors. This one:

          37       0       lvl=2 blkid=0

was removed by resilvering.

> FYI, in a later build, 'zpool status -v' actually tells you the names of the
> damaged filesystem & files, so you don't have to muck around with zdb.

Yes, that is a feature that has been tempting me to upgrade for a while.
Unfortunately I won't have time to do it this weekend.

--
Peter Bortas
On 6/30/07, Peter Bortas <bortas at gmail.com> wrote:
> I'm currently doing a complete scrub, but according to zpool status'
> latest estimate it will be 63h before I know how that went...

The scrub has now completed with 0 errors, and there are no longer any
corruption errors reported.

--
Peter Bortas