Emmanuel
2008-Oct-05 12:25 UTC
[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)
Hi, I am looking for guidance on the following ZFS setup and error:

- OpenSolaris 2008.05 running as a guest in VMware Server, on an Ubuntu host
- The system has run flawlessly as an NFS file server for some months: a single zpool (called 'tank'), 2 vdevs each configured as raid-z, and about 10 filesystems (one of them called 'mail')
- After a power surge caused a reboot, OpenSolaris became unable to mount the pool

Using the OpenSolaris CD as a rescue disk, I discovered a permanent error (ZFS-8000-8A) quoting "tank/mail:<0x0>" as the location of the error; that is the name of the filesystem itself, not a specific file. The filesystem contains maildir archives, probably on the order of 10,000 files. The pool comes out of a scrub clean.

Googling around, I tried to unmount and remount the filesystem to replay the intent log (ZIL), in case the last transactions didn't play through entirely. Same negative result.

Reading http://docs.sun.com/app/docs/doc/819-5461/gbbwl?a=view, there is a case mentioning "monkey/dnode:<0x0>" that seems close enough. Is that really my case? If so, how do I 'move' the data as the solution proposes? As you can imagine, I'd like to rescue the files, so any alternative hint is welcome. Thanks.
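If I read the document right, the proposed 'move' amounts to copying whatever is still readable out of the damaged filesystem, destroying it, and restoring from the copy. Roughly what I have in mind, assuming tank/mail can still be mounted (the rescue dataset name is my own invention):

~$ pfexec zfs create tank/rescue-mail
~$ # copy out what is readable; errors on damaged files are expected
~$ rsync -a /tank/mail/ /tank/rescue-mail/
~$ # once everything salvageable is out, recreate the filesystem
~$ pfexec zfs destroy tank/mail
~$ pfexec zfs create tank/mail
~$ rsync -a /tank/rescue-mail/ /tank/mail/

Is that the idea, or am I off track?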
Emmanuel
2008-Oct-05 12:40 UTC
[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)
Reading through the post, the error message didn't come through properly: it should read "tank/mail:<0x0>", with less-than and greater-than signs around the 0x0. Also, the 4 disks (2 vdevs x 2 disks for raid-z) are physical SATA disks dedicated to the VMware image. Thanks.
I posted the article below in October and have been waiting for 2008.11, hoping the update would magically sort out my problem (basically, after a power cut, my pool imports but one of the datasets doesn't; the other datasets, and their contents, are visible and seem fully functional). I went through a few commands (output attached) showing the import and the zdb -d output at increasing levels of verbosity. At -ddddd, zdb core dumps: it follows the 7 levels of indirection and breaks at L2, and a zdb -R on that block segfaults zdb. Any advice, or anything you see worth trying, is welcome. The system is a VirtualBox 2.0.6 guest (2GB allocated, 4 physical drives passed through) on an up-to-date Ubuntu Hardy.

-------------- next part --------------
~$ pfexec zpool import
  pool: tank
    id: 10939520087096106673
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
config:

        tank        ONLINE
          raidz1    ONLINE
            c5t0d0  ONLINE
            c5t1d0  ONLINE
          raidz1    ONLINE
            c5t2d0  ONLINE
            c5t3d0  ONLINE

~$ pfexec zpool import tank
cannot mount 'tank/mail': I/O error

~$ pfexec zdb tank
    version=10
    name='tank'
    state=0
    txg=7698185
    pool_guid=10939520087096106673
    hostid=724374
    hostname='moscow'
    vdev_tree
        type='root'
        id=0
        guid=10939520087096106673
        children[0]
            type='raidz'
            id=0
            guid=17648667281479346738
            nparity=1
            metaslab_array=13
            metaslab_shift=32
            ashift=9
            asize=640114229248
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=5902022595400705343
                path='/dev/dsk/c5t0d0s0'
                devid='id1,sd@SATA_____VBOX_HARDDISK____VBd53bb1af-9f7400db/a'
                phys_path='/pci@0,0/pci8086,2829@d/disk@0,0:a'
                whole_disk=1
                DTL=84
            children[1]
                type='disk'
                id=1
                guid=8827036041867308956
                path='/dev/dsk/c5t1d0s0'
                devid='id1,sd@SATA_____VBOX_HARDDISK____VBa5ee1c45-b2bbcaa3/a'
                phys_path='/pci@0,0/pci8086,2829@d/disk@1,0:a'
                whole_disk=1
                DTL=83
        children[1]
            type='raidz'
            id=1
            guid=1724435683388879308
            nparity=1
            metaslab_array=218
            metaslab_shift=33
            ashift=9
            asize=1500286287872
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=15007089885865328028
                path='/dev/dsk/c5t2d0s0'
                devid='id1,sd@SATA_____VBOX_HARDDISK____VBcea797d3-e6ef5750/a'
                phys_path='/pci@0,0/pci8086,2829@d/disk@2,0:a'
                whole_disk=1
                DTL=221
            children[1]
                type='disk'
                id=1
                guid=9332007382569190498
                path='/dev/dsk/c5t3d0s0'
                devid='id1,sd@SATA_____VBOX_HARDDISK____VB7b6c68bc-7658138b/a'
                phys_path='/pci@0,0/pci8086,2829@d/disk@3,0:a'
                whole_disk=1
                DTL=220

Uberblock

        magic = 0000000000bab10c
        version = 10
        txg = 7698185
        guid_sum = 14040546736538210696
        timestamp = 1229725106 UTC = Sat Dec 20 09:18:26 2008

Dataset mos [META], ID 0, cr_txg 4, 21.5M, 228 objects
Dataset tank/mail [ZPL], ID 38, cr_txg 35, 4.05G, 60849 objects
Dataset tank/media [ZPL], ID 26, cr_txg 31, 164G, 21230 objects
^C

~$ pfexec zdb -ddd tank/mail tank
Dataset tank/mail [ZPL], ID 38, cr_txg 35, 4.05G, 60849 objects

ZIL header: claim_txg 7669623, seq 0

    Object  lvl   iblk   dblk  lsize  asize  type
         0    7    16K    16K  30.7M  17.5M  DMU dnode

~$ pfexec zdb -ddddd tank/mail
Dataset tank/mail [ZPL], ID 38, cr_txg 35, 4.05G, 60849 objects, rootbp
[L0 DMU objset] 400L/200P DVA[0]=<1:400227c00:400> DVA[1]=<0:6000007400:400>
fletcher4 lzjb LE contiguous birth=7669623 fill=60849
cksum=ec63b7b86:5ce7635a8d1:12b737a1b1974:293277eb03ab04

ZIL header: claim_txg 7669623, seq 0

first block: [L0 ZIL intent log] 1000L/1000P DVA[0]=<1:40732e000:2000>
zilog uncompressed LE contiguous birth=7669622 fill=0
cksum=5b576a84665b3619:406081a28d9ebd5c:26:ca
Block seqno 202, already claimed, [L0 ZIL intent log] 1000L/1000P
DVA[0]=<1:40732e000:2000> zilog uncompressed LE contiguous birth=7669622
fill=0 cksum=5b576a84665b3619:406081a28d9ebd5c:26:ca

    Object  lvl   iblk   dblk  lsize  asize  type
         0    7    16K    16K  30.7M  17.5M  DMU dnode

Indirect blocks:
               0 L6    1:407378000:800 4000L/400P F=60848 B=7669622
               0  L5   1:407372800:800 4000L/400P F=60848 B=7669622
               0   L4  1:407372000:800 4000L/400P F=60848 B=7669622
               0    L3 1:40736f800:800 4000L/400P F=60848 B=7669622
Error 50 reading <38, 0, 2, 0>: 1:402979000:1000 4000L/800P F=60848 B=7669622
Assertion failed: object_count == usedobjs (0x1 == 0xedb1), file ../zdb.c, line 1214
Abort (core dumped)

~$ pfexec zdb -R tank:1:402979000:1000
Found vdev type: raidz
Segmentation Fault (core dumped)
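Meanwhile, before experimenting any further, I am copying the datasets that still mount off the pool. A sketch of what I am running (the target host and backup pool names are examples):

~$ pfexec zfs snapshot tank/media@rescue
~$ pfexec zfs send tank/media@rescue | ssh otherhost 'pfexec zfs receive backup/media'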
Akhilesh Mritunjai
2008-Dec-25 22:53 UTC
[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)
I can't help with recovering your data, but I can shed some light on how this may have happened; it came up in another old thread. The problem can occur when ZFS thinks data has been written when it has not. That is easy to hit in a virtual machine environment: the guest's writes pass through the host OS's buffers, which may not have been flushed, so a write the guest believes is safely on disk may in fact still be in memory, or may reach the disk out of order. Be careful next time.
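Since your follow-up says the pool now lives under VirtualBox, I believe there is a setting to make it honor the guest's flush requests instead of ignoring them (untested by me; "moscow" is the guest hostname from your zdb output, the actual VM name may differ, and LUN#0 through LUN#3 should cover your four SATA disks):

$ VBoxManage setextradata "moscow" \
    "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0

(repeat for LUN#1, LUN#2 and LUN#3)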
Emmanuel
2008-Dec-27 09:03 UTC
[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)
Thanks. From what I read on the forum, it also seems to be a problem on physical installs, where a drive hastily reports its cache as flushed to disk in order to improve benchmark results. Following advice from Sun, I lodged a bug report about the core dumps on the failed assertion (#5949). Hopefully a zdb bug fix will flow into a ZFS bug fix that allows a successful import.
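In the meantime, I am considering disabling the volatile write cache on the host's drives altogether, trading some throughput for flushes that mean what they say. A sketch (the device names are examples; they need to be matched to the four disks handed to the VM):

$ sudo hdparm -W0 /dev/sdb /dev/sdc /dev/sdd /dev/sde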