Emmanuel
2008-Oct-05 12:25 UTC
[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)
Hi, I am looking for guidance on the following ZFS setup and error:

- OpenSolaris 2008.05 running as a guest in VMware Server, on an Ubuntu host
- The system has run flawlessly as an NFS file server for some months: a single zpool (called 'tank'), 2 vdevs each configured as raid-z, and about 10 filesystems (one of them called 'mail')
- After a power surge caused a reboot, OpenSolaris became unable to mount the pool

Using the OpenSolaris CD as a rescue disk, I discovered a permanent error (ZFS-8000-8A) quoting "tank/mail:<0x0>" as the location of the error; that is the name of the filesystem itself, not a specific file. The filesystem contains maildir archives, probably on the order of 10,000 files. The pool comes out of a scrub clean.

Googling around, I tried to unmount and remount the filesystem to replay the intent log (ZIL), in case the last transactions didn't play through entirely. Same negative result.

Reading http://docs.sun.com/app/docs/doc/819-5461/gbbwl?a=view, there is a case mentioning "monkey/dnode:<0x0>" that seems close enough. Is that really my case? If so, how do I 'move' the data as the solution proposes? As you can imagine, I'd like to rescue the files, so any alternative hint is welcome. Thanks.
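If I read the document right, the proposed 'move' amounts to copying whatever is still readable out of the damaged filesystem, destroying it, and restoring from the copy. Roughly what I have in mind, assuming tank/mail can still be mounted (the rescue dataset name is my own invention):

~$ pfexec zfs create tank/rescue-mail
~$ # copy out what is readable; errors on damaged files are expected
~$ rsync -a /tank/mail/ /tank/rescue-mail/
~$ # once everything salvageable is out, recreate the filesystem
~$ pfexec zfs destroy tank/mail
~$ pfexec zfs create tank/mail
~$ rsync -a /tank/rescue-mail/ /tank/mail/

Is that the idea, or am I off track?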
Emmanuel
2008-Oct-05 12:40 UTC
[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)
Reading through the post, the error message didn't come through properly: it should read "tank/mail:<0x0>", with less-than and greater-than signs around the 0x0. Also, the 4 disks (2 vdevs x 2 disks for raid-z) are physical SATA disks dedicated to the VMware image. Thanks.
I posted the article below in October and have been waiting for 2008.11, hoping the update would magically sort out my problem (basically, after a power cut, my pool imports but one of the datasets doesn't; the other datasets, and their contents, are visible and seem fully functional). I went through a few commands (output attached) showing the import and the zdb -d output at increasing levels of verbosity. At -ddddd, zdb core dumps: it follows the 7 levels of indirection and breaks at L2, and a zdb -R on that block segfaults zdb. Any advice, or anything you see worth trying, is welcome. The system is a VirtualBox 2.0.6 guest (2GB allocated, 4 physical drives passed through) on an up-to-date Ubuntu Hardy.

-------------- next part --------------
~$ pfexec zpool import
  pool: tank
    id: 10939520087096106673
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
config:

        tank        ONLINE
          raidz1    ONLINE
            c5t0d0  ONLINE
            c5t1d0  ONLINE
          raidz1    ONLINE
            c5t2d0  ONLINE
            c5t3d0  ONLINE

~$ pfexec zpool import tank
cannot mount 'tank/mail': I/O error

~$ pfexec zdb tank
    version=10
    name='tank'
    state=0
    txg=7698185
    pool_guid=10939520087096106673
    hostid=724374
    hostname='moscow'
    vdev_tree
        type='root'
        id=0
        guid=10939520087096106673
        children[0]
            type='raidz'
            id=0
            guid=17648667281479346738
            nparity=1
            metaslab_array=13
            metaslab_shift=32
            ashift=9
            asize=640114229248
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=5902022595400705343
                path='/dev/dsk/c5t0d0s0'
                devid='id1,sd@SATA_____VBOX_HARDDISK____VBd53bb1af-9f7400db/a'
                phys_path='/pci@0,0/pci8086,2829@d/disk@0,0:a'
                whole_disk=1
                DTL=84
            children[1]
                type='disk'
                id=1
                guid=8827036041867308956
                path='/dev/dsk/c5t1d0s0'
                devid='id1,sd@SATA_____VBOX_HARDDISK____VBa5ee1c45-b2bbcaa3/a'
                phys_path='/pci@0,0/pci8086,2829@d/disk@1,0:a'
                whole_disk=1
                DTL=83
        children[1]
            type='raidz'
            id=1
            guid=1724435683388879308
            nparity=1
            metaslab_array=218
            metaslab_shift=33
            ashift=9
            asize=1500286287872
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=15007089885865328028
                path='/dev/dsk/c5t2d0s0'
                devid='id1,sd@SATA_____VBOX_HARDDISK____VBcea797d3-e6ef5750/a'
                phys_path='/pci@0,0/pci8086,2829@d/disk@2,0:a'
                whole_disk=1
                DTL=221
            children[1]
                type='disk'
                id=1
                guid=9332007382569190498
                path='/dev/dsk/c5t3d0s0'
                devid='id1,sd@SATA_____VBOX_HARDDISK____VB7b6c68bc-7658138b/a'
                phys_path='/pci@0,0/pci8086,2829@d/disk@3,0:a'
                whole_disk=1
                DTL=220

Uberblock

        magic = 0000000000bab10c
        version = 10
        txg = 7698185
        guid_sum = 14040546736538210696
        timestamp = 1229725106 UTC = Sat Dec 20 09:18:26 2008

Dataset mos [META], ID 0, cr_txg 4, 21.5M, 228 objects
Dataset tank/mail [ZPL], ID 38, cr_txg 35, 4.05G, 60849 objects
Dataset tank/media [ZPL], ID 26, cr_txg 31, 164G, 21230 objects
^C

~$ pfexec zdb -ddd tank/mail tank
Dataset tank/mail [ZPL], ID 38, cr_txg 35, 4.05G, 60849 objects

ZIL header: claim_txg 7669623, seq 0

    Object  lvl   iblk   dblk  lsize  asize  type
         0    7    16K    16K  30.7M  17.5M  DMU dnode

~$ pfexec zdb -ddddd tank/mail
Dataset tank/mail [ZPL], ID 38, cr_txg 35, 4.05G, 60849 objects, rootbp
[L0 DMU objset] 400L/200P DVA[0]=<1:400227c00:400> DVA[1]=<0:6000007400:400>
fletcher4 lzjb LE contiguous birth=7669623 fill=60849
cksum=ec63b7b86:5ce7635a8d1:12b737a1b1974:293277eb03ab04

ZIL header: claim_txg 7669623, seq 0

first block: [L0 ZIL intent log] 1000L/1000P DVA[0]=<1:40732e000:2000>
zilog uncompressed LE contiguous birth=7669622 fill=0
cksum=5b576a84665b3619:406081a28d9ebd5c:26:ca
Block seqno 202, already claimed, [L0 ZIL intent log] 1000L/1000P
DVA[0]=<1:40732e000:2000> zilog uncompressed LE contiguous birth=7669622
fill=0 cksum=5b576a84665b3619:406081a28d9ebd5c:26:ca

    Object  lvl   iblk   dblk  lsize  asize  type
         0    7    16K    16K  30.7M  17.5M  DMU dnode

Indirect blocks:
               0 L6    1:407378000:800 4000L/400P F=60848 B=7669622
               0  L5   1:407372800:800 4000L/400P F=60848 B=7669622
               0   L4  1:407372000:800 4000L/400P F=60848 B=7669622
               0    L3 1:40736f800:800 4000L/400P F=60848 B=7669622
Error 50 reading <38, 0, 2, 0>: 1:402979000:1000 4000L/800P F=60848 B=7669622
Assertion failed: object_count == usedobjs (0x1 == 0xedb1), file ../zdb.c, line 1214
Abort (core dumped)

~$ pfexec zdb -R tank:1:402979000:1000
Found vdev type: raidz
Segmentation Fault (core dumped)
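Meanwhile, before experimenting any further, I am copying the datasets that still mount off the pool. A sketch of what I am running (the target host and backup pool names are examples):

~$ pfexec zfs snapshot tank/media@rescue
~$ pfexec zfs send tank/media@rescue | ssh otherhost 'pfexec zfs receive backup/media'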
Akhilesh Mritunjai
2008-Dec-25 22:53 UTC
[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)
I can't help with recovering your data, but I can shed some light on how this may have happened; it came up in another old thread. The problem can occur when ZFS thinks data has been written when it has not. That is easy to hit in a virtual machine environment: the guest's writes pass through the host OS's buffers, which may not have been flushed, so a write the guest believes is safely on disk may in fact still be in memory, or may reach the disk out of order. Be careful next time.
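Since your follow-up says the pool now lives under VirtualBox, I believe there is a setting to make it honor the guest's flush requests instead of ignoring them (untested by me; "moscow" is the guest hostname from your zdb output, the actual VM name may differ, and LUN#0 through LUN#3 should cover your four SATA disks):

$ VBoxManage setextradata "moscow" \
    "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0

(repeat for LUN#1, LUN#2 and LUN#3)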
Emmanuel
2008-Dec-27 09:03 UTC
[zfs-discuss] Permanent errors on filesystem (opensolaris 2008.05)
Thanks. From what I read on the forum, it also seems to be a problem on physical installs, where a drive hastily reports its cache as flushed to disk in order to improve benchmark results. Following advice from Sun, I lodged a bug report about the core dumps on the failed assertion (#5949). Hopefully a zdb bug fix will flow into a ZFS bug fix that allows a successful import.
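In the meantime, I am considering disabling the volatile write cache on the host's drives altogether, trading some throughput for flushes that mean what they say. A sketch (the device names are examples; they need to be matched to the four disks handed to the VM):

$ sudo hdparm -W0 /dev/sdb /dev/sdc /dev/sdd /dev/sde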