Hi,

I've been using a ZFS pool inside a VMware'd NexentaOS, on a single real disk
partition, for a few months in order to store some backups.

Today I noticed that there were some directories missing inside 2 separate
filesystems, which I found strange. I went to the backup logs (also stored
inside the pool) and it seemed that at least one of the directories
(/pool/backup/var) went missing yesterday *while* the backup was ongoing
(inside /pool/backup/var/log).

The backup process is a few simple rsyncs from the VMware host, running Linux,
to the VMware guest, running Nexenta, followed by snapshot creation. The
filesystems were *not* NFS mounted - I had the rsync server process running
on the ZFS box.

I tried to look into the snapshots, but doing a 'zfs set snapdir=visible
pool/backup' didn't make the .zfs dir appear.

I did a 'zpool status' and it didn't report any errors or checksum failures
whatsoever. I assumed there was probably corrupted memory in the running
kernel instance, so I rebooted. Now I can't even mount the pool!

root@zfs:~# zpool status -x
  pool: pool
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        UNAVAIL      0     0     0  insufficient replicas
          c2t0d0s1  UNAVAIL      0     0     0  cannot open

root@zfs:~# zdb -l /dev/dsk/c2t0d0s1
--------------------------------------------
LABEL 0
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,sd@f0000000044b1a92b000b8a720001/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 1
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,sd@f0000000044b1a92b000b8a720001/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 2
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,sd@f0000000044b1a92b000b8a720001/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22
--------------------------------------------
LABEL 3
--------------------------------------------
    version=3
    name='pool'
    state=0
    txg=143567
    pool_guid=3667491715056107646
    top_guid=8396736522625936678
    guid=8396736522625936678
    vdev_tree
        type='disk'
        id=0
        guid=8396736522625936678
        path='/dev/dsk/c2t0d0s1'
        devid='id1,sd@f0000000044b1a92b000b8a720001/b'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=30
        ashift=9
        asize=117896380416
        DTL=22

I don't know much about Solaris partitions, but here's how I did it (I needed
to store swap on this disk):

partition> print
Current partition table (original):
Total disk cylinders available: 14466 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       swap    wm       1 -   131        1.00GB    (131/0/0)     2104515
  1   reserved    wm     132 - 14465      109.80GB    (14334/0/0) 230275710
  2     backup    wu       0 - 14465      110.82GB    (14466/0/0) 232396290
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

However, I found this rather strange:

root@zfs:~# stat -L /dev/dsk/c2t0d0s1
  File: `/dev/dsk/c2t0d0s1'
  Size: 9223372036854775807  Blocks: 0          IO Block: 8192   block special file
Device: 4380000h/70778880d   Inode: 26214405    Links: 1     Device type: 32,1
Access: (0640/brw-r-----)  Uid: (    0/    root)   Gid: (    3/     sys)
Access: 2006-12-15 05:11:56.000000000 +0000
Modify: 2006-12-15 05:11:56.000000000 +0000
Change: 2006-12-15 05:11:56.000000000 +0000

(The -L parameter in the GNU stat is the follow-symlink option.) Notice the
size! stat -L /dev/dsk/c2t0d0s0 reports a size of 1077511680.

Interestingly, I never had any problems until now - I even did weekly scrubs
and it *never* reported any errors or checksum failures.

I've tried stopping Nexenta, deleting the disk in VMware, re-adding it,
booting and doing 'devfsadm -C'. It didn't solve anything.

Is there any way to recover the data? I don't really know how to begin
diagnosing/solving the problem. I don't understand why stat is reporting that
strange number.

Thanks.
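P.S. For context, the backups are driven from the Linux host by something
roughly of this shape - a sketch only; the rsync module name ("backup"),
dataset names and the snapshot step are illustrative, not the exact script:

#!/bin/sh
# Runs on the Linux VMware host; "zfs" is the Nexenta guest running rsyncd.
# Module ("backup") and dataset names below are illustrative only.
set -e

# Plain rsyncs straight to the rsync daemon on the guest - no NFS involved.
rsync -a --delete /var/ zfs::backup/var/
rsync -a --delete /etc/ zfs::backup/etc/

# Snapshot the backup filesystems on the guest once the rsyncs complete.
TODAY=`date +%Y%m%d`
ssh root@zfs "zfs snapshot pool/backup/var@$TODAY && zfs snapshot pool/backup/etc@$TODAY"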
> found this rather strange:
> root@zfs:~# stat -L /dev/dsk/c2t0d0s1
>   File: `/dev/dsk/c2t0d0s1'
>   Size: 9223372036854775807  Blocks: 0          IO
>
> Notice the size!

This is $7FFFFFFFFFFFFFFF, which is MAXOFFSET_T, aka UNKNOWN_SIZE. Not sure
why... a damaged label on this device?

I'm not sure why ZFS says it can't open the device when it seems to be able
to print its labels.

-- Anton

This message posted from opensolaris.org
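(Side note, for anyone who wants to double-check the hex/decimal
correspondence: any shell whose printf accepts C-style hex constants will do
it, e.g. bash -

$ printf '%d\n' 0x7fffffffffffffff
9223372036854775807

- which is indeed the Size value stat reported above.)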
On Friday 15 December 2006 16:27, Anton B. Rang wrote:
> This is $7FFFFFFFFFFFFFFF, which is MAXOFFSET_T, aka UNKNOWN_SIZE. Not sure
> why... a damaged label on this device?

'format' seems to show the partition table correctly:

root@zfs:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0d0 <DEFAULT cyl 713 alt 2 hd 64 sec 32>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c2t0d0 <DEFAULT cyl 14466 alt 2 hd 255 sec 63>
          /pci@0,0/pci1000,30@10/sd@0,0
Specify disk (enter its number): 1
selecting c2t0d0
[disk formatted]
/dev/dsk/c2t0d0s0 is currently used by swap. Please see swap(1M).
/dev/dsk/c2t0d0s1 is part of active ZFS pool pool. Please see zpool(1M).
format> partition
partition> print
Current partition table (original):
Total disk cylinders available: 14466 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       swap    wm       1 -   131        1.00GB    (131/0/0)     2104515
  1   reserved    wm     132 - 14465      109.80GB    (14334/0/0) 230275710
  2     backup    wu       0 - 14465      110.82GB    (14466/0/0) 232396290
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

Also, interestingly, stat -L /dev/dsk/c2t0d0s* returns a size of
9223372036854775807 for all slices except for slice 0.
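(That check was just a loop of this shape - a sketch, using GNU stat's
-c '%s' format to print the size in bytes for each slice:)

for s in /dev/dsk/c2t0d0s*; do
        printf '%-22s ' "$s"
        stat -L -c '%s' "$s"
done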
This might help in diagnosing the problem: zdb successfully traversed the
pool. Here's the output:

root@zfs:~# zdb -c pool

Traversing all blocks to verify checksums and verify nothing leaked ...
zdb_blkptr_cb: Got error 50 reading <5, 3539, 0, 12e7>  -- skipping

Error counts:

        errno  count
           50      1

leaked space: vdev 0, offset 0x1b36680000, size 131072
block traversal size 80614569984 != alloc 80614701056 (leaked 131072)

        bp count:          1617815
        bp logical:    91235758080    avg:  56394
        bp physical:   80118322688    avg:  49522    compression:  1.14
        bp allocated:  80614569984    avg:  49829    compression:  1.13
        SPA allocated: 80614701056    used: 68.88%

However, I still can't mount it.
Not sure if this is helpful, but anyway...:

root@zfs:~# zdb -bb pool

Traversing all blocks to verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:          1617816
        bp logical:    91235889152    avg:  56394
        bp physical:   80118453760    avg:  49522    compression:  1.14
        bp allocated:  80614701056    avg:  49829    compression:  1.13
        SPA allocated: 80614701056    used: 68.88%

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
    19   220K   46.0K    128K   6.71K    4.78     0.00  deferred free
     1    512     512      1K      1K    1.00     0.00  object directory
     2     1K      1K      2K      1K    1.00     0.00  object array
     1    16K   1.50K   3.00K   3.00K   10.67     0.00  packed nvlist
     -      -       -       -       -       -        -  packed nvlist size
   567  69.2M   7.48M   15.0M   27.1K    9.25     0.02  bplist
     -      -       -       -       -       -        -  bplist header
     -      -       -       -       -       -        -  SPA space map header
   588  2.85M   1.59M   3.24M   5.64K    1.79     0.00  SPA space map
     1     8K      8K      8K      8K    1.00     0.00  ZIL intent log
 48.8K   781M    199M    417M   8.54K    3.92     0.54  DMU dnode
   167   167K   83.5K    251K   1.50K    2.00     0.00  DMU objset
     -      -       -       -       -       -        -  DSL directory
     4     2K      2K      4K      1K    1.00     0.00  DSL directory child map
     3  11.5K   3.00K   6.00K      2K    3.83     0.00  DSL dataset snap map
     6  65.0K      8K     16K   2.67K    8.12     0.00  DSL props
     -      -       -       -       -       -        -  DSL dataset
     -      -       -       -       -       -        -  ZFS znode
     -      -       -       -       -       -        -  ZFS ACL
 1.38M  83.6G   74.2G   74.3G   54.0K    1.13    98.96  ZFS plain file
  119K   508M    179M    363M   3.04K    2.84     0.47  ZFS directory
     3  1.50K   1.50K   3.00K      1K    1.00     0.00  ZFS master node
   155  93.0K   77.5K    155K      1K    1.20     0.00  ZFS delete queue
     -      -       -       -       -       -        -  zvol object
     -      -       -       -       -       -        -  zvol prop
     -      -       -       -       -       -        -  other uint8[]
     -      -       -       -       -       -        -  other uint64[]
     -      -       -       -       -       -        -  other ZAP
     -      -       -       -       -       -        -  persistent error log
 1.54M  85.0G   74.6G   75.1G   48.7K    1.14   100.00  Total
With the help of dtrace, I found out that in vdev_disk_open() (in
vdev_disk.c), the ddi_devid_compare() function was failing.

I don't know why the devid has changed, but simply doing zpool export ;
zpool import did the trick - the pool imported correctly and the contents
seem to be intact.

Examining the backup logs, I can see some rather strange behaviour on the
part of ZFS during the rsync! I'm not even going to try to understand it,
but I'm keeping the logs if anyone is interested.

I'm now doing a scrub; this is the result so far:

root@zfs:/pool/tmp# zpool status -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 33.38% done, 1h10m to go
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     2
          c2t0d0s1  ONLINE       0     0     2

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          pool     3539    634257408-634388480

The file with inode number 3539 is a leftover from the failed backups, and
the snapshots seem to be intact, so everything is well now.

ZFS checksums rule.
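P.S. The dtrace involved was nothing fancy - roughly along these lines (a
sketch, not the exact script; it just prints the result of the devid
comparison while the vdev is being opened):

#!/usr/sbin/dtrace -s
/*
 * Sketch: report what ddi_devid_compare() returns while vdev_disk_open()
 * is on the stack. (If vdev_disk_open() happens to be inlined on a given
 * build, the fbt probe for it won't exist.)
 */

fbt::vdev_disk_open:entry
{
        self->opening = 1;
}

fbt::ddi_devid_compare:return
/self->opening/
{
        /* ddi_devid_compare() returns 0 when the two devids match. */
        printf("ddi_devid_compare() returned %d\n", arg1);
}

fbt::vdev_disk_open:return
/self->opening/
{
        self->opening = 0;
}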
On Fri, Dec 15, 2006 at 08:11:08PM +0000, Ricardo Correia wrote:
> With the help of dtrace, I found out that in vdev_disk_open() (in
> vdev_disk.c), the ddi_devid_compare() function was failing.
>
> I don't know why the devid has changed, but simply doing zpool export ;
> zpool import did the trick - the pool imported correctly and the contents
> seem to be intact.
>
> Examining the backup logs, I can see some rather strange behaviour on the
> part of ZFS during the rsync! I'm not even going to try to understand it,
> but I'm keeping the logs if anyone is interested.

Ah, you're running into this bug:

650054 ZFS fails to see the disk if devid of the disk changes due to driver
upgrade

Basically, if we have the correct path but the wrong devid, we bail out of
vdev_disk_open(). But what we really should do is trust the path, and then
let the higher level code (vdev_validate()) make sure it's still the right
device. This should be fixed soon, as we need it to handle some upcoming
devid changes.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Friday 15 December 2006 21:54, Eric Schrock wrote:
> Ah, you're running into this bug:
>
> 650054 ZFS fails to see the disk if devid of the disk changes due to
> driver upgrade

You mean 6500545 ;)

> Basically, if we have the correct path but the wrong devid, we bail out
> of vdev_disk_open(). But what we really should do is trust the path,
> and then let the higher level code (vdev_validate()) make sure it's
> still the right device. This should be fixed soon, as we need it to
> handle some upcoming devid changes.

Indeed, that's what was happening. After ddi_devid_compare() failed, it
tried to do ldi_open_by_devid(), which also failed, and so it bailed out.
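(For the record, the mismatch can also be seen from userland by comparing
the devid the system currently reports for the disk against the one recorded
in the vdev label - a sketch, assuming iostat's -E output includes the
"Device Id" line for this driver:)

# devid as currently registered by the driver (look for the "Device Id" line)
iostat -En c2t0d0

# devid recorded in the ZFS vdev labels
zdb -l /dev/dsk/c2t0d0s1 | grep devid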