I seem to have managed to end up with a pool that is confused about its children disks.  The pool is faulted with corrupt metadata:

  pool: d
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://illumos.org/msg/ZFS-8000-72
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        d                        FAULTED       0     0     1
          raidz1-0               FAULTED       0     0     6
            da1                  ONLINE        0     0     0
            3419704811362497180  OFFLINE       0     0     0  was /dev/da2
            da3                  ONLINE        0     0     0
            da4                  ONLINE        0     0     0
            da5                  ONLINE        0     0     0

But if I look at the labels on all the online disks I see this:

# zdb -ul /dev/da1 | egrep '(children|path)'
        children[0]:
                path: '/dev/da1'
        children[1]:
                path: '/dev/da2'
        children[2]:
                path: '/dev/da2'
        children[3]:
                path: '/dev/da3'
        children[4]:
                path: '/dev/da4'
...

But the offline disk (da2) shows the older, correct label:

        children[0]:
                path: '/dev/da1'
        children[1]:
                path: '/dev/da2'
        children[2]:
                path: '/dev/da3'
        children[3]:
                path: '/dev/da4'
        children[4]:
                path: '/dev/da5'

zpool import -F doesn't help, because none of the unfaulted disks seem to have the right label.  And unless I can import the pool I can't replace the bad drive.

Also, zpool seems to really not want to import a raidz1 pool with one faulted drive, even though that should be readable.  I have read about the undocumented -V option but don't know if that would help.

I got into this state when I noticed the pool was DEGRADED and was trying to replace the bad disk.  I am debugging it under FreeBSD 9.1.

Suggestions of things to try are welcome; I'm more interested in learning what went wrong than in restoring the pool.  I don't think I should have been able to go from one offline drive to an unrecoverable pool this easily.

-jg
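A sketch of the non-destructive import attempts worth trying before anything undocumented, assuming the zpool(8) flags behave on FreeBSD 9.1's ZFS v28 as documented (combining -o readonly=on with -F is an untested assumption here):

    # dry run of the recovery import: reports whether rolling back the last
    # few transactions would make the pool importable, without writing anything
    zpool import -F -n d

    # if the dry run looks sane, import read-only so nothing is written back
    # to the suspect labels while poking around
    zpool import -o readonly=on -F d

This sticks to documented flags; the undocumented -V option is left out.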
Have you tried importing the pool with that drive completely unplugged?

Which HBA are you using?  How many of these disks are on the same or on separate HBAs?

Gregg Wonderly

On Jan 8, 2013, at 12:05 PM, John Giannandrea <jg at meer.net> wrote:

> I seem to have managed to end up with a pool that is confused about its children disks.  The pool is faulted with corrupt metadata:
> [...]
Gregg Wonderly <greggwon at gmail.com> wrote:
> Have you tried importing the pool with that drive completely unplugged?

Thanks for your reply.  I just tried that.  zpool import now says:

   pool: d
     id: 13178956075737687211
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using the '-f' flag.
    see: http://illumos.org/msg/ZFS-8000-72
 config:

        d                        FAULTED  corrupted data
          raidz1-0               FAULTED  corrupted data
            da1                  ONLINE
            3419704811362497180  OFFLINE
            da2                  ONLINE
            da3                  ONLINE
            da4                  ONLINE

Notice that in the absence of the faulted da2, the OS has renumbered da3 to da2, and so on.  I suspect this renumbering was part of the original problem that created a label with two da2 entries.

zdb still reports that the label has two da2 children:

        vdev_tree:
            type: 'raidz'
            id: 0
            guid: 11828532517066189487
            nparity: 1
            metaslab_array: 23
            metaslab_shift: 36
            ashift: 9
            asize: 9999920660480
            is_log: 0
            children[0]:
                type: 'disk'
                id: 0
                guid: 13697627234083630557
                path: '/dev/da1'
                whole_disk: 0
                DTL: 78
            children[1]:
                type: 'disk'
                id: 1
                guid: 3419704811362497180
                path: '/dev/da2'
                whole_disk: 0
                DTL: 71
                offline: 1
            children[2]:
                type: 'disk'
                id: 2
                guid: 6790266178760006782
                path: '/dev/da2'
                whole_disk: 0
                DTL: 77
            children[3]:
                type: 'disk'
                id: 3
                guid: 2883571222332651955
                path: '/dev/da3'
                whole_disk: 0
                DTL: 76
            children[4]:
                type: 'disk'
                id: 4
                guid: 16640597255468768296
                path: '/dev/da4'
                whole_disk: 0
                DTL: 75

> Which HBA are you using?  How many of these disks are on the same or on separate HBAs?

All the disks are on the same HBA:

twa0: <3ware 9000 series Storage Controller>
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-8, 8 ports, Firmware FE9X 2.08.00.006
da0 at twa0 bus 0 scbus0 target 0 lun 0
da1 at twa0 bus 0 scbus0 target 1 lun 0
da2 at twa0 bus 0 scbus0 target 2 lun 0
da3 at twa0 bus 0 scbus0 target 3 lun 0
da4 at twa0 bus 0 scbus0 target 4 lun 0

-jg
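Since import matches member disks by GUID rather than by the path strings recorded in the labels, the renumbering by itself should not block an import.  One way to make the GUID-to-device matching explicit, using only documented flags (the numeric id below is the pool id shown in the output above), is:

    # scan only /dev and list what zpool finds there; the printed config
    # shows which device node was matched to each label, regardless of the
    # stale /dev/daN paths stored in the vdev_tree
    zpool import -d /dev

    # refer to the pool by its numeric id instead of its name when importing
    zpool import -d /dev 13178956075737687211

Whether that helps once the metadata itself is flagged as corrupt is another question, but it rules the path confusion in or out.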
On 2013-Jan-08 21:30:57 -0800, John Giannandrea <jg at meer.net> wrote:
> Notice that in the absence of the faulted da2, the OS has renumbered da3 to da2, and so on.  I suspect this renumbering was part of the original problem that created a label with two da2 entries.

The primary vdev identifier is the guid.  The path is of secondary importance (ZFS should automatically recover from juggled disks without an issue - and has for me).

Try running "zdb -l" on each of your pool disks and verify that each has 4 identical labels, and that the 5 guids (one on each disk) are unique and match the vdev_tree you got from zdb.  My suspicion is that you've somehow "lost" the disk with the guid 3419704811362497180.

> twa0: <3ware 9000 series Storage Controller>
> twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-8, 8 ports, Firmware FE9X 2.08.00.006
> da0 at twa0 bus 0 scbus0 target 0 lun 0
> da1 at twa0 bus 0 scbus0 target 1 lun 0
> da2 at twa0 bus 0 scbus0 target 2 lun 0
> da3 at twa0 bus 0 scbus0 target 3 lun 0
> da4 at twa0 bus 0 scbus0 target 4 lun 0

Are these all JBOD devices?

--
Peter Jeremy
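A quick way to run the per-disk label check Peter suggests, assuming the five members are visible as da1-da5 (adjust the list to whatever device nodes are actually present):

    # dump all four labels on each member and pull out the GUIDs, paths and
    # txgs; every disk should carry four identical labels with the same
    # vdev_tree, plus its own unique per-disk guid
    for d in da1 da2 da3 da4 da5; do
        echo "== /dev/$d =="
        zdb -l /dev/$d | egrep 'LABEL|guid|path|txg'
    done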