I seem to have managed to end up with a pool that is confused about its children disks.  The pool is faulted with corrupt metadata:

  pool: d
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://illumos.org/msg/ZFS-8000-72
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        d                        FAULTED       0     0     1
          raidz1-0               FAULTED       0     0     6
            da1                  ONLINE        0     0     0
            3419704811362497180  OFFLINE       0     0     0  was /dev/da2
            da3                  ONLINE        0     0     0
            da4                  ONLINE        0     0     0
            da5                  ONLINE        0     0     0

But if I look at the labels on all the online disks I see this:

# zdb -ul /dev/da1 | egrep '(children|path)'
        children[0]:
                path: '/dev/da1'
        children[1]:
                path: '/dev/da2'
        children[2]:
                path: '/dev/da2'
        children[3]:
                path: '/dev/da3'
        children[4]:
                path: '/dev/da4'
...

But the offline disk (da2) shows the older, correct label:

        children[0]:
                path: '/dev/da1'
        children[1]:
                path: '/dev/da2'
        children[2]:
                path: '/dev/da3'
        children[3]:
                path: '/dev/da4'
        children[4]:
                path: '/dev/da5'

zpool import -F doesn't help, because none of the unfaulted disks seem to have the right label.  And unless I can import the pool I can't replace the bad drive.

Also, zpool seems to really not want to import a raidz1 pool with one faulted drive, even though that should be readable.  I have read about the undocumented -V option but don't know if that would help.

I got into this state when I noticed the pool was DEGRADED and was trying to replace the bad disk.  I am debugging it under FreeBSD 9.1.

Suggestions of things to try are welcome; I'm more interested in learning what went wrong than in restoring the pool.  I don't think I should have been able to go from one offline drive to an unrecoverable pool this easily.

-jg
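A sketch of the non-destructive import attempts worth trying before anything undocumented, assuming the zpool(8) flags behave on FreeBSD 9.1's ZFS v28 as documented (combining -o readonly=on with -F is an untested assumption here):

    # dry run of the recovery import: reports whether rolling back the last
    # few transactions would make the pool importable, without writing anything
    zpool import -F -n d

    # if the dry run looks sane, import read-only so nothing is written back
    # to the suspect labels while poking around
    zpool import -o readonly=on -F d

This sticks to documented flags; the undocumented -V option is left out.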
Have you tried importing the pool with that drive completely unplugged?

Which HBA are you using?  How many of these disks are on the same or on separate HBAs?

Gregg Wonderly

On Jan 8, 2013, at 12:05 PM, John Giannandrea <jg at meer.net> wrote:

> I seem to have managed to end up with a pool that is confused about its children disks.  The pool is faulted with corrupt metadata:
> [...]
Gregg Wonderly <greggwon at gmail.com> wrote:
> Have you tried importing the pool with that drive completely unplugged?

Thanks for your reply.  I just tried that.  zpool import now says:

   pool: d
     id: 13178956075737687211
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using the '-f' flag.
    see: http://illumos.org/msg/ZFS-8000-72
 config:

        d                        FAULTED  corrupted data
          raidz1-0               FAULTED  corrupted data
            da1                  ONLINE
            3419704811362497180  OFFLINE
            da2                  ONLINE
            da3                  ONLINE
            da4                  ONLINE

Notice that in the absence of the faulted da2, the OS has renumbered da3 to da2, and so on.  I suspect this renumbering was part of the original problem that created a label with two da2 entries.

zdb still reports that the label has two da2 children:

        vdev_tree:
            type: 'raidz'
            id: 0
            guid: 11828532517066189487
            nparity: 1
            metaslab_array: 23
            metaslab_shift: 36
            ashift: 9
            asize: 9999920660480
            is_log: 0
            children[0]:
                type: 'disk'
                id: 0
                guid: 13697627234083630557
                path: '/dev/da1'
                whole_disk: 0
                DTL: 78
            children[1]:
                type: 'disk'
                id: 1
                guid: 3419704811362497180
                path: '/dev/da2'
                whole_disk: 0
                DTL: 71
                offline: 1
            children[2]:
                type: 'disk'
                id: 2
                guid: 6790266178760006782
                path: '/dev/da2'
                whole_disk: 0
                DTL: 77
            children[3]:
                type: 'disk'
                id: 3
                guid: 2883571222332651955
                path: '/dev/da3'
                whole_disk: 0
                DTL: 76
            children[4]:
                type: 'disk'
                id: 4
                guid: 16640597255468768296
                path: '/dev/da4'
                whole_disk: 0
                DTL: 75

> Which HBA are you using?  How many of these disks are on the same or on separate HBAs?

All the disks are on the same HBA:

twa0: <3ware 9000 series Storage Controller>
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-8, 8 ports, Firmware FE9X 2.08.00.006
da0 at twa0 bus 0 scbus0 target 0 lun 0
da1 at twa0 bus 0 scbus0 target 1 lun 0
da2 at twa0 bus 0 scbus0 target 2 lun 0
da3 at twa0 bus 0 scbus0 target 3 lun 0
da4 at twa0 bus 0 scbus0 target 4 lun 0

-jg
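Since import matches member disks by GUID rather than by the path strings recorded in the labels, the renumbering by itself should not block an import.  One way to make the GUID-to-device matching explicit, using only documented flags (the numeric id below is the pool id shown in the output above), is:

    # scan only /dev and list what zpool finds there; the printed config
    # shows which device node was matched to each label, regardless of the
    # stale /dev/daN paths stored in the vdev_tree
    zpool import -d /dev

    # refer to the pool by its numeric id instead of its name when importing
    zpool import -d /dev 13178956075737687211

Whether that helps once the metadata itself is flagged as corrupt is another question, but it rules the path confusion in or out.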
On 2013-Jan-08 21:30:57 -0800, John Giannandrea <jg at meer.net> wrote:
> Notice that in the absence of the faulted da2, the OS has renumbered da3 to da2, and so on.  I suspect this renumbering was part of the original problem that created a label with two da2 entries.

The primary vdev identifier is the guid.  The path is of secondary importance (ZFS should automatically recover from juggled disks without an issue - and has for me).

Try running "zdb -l" on each of your pool disks and verify that each has 4 identical labels, and that the 5 guids (one on each disk) are unique and match the vdev_tree you got from zdb.  My suspicion is that you've somehow "lost" the disk with the guid 3419704811362497180.

> twa0: <3ware 9000 series Storage Controller>
> twa0: INFO: (0x15: 0x1300): Controller details:: Model 9500S-8, 8 ports, Firmware FE9X 2.08.00.006
> da0 at twa0 bus 0 scbus0 target 0 lun 0
> da1 at twa0 bus 0 scbus0 target 1 lun 0
> da2 at twa0 bus 0 scbus0 target 2 lun 0
> da3 at twa0 bus 0 scbus0 target 3 lun 0
> da4 at twa0 bus 0 scbus0 target 4 lun 0

Are these all JBOD devices?

--
Peter Jeremy
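A quick way to run the per-disk label check Peter suggests, assuming the five members are visible as da1-da5 (adjust the list to whatever device nodes are actually present):

    # dump all four labels on each member and pull out the GUIDs, paths and
    # txgs; every disk should carry four identical labels with the same
    # vdev_tree, plus its own unique per-disk guid
    for d in da1 da2 da3 da4 da5; do
        echo "== /dev/$d =="
        zdb -l /dev/$d | egrep 'LABEL|guid|path|txg'
    done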