I have a ZFS pool made of two vdevs, each using one whole physical disk, under OpenSolaris 2008.05. The disks live on a Netcell SATA/RAID controller, which has three ports (I had planned to use three disks there and configure mirrors in ZFS), but as it turned out it could only present one or two disks to the system. So I decided to create a mirror of two identical drives on the controller instead of one of the vdevs in the ZFS pool.

I did a 'dd' dump of the whole disk of the second vdev, and created an array on the controller with a second disk to mirror that one. After booting into Solaris, 'zpool status' reported that it couldn't use the disk because the label was missing or corrupted. I restored the dump made previously with dd, and 'zpool status' then reported the vdev in state FAILED, data corrupted, ~100k files with errors. Upon reboot, the controller's BIOS reported that one of the disks in the mirror was failing and needed to be replaced, so I rebuilt the array. After booting, ZFS still reports the disk as FAILED. An attempt to scrub crashes and reboots the system. Restoring the dump breaks the array from the controller's point of view. It seems the controller stores some configuration information on the disk, and that conflicts with ZFS using the whole disk. This is an example of when using a whole disk for ZFS is not a good idea.

I reverted to using a one-disk array on the controller, as it was when I created the ZFS filesystem, but that does not help either. 'zdb -l' "fails to unpack" labels 0 and 1; labels 2 and 3 look OK to me, showing correct ZFS info. 'format' reports this disk as being part of an active ZFS pool (as does the other disk, which was part of the mirror and is now connected via USB). 'zpool replace' also refuses to replace, because it thinks the second disk is part of an active pool.

Is there a way to recover from this problem? I'm pretty sure the data is still OK, it's just the labels that get "corrupted" by the controller or ZFS. :(
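(For anyone trying to reproduce the diagnosis, the checks I mean are roughly the following; the device name is just an example from my setup, substitute your own:

  # zpool status -v
  # zdb -l /dev/rdsk/c1t0d0s0

zdb -l prints all four copies of the vdev label (ZFS keeps two at the start of the device and two at the end), which is how I could see that copies 0 and 1 were gone while 2 and 3 were still intact.)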
> Is there a way to recover from this problem? I'm
> pretty sure the data is still OK, it's just the labels
> that get "corrupted" by the controller or ZFS. :(

And this is confirmed by zdb, after a looooong wait for the comparison of data and checksums: no data errors.
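(For reference, that long checksum comparison was zdb's block traversal; the pool name "tank" is just a placeholder and the exact flags may vary between builds, so treat this as a sketch:

  # zdb -cc tank

A single -c verifies the checksums of metadata blocks while gathering block statistics; repeating it as -cc should also verify the data blocks, which is what takes so long on a well-filled pool.)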
Oleg Muravskiy
2008-Oct-21 10:33 UTC
[zfs-discuss] Solution: Recover after disk labels "failure"
I recovered the pool by doing export, import and scrub. Apparently you can export a pool with a FAILED device, and import will restore the labels from the backup copies. Data errors are still there after the import, so you need to scrub the pool. After all that the filesystem is back with no errors/problems.

It would be nice if the documentation mentioned this, namely that before trying to replace disks or restore backups, you could try an export/import.

Also, it is not clear what "zpool clear" actually clears (what a nice use of the word "clear"!). It does not clear data errors recorded within the pool. In my case they were registered when I tried to read data from the pool with one device marked as FAILED (when in fact only the label was corrupted, the data itself was OK), and they disappeared after the scrub.

So my thanks go to the people on the Internet who share their findings about ZFS, and to the ZFS developers who made such a robust system (I still think it's the best of all [free] systems I have used).
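For completeness, the whole recovery boiled down to three commands (the pool name "tank" stands in for my real pool name):

  # zpool export tank
  # zpool import tank
  # zpool scrub tank

You can watch the scrub progress with "zpool status -v tank"; once it finished, the recorded errors were gone and the pool reported no data errors.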