Diogo Franco
2010-May-01 20:06 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
I had a single spare 500GB HDD and decided to install a FreeBSD file server on it for learning purposes, and I moved almost all of my data to it. Yesterday, naturally after I no longer had backups of the data on the server, I had a controller failure (SiS 180 (oh, the quality)) and the HDD was treated as unplugged. When I noticed a few checksum failures in `zpool status` (including two on metadata (small hex numbers)), I ran `zpool scrub tank`, thinking it was ordinary data corruption, and then the box locked up. I had also upgraded the pool to v14 a few days before, so the FreeBSD v13 tools couldn't do anything to help.

Today I downloaded the OpenSolaris 134 snapshot image and booted it to try to rescue the pool, but:

# zpool status
no pools available

So I couldn't run a clear, an export, or a destroy and reimport with -D. I tried a regular import:

# zpool import
  pool: tank
    id: 6157028625215863355
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        FAULTED  corrupted data
          c5d0p1    UNAVAIL  corrupted data

There was no important data written in the past two days or so, so using an older uberblock wouldn't be a problem, and I tried the new recovery option:

# mkdir -p /mnt/tank && zpool import -fF -R /mnt/tank tank
cannot import 'tank': one or more devices is currently unavailable
        Destroy and re-create the pool from
        a backup source.

I tried googling for other people with similar issues, but almost all of them had RAIDs and other complex configurations, and their problems were not really related to this one.
After seeing that in some cases labels were corrupted, I tried running zdb -l on mine:

# zdb -l /dev/dsk/c5d0p1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 14
    name: 'tank'
    state: 0
    txg: 11420324
    pool_guid: 6157028625215863355
    hostid: 2563111091
    hostname: ''
    top_guid: 1987270273092463401
    guid: 1987270273092463401
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 1987270273092463401
        path: '/dev/ad6s1d'
        whole_disk: 0
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 497955373056
        is_log: 0
        DTL: 111
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 14
    name: 'tank'
    state: 0
    txg: 11420324
    pool_guid: 6157028625215863355
    hostid: 2563111091
    hostname: ''
    top_guid: 1987270273092463401
    guid: 1987270273092463401
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 1987270273092463401
        path: '/dev/ad6s1d'
        whole_disk: 0
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 497955373056
        is_log: 0
        DTL: 111

I'm looking for pointers on how to fix this situation, since the disk still has available metadata.
Bill Sommerfeld
2010-May-01 21:07 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 05/01/10 13:06, Diogo Franco wrote:
> After seeing that in some cases labels were corrupted, I tried running
> zdb -l on mine:
... (labels 0 and 1 not there; labels 2 and 3 are there).
> I'm looking for pointers on how to fix this situation, since the disk
> still has available metadata.

There are two reasons why you could get this:

1) the labels are gone.

2) the labels are not at the start of what Solaris sees as p1, and thus are somewhere else on the disk. I'd look more closely at how FreeBSD computes the start of the partition or slice '/dev/ad6s1d' that contains the pool.

I think #2 is somewhat more likely.

					- Bill
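For context on why possibility #2 fits the zdb output: ZFS writes four 256 KiB labels per vdev, two at the front and two at the back of the device. If the OS-visible start of the partition has shifted, labels 0 and 1 stop lining up first, while labels 2 and 3 (anchored to the end) may still be found. A minimal sketch of the standard layout, plugging in the asize value from the zdb -l output above (the layout constants are general ZFS facts, not taken from this thread):

```shell
# Where ZFS expects its four 256 KiB labels within a vdev.
# L0/L1 sit at the front of the device, L2/L3 at the back;
# a shifted partition start loses the front pair first.
ASIZE=497955373056   # asize from the zdb -l output above
LABEL=262144         # 256 KiB per label
echo "L0 at byte offset 0"
echo "L1 at byte offset $LABEL"
echo "L2 at byte offset $((ASIZE - 2 * LABEL))"
echo "L3 at byte offset $((ASIZE - LABEL))"
```

This is consistent with zdb failing to unpack labels 0 and 1 while finding 2 and 3 intact.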
Diogo Franco
2010-May-02 03:51 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 05/01/2010 06:07 PM, Bill Sommerfeld wrote:
> there are two reasons why you could get this:
> 1) the labels are gone.

Possible, since I got the metadata errors in `zpool status` before.

> 2) the labels are not at the start of what solaris sees as p1, and thus
> are somewhere else on the disk. I'd look more closely at how freebsd
> computes the start of the partition or slice '/dev/ad6s1d'
> that contains the pool.
>
> I think #2 is somewhat more likely.

c5d0p1 is also the only place where zdb finds any labels at all...
Peter Jeremy
2010-May-02 22:33 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 2010-May-02 04:06:41 +0800, Diogo Franco <diogomfranco at gmail.com> wrote:
> regular data corruption and then the box locked up. I had also
> converted the pool to v14 a few days before, so the freebsd v13 tools
> couldn't do anything to help.

Note that ZFS v14 was imported to FreeBSD 8-stable in mid-January. I can't comment on whether it would be able to recover your data.

On 2010-May-02 05:07:17 +0800, Bill Sommerfeld <bill.sommerfeld at oracle.com> wrote:
> 2) the labels are not at the start of what solaris sees as p1, and
> thus are somewhere else on the disk. I'd look more closely at how
> freebsd computes the start of the partition or slice '/dev/ad6s1d'
> that contains the pool.
>
> I think #2 is somewhat more likely.

This is almost certainly the problem. ad6s1 may be the same as c5d0p1, but OpenSolaris isn't going to understand the FreeBSD partition label on that slice. All I can suggest is to (temporarily) change the disk slicing so that there is an fdisk slice that matches ad6s1d.

-- 
Peter Jeremy
Diogo Franco
2010-May-03 15:59 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 05/02/2010 07:33 PM, Peter Jeremy wrote:
> Note that ZFS v14 was imported to FreeBSD 8-stable in mid-January.
> I can't comment on whether it would be able to recover your data.

I managed to get a livefs CD that had ZFS v14, but it was unable to import the zpool ("internal error: Illegal byte sequence"). The zpool does appear if I run `zpool import`, though, as "tank FAULTED corrupted data", with ad6s1d ONLINE. There is no -F option on BSD's zpool import.

> This is almost certainly the problem. ad6s1 may be the same as c5d0p1
> but OpenSolaris isn't going to understand the FreeBSD partition label
> on that slice. All I can suggest is to (temporarily) change the disk
> slicing so that there is a fdisk slice that matches ad6s1d.

How could I do just that? I know that my label has a 1G UFS, 1G swap, and the rest is ZFS; but I don't know how to calculate the correct offset to give to 'format'. I can just regenerate the UFS later, after the ZFS is fixed, since it was only used for its /boot.

Also, don't To: me and Cc: the list; I'm subscribed to it :)
Peter Jeremy
2010-May-04 01:00 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 2010-May-03 23:59:17 +0800, Diogo Franco <diogomfranco at gmail.com> wrote:
> I managed to get a livefs cd that had zfs14, but it was unable to import
> the zpool ("internal error: Illegal byte sequence"). The zpool does
> appear if I try to run `zpool import` though, as "tank FAULTED corrupted
> data", and ad6s1d is ONLINE.

That's not promising.

> There is no -F option on bsd's zpool import.

It was introduced around zfs v20. I feared it might be needed.

>> This is almost certainly the problem. ad6s1 may be the same as c5d0p1
>> but OpenSolaris isn't going to understand the FreeBSD partition label
>> on that slice. All I can suggest is to (temporarily) change the disk
>> slicing so that there is a fdisk slice that matches ad6s1d.
>
> How could I do just that? I know that my label has a 1G UFS, 1G swap,
> and the rest is ZFS; but I don't know how to calculate the correct
> offset to give to 'format'.

In FreeBSD, "bsdlabel ad0s1" will report the size and offset of the 'd' partition in sectors. The offset is relative to the start of that slice, which would normally be absolute block 63 ("fdisk ad0" will confirm that). Adding the offset of 's1' to the offset of 'd' will give you the sector offset of your ZFS data.

I haven't tried OpenSolaris on x86, so I'm not sure whether format allows sector offsets (I know format on Solaris/SPARC insists on cylinder offsets). Since cylinders are a fiction anyway, you might be able to kludge a cylinder size to suit your offset if necessary. The FreeBSD fdisk(8) man page implies that slices start at a track boundary and end at a cylinder boundary, but I'm not sure whether this is a restriction on LBA disks.

Note that if you keep a record of your existing c5d0 format and restore it later, this will recover your existing boot and swap, so you shouldn't need to restore them.
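As a worked example of the arithmetic above (the partition sizes are hypothetical, chosen to match the 1G UFS + 1G swap layout described earlier; the real values must come from `bsdlabel ad6s1` and `fdisk ad6`):

```shell
# Absolute start of the ZFS data =
#   fdisk slice offset + bsdlabel partition offset within the slice.
SLICE_START=63        # typical start of ad6s1; confirm with `fdisk ad6`
UFS_SECTORS=2097152   # hypothetical 1G UFS 'a' partition (512-byte sectors)
SWAP_SECTORS=2097152  # hypothetical 1G swap 'b' partition
D_OFFSET=$((UFS_SECTORS + SWAP_SECTORS))   # offset of 'd' within the slice
echo "ZFS data begins at absolute sector $((SLICE_START + D_OFFSET))"
```

With these assumed numbers the ZFS data would begin at sector 4194367; a Solaris fdisk slice starting exactly there should make labels 0 and 1 line up again.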
-- 
Peter Jeremy