Diogo Franco
2010-May-01 20:06 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
I had a single spare 500GB HDD and decided to install a FreeBSD file server on it for learning purposes, and I moved almost all of my data to it. Yesterday, naturally after I no longer had backups of the data on the server, I had a controller failure (SiS 180 (oh, the quality)) and the HDD was treated as unplugged. When I noticed a few checksum failures in `zpool status` (including two on metadata (small hex numbers)), I ran `zpool scrub tank`, thinking it was ordinary data corruption, and then the box locked up. I had also upgraded the pool to v14 a few days before, so the FreeBSD v13 tools couldn't do anything to help.

Today I downloaded the OpenSolaris 134 snapshot image and booted it to try to rescue the pool, but:

# zpool status
no pools available

So I couldn't run a clear, an export, or a destroy and reimport with -D. I tried a regular import:

# zpool import
  pool: tank
    id: 6157028625215863355
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        FAULTED  corrupted data
          c5d0p1    UNAVAIL  corrupted data

There was no important data written in the past two days or so, so using an older uberblock wouldn't be a problem, and I tried the new recovery option:

# mkdir -p /mnt/tank && zpool import -fF -R /mnt/tank tank
cannot import 'tank': one or more devices is currently unavailable
        Destroy and re-create the pool from
        a backup source.

I tried googling for other people with similar issues, but almost all of them had RAIDs and other complex configurations, and their problems were not really related to this one.
After seeing that in some cases labels were corrupted, I tried running zdb -l on mine:

# zdb -l /dev/dsk/c5d0p1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 14
    name: 'tank'
    state: 0
    txg: 11420324
    pool_guid: 6157028625215863355
    hostid: 2563111091
    hostname: ''
    top_guid: 1987270273092463401
    guid: 1987270273092463401
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 1987270273092463401
        path: '/dev/ad6s1d'
        whole_disk: 0
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 497955373056
        is_log: 0
        DTL: 111
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 14
    name: 'tank'
    state: 0
    txg: 11420324
    pool_guid: 6157028625215863355
    hostid: 2563111091
    hostname: ''
    top_guid: 1987270273092463401
    guid: 1987270273092463401
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 1987270273092463401
        path: '/dev/ad6s1d'
        whole_disk: 0
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 497955373056
        is_log: 0
        DTL: 111

I'm looking for pointers on how to fix this situation, since the disk still has available metadata.
Bill Sommerfeld
2010-May-01 21:07 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 05/01/10 13:06, Diogo Franco wrote:
> After seeing that in some cases labels were corrupted, I tried running
> zdb -l on mine:
... (labels 0 and 1 not there; labels 2 and 3 are there).
> I'm looking for pointers on how to fix this situation, since the disk
> still has available metadata.

There are two reasons why you could get this:

1) the labels are gone.

2) the labels are not at the start of what Solaris sees as p1, and thus are somewhere else on the disk. I'd look more closely at how FreeBSD computes the start of the partition or slice '/dev/ad6s1d' that contains the pool.

I think #2 is somewhat more likely.

					- Bill
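For context on why possibility #2 fits the zdb output: ZFS writes four 256 KiB labels per vdev, two at the front and two at the back of the device. If the OS-visible start of the partition has shifted, labels 0 and 1 stop lining up first, while labels 2 and 3 (anchored to the end) may still be found. A minimal sketch of the standard layout, plugging in the asize value from the zdb -l output above (the layout constants are general ZFS facts, not taken from this thread):

```shell
# Where ZFS expects its four 256 KiB labels within a vdev.
# L0/L1 sit at the front of the device, L2/L3 at the back;
# a shifted partition start loses the front pair first.
ASIZE=497955373056   # asize from the zdb -l output above
LABEL=262144         # 256 KiB per label
echo "L0 at byte offset 0"
echo "L1 at byte offset $LABEL"
echo "L2 at byte offset $((ASIZE - 2 * LABEL))"
echo "L3 at byte offset $((ASIZE - LABEL))"
```

This is consistent with zdb failing to unpack labels 0 and 1 while finding 2 and 3 intact.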
Diogo Franco
2010-May-02 03:51 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 05/01/2010 06:07 PM, Bill Sommerfeld wrote:
> there are two reasons why you could get this:
> 1) the labels are gone.

Possible, since I got the metadata errors in `zpool status` before.

> 2) the labels are not at the start of what solaris sees as p1, and thus
> are somewhere else on the disk. I'd look more closely at how freebsd
> computes the start of the partition or slice '/dev/ad6s1d'
> that contains the pool.
>
> I think #2 is somewhat more likely.

c5d0p1 is also the only place where zdb finds any labels at all...
Peter Jeremy
2010-May-02 22:33 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 2010-May-02 04:06:41 +0800, Diogo Franco <diogomfranco at gmail.com> wrote:
> regular data corruption and then the box locked up. I had also
> converted the pool to v14 a few days before, so the freebsd v13 tools
> couldn't do anything to help.

Note that ZFS v14 was imported to FreeBSD 8-stable in mid-January. I can't comment on whether it would be able to recover your data.

On 2010-May-02 05:07:17 +0800, Bill Sommerfeld <bill.sommerfeld at oracle.com> wrote:
> 2) the labels are not at the start of what solaris sees as p1, and
> thus are somewhere else on the disk. I'd look more closely at how
> freebsd computes the start of the partition or slice '/dev/ad6s1d'
> that contains the pool.
>
> I think #2 is somewhat more likely.

This is almost certainly the problem. ad6s1 may be the same as c5d0p1, but OpenSolaris isn't going to understand the FreeBSD partition label on that slice. All I can suggest is to (temporarily) change the disk slicing so that there is an fdisk slice that matches ad6s1d.

-- 
Peter Jeremy
Diogo Franco
2010-May-03 15:59 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 05/02/2010 07:33 PM, Peter Jeremy wrote:
> Note that ZFS v14 was imported to FreeBSD 8-stable in mid-January.
> I can't comment on whether it would be able to recover your data.

I managed to get a livefs CD that had ZFS v14, but it was unable to import the zpool ("internal error: Illegal byte sequence"). The zpool does appear if I run `zpool import`, though, as "tank FAULTED corrupted data", with ad6s1d ONLINE. There is no -F option on BSD's zpool import.

> This is almost certainly the problem. ad6s1 may be the same as c5d0p1
> but OpenSolaris isn't going to understand the FreeBSD partition label
> on that slice. All I can suggest is to (temporarily) change the disk
> slicing so that there is a fdisk slice that matches ad6s1d.

How could I do just that? I know that my label has a 1G UFS, 1G swap, and the rest is ZFS; but I don't know how to calculate the correct offset to give to 'format'. I can just regenerate the UFS later, after the ZFS is fixed, since it was only used for its /boot.

Also, don't To: me and Cc: the list; I'm subscribed to it :)
Peter Jeremy
2010-May-04 01:00 UTC
[zfs-discuss] Single-disk pool corrupted after controller failure
On 2010-May-03 23:59:17 +0800, Diogo Franco <diogomfranco at gmail.com> wrote:
> I managed to get a livefs cd that had zfs14, but it was unable to import
> the zpool ("internal error: Illegal byte sequence"). The zpool does
> appear if I try to run `zpool import` though, as "tank FAULTED corrupted
> data", and ad6s1d is ONLINE.

That's not promising.

> There is no -F option on bsd's zpool import.

It was introduced around zfs v20. I feared it might be needed.

>> This is almost certainly the problem. ad6s1 may be the same as c5d0p1
>> but OpenSolaris isn't going to understand the FreeBSD partition label
>> on that slice. All I can suggest is to (temporarily) change the disk
>> slicing so that there is a fdisk slice that matches ad6s1d.
>
> How could I do just that? I know that my label has a 1G UFS, 1G swap,
> and the rest is ZFS; but I don't know how to calculate the correct
> offset to give to 'format'.

In FreeBSD, "bsdlabel ad0s1" will report the size and offset of the 'd' partition in sectors. The offset is relative to the start of that slice, which would normally be absolute block 63 ("fdisk ad0" will confirm that). Adding the offset of 's1' to the offset of 'd' will give you the sector offset of your ZFS data.

I haven't tried OpenSolaris on x86, so I'm not sure whether format allows sector offsets (I know format on Solaris/SPARC insists on cylinder offsets). Since cylinders are a fiction anyway, you might be able to kludge a cylinder size to suit your offset if necessary. The FreeBSD fdisk(8) man page implies that slices start at a track boundary and end at a cylinder boundary, but I'm not sure whether this is a restriction on LBA disks.

Note that if you keep a record of your existing c5d0 format and restore it later, this will recover your existing boot and swap, so you shouldn't need to restore them.
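As a worked example of the arithmetic above (the partition sizes are hypothetical, chosen to match the 1G UFS + 1G swap layout described earlier; the real values must come from `bsdlabel ad6s1` and `fdisk ad6`):

```shell
# Absolute start of the ZFS data =
#   fdisk slice offset + bsdlabel partition offset within the slice.
SLICE_START=63        # typical start of ad6s1; confirm with `fdisk ad6`
UFS_SECTORS=2097152   # hypothetical 1G UFS 'a' partition (512-byte sectors)
SWAP_SECTORS=2097152  # hypothetical 1G swap 'b' partition
D_OFFSET=$((UFS_SECTORS + SWAP_SECTORS))   # offset of 'd' within the slice
echo "ZFS data begins at absolute sector $((SLICE_START + D_OFFSET))"
```

With these assumed numbers the ZFS data would begin at sector 4194367; a Solaris fdisk slice starting exactly there should make labels 0 and 1 line up again.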
-- 
Peter Jeremy