Hi All,

I hope this is the correct place for this question, as I think it was ZFS that saved me.

A little while ago I did something very silly in a moment of inattention... I meant to use dd to copy an image (around 500 MB) to a USB disk but instead wrote over a non-redundant ZFS disk! This of course stopped the zpool from operating. I fiddled around trying to repair the problem and ended up having to reboot the server, as I was left with many hung processes trying to read the disk.

I am using Solaris 11 on my x86 HP MicroServer N36L with SATA disks. Below are various status outputs. My question is: how did the disk / zpool recover after a reboot with, from what I have seen so far, no corruption at all?

root@n36l:/export/home/drowl# zpool status -x
  pool: data2
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data2       UNAVAIL      0     0     0  experienced I/O failures
          c2t0d0s0  UNAVAIL      0     0     0  experienced I/O failures

I'm guessing from the below that it's the label / partition that is busted...

root@n36l:/export/home/drowl# prtvtoc /dev/rdsk/c2t0d0s2
prtvtoc: /dev/rdsk/c2t0d0s2: Unable to read Disk geometry errno = 0x5

root@n36l:/export/home/drowl# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c2t0d0 <drive type unknown>
          /pci@0,0/pci103c,1609@11/disk@0,0
...
Specify disk (enter its number): 0
Error: can't open disk '/dev/rdsk/c2t0d0p0'.

AVAILABLE DRIVE TYPES:
        0. Auto configure
        ...
       19. ATA -Hitachi HDT7210-A3AA
       20. other
Specify disk type (enter its number):

Any help most welcomed,

Ritchie.

--
<--Time flies like an arrow; fruit flies like a banana. -->
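As a side note on diagnosis (not something tried in the post above): one way to see how much of the ZFS label survived such an overwrite is zdb. A minimal check against the same slice shown in the zpool status output, assuming the device is still readable at all:

  # zdb -l /dev/rdsk/c2t0d0s0

ZFS keeps two copies of the label at the start of each vdev and two more at the end, so if dd only clobbered the beginning of the disk, zdb should still print the last two labels intact.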
2012-07-12 14:20, RichTea wrote:
> I meant to use dd to copy an image (around 500 MB) to a USB disk but
> instead wrote over a non-redundant ZFS disk!
> Below are various status outputs. My question is: how did the disk /
> zpool recover after a reboot with, from what I have seen so far, no
> corruption at all?

Well, from your outputs I can't say there is no corruption - the pool is
suspended, its device is unavailable and the partitions are screwed ;)
How did you decide it is okay and that ZFS saved you? Or did you just not
post some further progress in your recovery?

Purely speculating, I might however suggest that your disk was dedicated
to the pool completely, so its last blocks contain spare uberblocks
(zpool labels), and that might help ZFS detect and import the pool - if
it did indeed. Further on, this label refers to the current root of the
block pointer (metadata) tree, which is stored with double or sometimes
triple redundancy, so much of the metadata which you tried reading is
indeed readable from at least one copy.

Likely some data was overwritten by your actions and lost, but you can
only detect that by trying to read all data from the pool. If the
checksums of the read-in blocks mismatch the expectations from the block
pointers, ZFS will know there are errors or even losses. For example,
"zpool scrub" does just that - so you should run it on your pool, if it
is now importable for you.

Good luck,
HTH, //Jim Klimov
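A sketch of the sequence Jim describes, using the pool name from the original post and assuming the pool did not come back on its own after the reboot:

  # zpool import
  # zpool import data2
  # zpool scrub data2
  # zpool status data2

The first command just lists pools that ZFS can still recognise from their on-disk labels; the second imports data2 by name if it shows up. The scrub then re-reads every allocated block and verifies it against its checksum, and zpool status reports the scrub's progress and any errors it finds.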
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> Purely speculating, I might however suggest that your disk was dedicated
> to the pool completely, so its last blocks contain spare uberblocks
> (zpool labels), and that might help ZFS detect and import the pool -

Certain types of data have multiple copies on disk. I have overwritten the
first 1 MB of a disk before and still been able to import the pool, so I
suspect that, with a little effort, you'll be able to import your pool
again.

After the pool is imported, of course, some of your data is very likely to
be corrupt. ZFS should be able to detect it, because the checksums won't
match. You should run a scrub.

You'll be able to produce a list of all the partially corrupted files.
Most likely, you'll just want to rm those files, and then you'll know that
whatever is still left is good.
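Once the scrub Ned recommends has finished, the list of partially corrupted files he mentions comes from the verbose form of zpool status:

  # zpool status -v data2

Files with permanent (unrecoverable) errors are listed by path under the "errors:" section; those are the candidates for rm and restore from backup.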
> How did you decide it is okay and that ZFS saved you? Or did you just
> not post some further progress in your recovery?

I made no further recovery attempts; the pool imported cleanly after rebooting, or so I thought [1], as a zpool status showed no errors and I could read data from the drive again.

On Thu, Jul 12, 2012 at 2:35 PM, Edward Ned Harvey
<opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:

> > From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> > bounces at opensolaris.org] On Behalf Of Jim Klimov
> >
> > Purely speculating, I might however suggest that your disk was dedicated
> > to the pool completely, so its last blocks contain spare uberblocks
> > (zpool labels), and that might help ZFS detect and import the pool -
>
> Certain types of data have multiple copies on disk. I have overwritten the
> first 1 MB of a disk before and still been able to import the pool, so I
> suspect that, with a little effort, you'll be able to import your pool
> again.
>
> After the pool is imported, of course, some of your data is very likely to
> be corrupt. ZFS should be able to detect it, because the checksums won't
> match. You should run a scrub.

[1] OK, I have run a scrub on the pool and it is now being reported as being in DEGRADED status again. I did think it was strange that the zpool had magically recovered itself:

root@n36l:~# zpool status data2
  pool: data2
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are
        unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0h26m with 0 errors on Thu Jul 12 15:07:47 2012
config:

        NAME        STATE     READ WRITE CKSUM
        data2       DEGRADED     0     0     0
          c2t0d0s0  DEGRADED     0     0     0  too many errors

errors: No known data errors

At least it is letting me access the data for now; I guess the only fix is to migrate the data off and then "rebuild" the disk.

--
Ritchie

> You'll be able to produce a list of all the partially corrupted files.
> Most likely, you'll just want to rm those files, and then you'll know
> that whatever is still left is good.
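If it does come to migrating the data off and rebuilding, one wholesale option is a recursive snapshot plus send/receive. A rough sketch, where "backuppool" and the snapshot name "evac" are only placeholders for whatever second pool and name would actually be used:

  # zfs snapshot -r data2@evac
  # zfs send -R data2@evac | zfs receive -d -F backuppool

The -R/-d pair replicates the whole dataset tree, including child filesystems and properties, under the destination pool. For a single-disk pool this size, plain file-level copying (cp, rsync, tar) works just as well.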
Hi Rich,

I don't think anyone can say definitively how this problem resolved, but I believe that the dd command overwrote some of the disk label, as you describe below. Your format output below looks like you relabeled the disk, and maybe that was enough to resolve this problem. I have had success with just relabeling the disk in an active pool when I accidentally trampled it with the wrong command.

You could try to use zpool clear to clear the DEGRADED device. Possibly, scrub again and clear as needed.

Thanks,

Cindy

On 07/12/12 08:33, RichTea wrote:
> > How did you decide it is okay and that ZFS saved you? Or did you just
> > not post some further progress in your recovery?
>
> I made no further recovery attempts; the pool imported cleanly after
> rebooting, or so I thought [1], as a zpool status showed no errors and I
> could read data from the drive again.
>
> On Thu, Jul 12, 2012 at 2:35 PM, Edward Ned Harvey
> <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
>
> > > From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> > > bounces at opensolaris.org] On Behalf Of Jim Klimov
> > >
> > > Purely speculating, I might however suggest that your disk was
> > > dedicated to the pool completely, so its last blocks contain spare
> > > uberblocks (zpool labels), and that might help ZFS detect and import
> > > the pool -
> >
> > Certain types of data have multiple copies on disk. I have overwritten
> > the first 1 MB of a disk before and still been able to import the pool,
> > so I suspect that, with a little effort, you'll be able to import your
> > pool again.
> >
> > After the pool is imported, of course, some of your data is very likely
> > to be corrupt. ZFS should be able to detect it, because the checksums
> > won't match. You should run a scrub.
>
> [1] OK, I have run a scrub on the pool and it is now being reported as
> being in DEGRADED status again. I did think it was strange that the zpool
> had magically recovered itself:
>
> root@n36l:~# zpool status data2
>   pool: data2
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are
>         unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>   scan: scrub repaired 0 in 0h26m with 0 errors on Thu Jul 12 15:07:47 2012
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         data2       DEGRADED     0     0     0
>           c2t0d0s0  DEGRADED     0     0     0  too many errors
>
> errors: No known data errors
>
> At least it is letting me access the data for now; I guess the only fix
> is to migrate the data off and then "rebuild" the disk.
>
> --
> Ritchie
>
> > You'll be able to produce a list of all the partially corrupted files.
> > Most likely, you'll just want to rm those files, and then you'll know
> > that whatever is still left is good.
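Spelling out Cindy's zpool clear suggestion against the pool and device shown in the status output above:

  # zpool clear data2 c2t0d0s0
  # zpool scrub data2
  # zpool status -v data2

zpool clear resets the error counters on the device, and the follow-up scrub shows whether errors keep accumulating. If they do, relabeling or replacing the disk, as Cindy and the status action text suggest, is the next step.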