Hi All,

I hope this is the correct place for this question, as I think it was ZFS that saved me.

A little while ago I did something very silly in a moment of inattention... I meant to use dd to copy an image (around 500 MB) to a USB disk but instead wrote over a non-redundant ZFS disk! This of course stopped the zpool from operating. I fiddled around trying to repair the problem and ended up having to reboot the server, as I was left with many hung processes trying to read the disk.

I am using Solaris 11 on my x86 HP MicroServer N36L with SATA disks. Below are various status outputs. My question is: how did the disk / zpool recover after a reboot with, from what I have seen so far, no corruption at all?

root@n36l:/export/home/drowl# zpool status -x
  pool: data2
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data2       UNAVAIL      0     0     0  experienced I/O failures
          c2t0d0s0  UNAVAIL      0     0     0  experienced I/O failures

I'm guessing from the below that it's the label / partition that is busted...

root@n36l:/export/home/drowl# prtvtoc /dev/rdsk/c2t0d0s2
prtvtoc: /dev/rdsk/c2t0d0s2: Unable to read Disk geometry errno = 0x5

root@n36l:/export/home/drowl# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c2t0d0 <drive type unknown>
          /pci@0,0/pci103c,1609@11/disk@0,0
...
Specify disk (enter its number): 0
Error: can't open disk '/dev/rdsk/c2t0d0p0'.

AVAILABLE DRIVE TYPES:
        0. Auto configure
        ...
       19. ATA -Hitachi HDT7210-A3AA
       20. other
Specify disk type (enter its number):

Any help most welcomed,

Ritchie.

--
<--Time flies like an arrow; fruit flies like a banana. -->
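As a side note on diagnosis (not something tried in the post above): one way to see how much of the ZFS label survived such an overwrite is zdb. A minimal check against the same slice shown in the zpool status output, assuming the device is still readable at all:

  # zdb -l /dev/rdsk/c2t0d0s0

ZFS keeps two copies of the label at the start of each vdev and two more at the end, so if dd only clobbered the beginning of the disk, zdb should still print the last two labels intact.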
2012-07-12 14:20, RichTea wrote:
> I meant to use dd to copy an image (around 500 MB) to a USB disk but
> instead wrote over a non-redundant ZFS disk!
> Below are various status outputs. My question is: how did the disk /
> zpool recover after a reboot with, from what I have seen so far, no
> corruption at all?

Well, from your outputs I can't say there is no corruption - the pool is
suspended, its device is unavailable and the partitions are screwed ;)
How did you decide it is okay and that ZFS saved you? Or did you just not
post some further progress in your recovery?

Purely speculating, I might however suggest that your disk was dedicated
to the pool completely, so its last blocks contain spare uberblocks
(zpool labels), and that might help ZFS detect and import the pool - if
it did indeed. Further on, this label refers to the current root of the
block pointer (metadata) tree, which is stored with double or sometimes
triple redundancy, so much of the metadata which you tried reading is
indeed readable from at least one copy.

Likely some data was overwritten by your actions and lost, but you can
only detect that by trying to read all data from the pool. If the
checksums of the read-in blocks mismatch the expectations from the block
pointers, ZFS will know there are errors or even losses. For example,
"zpool scrub" does just that - so you should run it on your pool, if it
is now importable for you.

Good luck,
HTH, //Jim Klimov
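A sketch of the sequence Jim describes, using the pool name from the original post and assuming the pool did not come back on its own after the reboot:

  # zpool import
  # zpool import data2
  # zpool scrub data2
  # zpool status data2

The first command just lists pools that ZFS can still recognise from their on-disk labels; the second imports data2 by name if it shows up. The scrub then re-reads every allocated block and verifies it against its checksum, and zpool status reports the scrub's progress and any errors it finds.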
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> Purely speculating, I might however suggest that your disk was dedicated
> to the pool completely, so its last blocks contain spare uberblocks
> (zpool labels), and that might help ZFS detect and import the pool -

Certain types of data have multiple copies on disk. I have overwritten the
first 1 MB of a disk before and still been able to import the pool, so I
suspect that, with a little effort, you'll be able to import your pool
again.

After the pool is imported, of course, some of your data is very likely to
be corrupt. ZFS should be able to detect it, because the checksums won't
match. You should run a scrub.

You'll be able to produce a list of all the partially corrupted files.
Most likely, you'll just want to rm those files, and then you'll know that
whatever is still left is good.
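Once the scrub Ned recommends has finished, the list of partially corrupted files he mentions comes from the verbose form of zpool status:

  # zpool status -v data2

Files with permanent (unrecoverable) errors are listed by path under the "errors:" section; those are the candidates for rm and restore from backup.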
> How did you decide it is okay and that ZFS saved you? Or did you just
> not post some further progress in your recovery?

I made no further recovery attempts; the pool imported cleanly after rebooting, or so I thought [1], as a zpool status showed no errors and I could read data from the drive again.

On Thu, Jul 12, 2012 at 2:35 PM, Edward Ned Harvey
<opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:

> > From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> > bounces at opensolaris.org] On Behalf Of Jim Klimov
> >
> > Purely speculating, I might however suggest that your disk was dedicated
> > to the pool completely, so its last blocks contain spare uberblocks
> > (zpool labels), and that might help ZFS detect and import the pool -
>
> Certain types of data have multiple copies on disk. I have overwritten the
> first 1 MB of a disk before and still been able to import the pool, so I
> suspect that, with a little effort, you'll be able to import your pool
> again.
>
> After the pool is imported, of course, some of your data is very likely to
> be corrupt. ZFS should be able to detect it, because the checksums won't
> match. You should run a scrub.

[1] OK, I have run a scrub on the pool and it is now being reported as being in DEGRADED status again. I did think it was strange that the zpool had magically recovered itself:

root@n36l:~# zpool status data2
  pool: data2
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are
        unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0h26m with 0 errors on Thu Jul 12 15:07:47 2012
config:

        NAME        STATE     READ WRITE CKSUM
        data2       DEGRADED     0     0     0
          c2t0d0s0  DEGRADED     0     0     0  too many errors

errors: No known data errors

At least it is letting me access the data for now; I guess the only fix is to migrate the data off and then "rebuild" the disk.

--
Ritchie

> You'll be able to produce a list of all the partially corrupted files.
> Most likely, you'll just want to rm those files, and then you'll know
> that whatever is still left is good.
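If it does come to migrating the data off and rebuilding, one wholesale option is a recursive snapshot plus send/receive. A rough sketch, where "backuppool" and the snapshot name "evac" are only placeholders for whatever second pool and name would actually be used:

  # zfs snapshot -r data2@evac
  # zfs send -R data2@evac | zfs receive -d -F backuppool

The -R/-d pair replicates the whole dataset tree, including child filesystems and properties, under the destination pool. For a single-disk pool this size, plain file-level copying (cp, rsync, tar) works just as well.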
Hi Rich,

I don't think anyone can say definitively how this problem resolved, but I believe that the dd command overwrote some of the disk label, as you describe below. Your format output below looks like you relabeled the disk, and maybe that was enough to resolve this problem. I have had success with just relabeling the disk in an active pool when I accidentally trampled it with the wrong command.

You could try to use zpool clear to clear the DEGRADED device. Possibly, scrub again and clear as needed.

Thanks,

Cindy

On 07/12/12 08:33, RichTea wrote:
> > How did you decide it is okay and that ZFS saved you? Or did you just
> > not post some further progress in your recovery?
>
> I made no further recovery attempts; the pool imported cleanly after
> rebooting, or so I thought [1], as a zpool status showed no errors and I
> could read data from the drive again.
>
> On Thu, Jul 12, 2012 at 2:35 PM, Edward Ned Harvey
> <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
>
> > > From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> > > bounces at opensolaris.org] On Behalf Of Jim Klimov
> > >
> > > Purely speculating, I might however suggest that your disk was
> > > dedicated to the pool completely, so its last blocks contain spare
> > > uberblocks (zpool labels), and that might help ZFS detect and import
> > > the pool -
> >
> > Certain types of data have multiple copies on disk. I have overwritten
> > the first 1 MB of a disk before and still been able to import the pool,
> > so I suspect that, with a little effort, you'll be able to import your
> > pool again.
> >
> > After the pool is imported, of course, some of your data is very likely
> > to be corrupt. ZFS should be able to detect it, because the checksums
> > won't match. You should run a scrub.
>
> [1] OK, I have run a scrub on the pool and it is now being reported as
> being in DEGRADED status again. I did think it was strange that the zpool
> had magically recovered itself:
>
> root@n36l:~# zpool status data2
>   pool: data2
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are
>         unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>   scan: scrub repaired 0 in 0h26m with 0 errors on Thu Jul 12 15:07:47 2012
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         data2       DEGRADED     0     0     0
>           c2t0d0s0  DEGRADED     0     0     0  too many errors
>
> errors: No known data errors
>
> At least it is letting me access the data for now; I guess the only fix
> is to migrate the data off and then "rebuild" the disk.
>
> --
> Ritchie
>
> > You'll be able to produce a list of all the partially corrupted files.
> > Most likely, you'll just want to rm those files, and then you'll know
> > that whatever is still left is good.
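Spelling out Cindy's zpool clear suggestion against the pool and device shown in the status output above:

  # zpool clear data2 c2t0d0s0
  # zpool scrub data2
  # zpool status -v data2

zpool clear resets the error counters on the device, and the follow-up scrub shows whether errors keep accumulating. If they do, relabeling or replacing the disk, as Cindy and the status action text suggest, is the next step.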