Rob Levy
2010-May-20 13:52 UTC
[zfs-discuss] reconstruct recovery of rpool zpool and zfs file system with bad sectors
Folks, I posted this question on (OpenSolaris - Help) without any replies http://opensolaris.org/jive/thread.jspa?threadID=129436&tstart=0 and am re-posting here in the hope someone can help. I have also updated the wording a little (in an attempt to clarify).

I currently use OpenSolaris on a Toshiba M10 laptop.

One morning the system wouldn't boot OpenSolaris 2009.06 (it was simply unable to progress to the second-stage grub). On further investigation I discovered the hdd partition slice holding rpool appeared to have bad sectors. Faced with either a rebuild or an attempt at recovery, I first made an attempt to recover the slice before rebuilding.

The c7t0d0 HDD (p0) was divided into p1 (NTFS, 24GB), p2 (OpenSolaris, 24GB), p3 (OpenSolaris ZFS pool for data, 160GB) and p4 (50GB extended, with 32GB pcfs, 12GB Linux and Linux swap) partitions (or something close to that). On the first Solaris partition (p2), slice 0 held the OpenSolaris rpool zpool.

To attempt recovery I booted the OpenSolaris 2009.06 live CD and was able to import the ZFS pool configured on p3. Against the p2 device (the Solaris boot partition which wouldn't boot) I then ran 'dd if=/dev/rdsk/c7t0d0s2 of=/p0/s2image.dd bs=512 conv=noerror,sync'. Due to sector read error timeouts, this took longer than my maintenance window allowed and I ended up aborting the attempt with a significant number of sectors already captured.

On block examination of the (so far) captured image, I noticed the first two s0 vdev labels appeared to be intact. I then skipped the expected number of s2 sectors to get to the start of s0 and copied blocks to attempt to reconstruct the s0 rpool; running zdb -l against this reported the first two labels, which gave me the encouragement necessary to continue the exercise. At the next opportunity I ran the command again, using the skip directive to capture the balance of the slice.

The result was that I had two files (images) comprising the good c7t0d0s0 sectors (with, I expect, the bad ones padded), i.e. an s0image_start.dd and an s0image_end.dd. As mentioned, at this stage I was able to run 'zdb -l s0image_start.dd' and see the first two vdev labels, and 'zdb -l s0image_end.dd' and see the last two vdev labels.

I then combined the two files (I tried various approaches, e.g. cat and dd with the append directive), however only the first two vdev labels appear to be readable in the resulting s0image_s0.dd. The resulting file size, which I expect is largely good sectors with padding for the bad ones, matches the prtvtoc s0 sector count multiplied by 512.

Can anyone advise why I am unable to read the third and fourth vdev labels once the start and end files are combined? Is there another approach that may prove more fruitful?

Once I have the file (with the labels in the correct places) I was intending to attempt to import the vdev zpool as rpool2, or attempt any repair procedures I could locate (as far as was possible anyway), to see what data could be recovered (besides, it was an opportunity to get another close look at ZFS). Incidentally, *only* the c7t0d0s0 slice appeared to have bad sectors (I do wonder what the significance of this is).
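For reference, the sequence was roughly along these lines ($S0_OFFSET, $S0_SECTORS and $DONE are placeholders here; the real numbers came from the prtvtoc output for the slice and from how far the first pass got):

    # initial capture of the whole Solaris partition (s2); aborted part way
    dd if=/dev/rdsk/c7t0d0s2 of=/p0/s2image.dd bs=512 conv=noerror,sync

    # carve the start of s0 out of what was captured (s0 begins $S0_OFFSET
    # 512-byte sectors into s2, per prtvtoc)
    dd if=/p0/s2image.dd of=s0image_start.dd bs=512 skip=$S0_OFFSET

    # next window: capture the balance of the slice directly from the disk,
    # skipping the $DONE sectors already in hand
    dd if=/dev/rdsk/c7t0d0s2 of=s0image_end.dd bs=512 \
       skip=$(( S0_OFFSET + DONE )) count=$(( S0_SECTORS - DONE )) \
       conv=noerror,sync

    # each piece shows two of the four vdev labels
    zdb -l s0image_start.dd    # labels 0 and 1
    zdb -l s0image_end.dd      # labels 2 and 3

    # combining the pieces; only labels 0 and 1 are reported afterwards
    cat s0image_start.dd s0image_end.dd > s0image_s0.dd
    zdb -l s0image_s0.dd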
Roy Sigurd Karlsbakk
2010-May-20 14:59 UTC
[zfs-discuss] reconstruct recovery of rpool zpool and zfs file system with bad sectors
----- "Rob Levy" <Rob.Levy at oracle.com> skrev:> Folks I posted this question on (OpenSolaris - Help) without any > replies > http://opensolaris.org/jive/thread.jspa?threadID=129436&tstart=0 and > am re-posting here in the hope someone can help ... I have updated the > wording a little too (in an attempt to clarify) > > I currently use OpenSolaris on a Toshiba M10 laptop. > > One morning the system wouldn''t boot OpenSolaris 2009.06 (it was > simply unable progress to the second stage grub). On further > investigation I discovered the hdd partition slice with rpool appeared > to have bad sectors.I would recommend against debugging a filesystem like you have described here. If you have bad sectors on a drive, get a new drive, connect the other drive (directly or with an USB dock or something), import the pools, move the data (rsync if you just want the data or zfs send/receive if you also want the snapshots etc). This might take a while with bad sectors and disk timeouts, but you''ll get (most of?) your data moved over without much hassle. Just my two c. Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 roy at karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et element?rt imperativ for alle pedagoger ? unng? eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer p? norsk.
Rob Levy
2010-May-25 13:26 UTC
[zfs-discuss] reconstruct recovery of rpool zpool and zfs file system with bad sectors
Roy,

Thanks for your reply. I did get a new drive and attempted that approach (as you suggest, though before your reply), however once booted off the OpenSolaris Live CD (or the rebuilt new drive), I was not able to import the rpool (which I had established had sector errors). I expect I would have had some success if the vdev labels had been intact (I currently suspect some critical boot files are impacted by bad sectors, resulting in the failed boot attempts from that partition slice). Unfortunately, I didn't keep a copy of the messages (if any - I have tried many permutations since).

At my last attempt I installed Knoppix (Debian) on one of the partitions (which also gave me access to smartctl and hdparm; I was hoping to reduce the read timeout to speed up the exercise), then added zfs-fuse (to access the space I will use to stage the recovery file) and the dd_rescue and GNU ddrescue packages. smartctl appears unable to manage the disk while it is attached via USB (though I am guessing, because I don't have much experience with it).

At this point I attempted dd_rescue to create an image of the partition with bad sectors (hoping there were efficiencies beyond normal dd), but it had only reached 5.6GB after 36 hours, so again I needed to abort. It does, however, log the blocks attempted so far, so hopefully I can skip past them when I next get an opportunity. It now appears that GNU ddrescue is the preferred of the two utilities, so I may opt to use it to create an image of the partition before attempting recovery of the slice (rpool).

As an aside, I noticed that the Knoppix 'dmesg | grep sd' output, which shows the primary partition devices, no longer appears to show the Solaris partition (p2) slice devices (whereas it does show the logical partition devices configured within the extended p4 partition). I suspect that, because of this, the rpool (on one of the Solaris partition slices) is not detected by the Knoppix zfs-fuse 'zpool import' (although I can access the zpool which exists on partition p3). I wonder if this is related to the transition from ufs to zfs?
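In case it is useful to anyone following along, the GNU ddrescue plan for the next maintenance window looks roughly like this (the device name, the staging paths and the $S0_OFFSET/$S0_SECTORS values are examples and placeholders only):

    # first pass: copy the easy sectors and keep a log (map) file so the
    # run can be stopped and resumed across maintenance windows
    ddrescue -n /dev/sda2 /staging/p2image.dd /staging/p2image.log

    # later passes: retry only the areas the log has marked as bad
    ddrescue -r3 /dev/sda2 /staging/p2image.dd /staging/p2image.log

    # carve slice 0 (the rpool) out of the partition image using the VTOC
    # offsets, then ask zfs-fuse to search that directory for pools
    dd if=/staging/p2image.dd of=/staging/s0image.dd bs=512 \
       skip=$S0_OFFSET count=$S0_SECTORS
    zpool import -d /staging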