I have a machine with a disk that has some sort of defect, and I've found that if I partition only half of the disk, the machine will still work. I tried to use 'format' to scan the disk and find the bad blocks, but it didn't work.

So, as I don't know where the bad blocks are but I'd still like to use some of the rest of the disk, I thought ZFS might be able to help. I partitioned the disk so slices 4, 5, 6 and 7 are each 5GB. I thought I'd make one or more zpools on those slices and then be able to narrow down where the bad sections are.

So my question is: can I declare a zpool that spans multiple c0d0sXX but isn't a mirror, and if I can, will ZFS be able to detect which c0d0sXX is the problem and not use it? If not, I'll have to make 4 different zpools and experiment with storing stuff on each to find the approximate location of the bad blocks.
On 12 December, 2006 - Patrick P Korsnick sent me these 1,1K bytes:

> I have a machine with a disk that has some sort of defect, and I've
> found that if I partition only half of the disk, the machine will
> still work. I tried to use 'format' to scan the disk and find the bad
> blocks, but it didn't work.
>
> So, as I don't know where the bad blocks are but I'd still like to use
> some of the rest of the disk, I thought ZFS might be able to help. I
> partitioned the disk so slices 4, 5, 6 and 7 are each 5GB. I thought
> I'd make one or more zpools on those slices and then be able
> to narrow down where the bad sections are.
>
> So my question is: can I declare a zpool that spans multiple c0d0sXX
> but isn't a mirror, and if I can, will ZFS be able to detect which
> c0d0sXX is the problem and not use it? If not, I'll have to make 4
> different zpools and experiment with storing stuff on each to find the
> approximate location of the bad blocks.

Either create 4 separate pools:

  zpool create slice4 c0d0s4
  zpool create slice5 c0d0s5
  ...

and then torture each of them to see where it's corrupted, or you can for instance create a raidz(2) of those 4 and watch performance go downhill, but it will still work:

  zpool create broken raidz2 c0d0s4 c0d0s5 c0d0s6 c0d0s7

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
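A minimal sketch of the "torture" step, assuming the four pools are named slice4 through slice7 as in Tomas's example; the fill-file name is arbitrary, and the scrubs run asynchronously, so check status once they finish:

  #!/usr/bin/ksh
  # Fill each small pool with random data, then let ZFS read it all back.
  for pool in slice4 slice5 slice6 slice7
  do
      # write random data until the pool runs out of space
      dd if=/dev/urandom of=/$pool/fill bs=1024k 2>/dev/null
      # scrub re-reads and checksums every block ZFS has written
      zpool scrub $pool
  done
  # once the scrubs have completed, per-slice error counts show up here
  zpool status -v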
Ok, now this takes the "Most egregiously creative misuse of ZFS" award :-)

I doubt ZFS can help if badblocks "didn't work". It would help to know what the problem with it was, but generally a destructive test reveals a lot.

OTOH, you can also do better by writing a small program which writes random data plus its checksum (4K of data from /dev/urandom + md5) to the whole disk. A second program can read the chunks back and report the locations where the data doesn't match the checksum.
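A rough sketch of such a pair of programs, folded into one ksh script; it assumes a 5GB slice at c0d0s4 and Solaris's digest(1) for the md5 sums, it destroys whatever is on the slice, and it will be very slow (one dd and one digest call per 4K chunk):

  #!/usr/bin/ksh
  # Destructive surface test: write random 4K chunks, then verify by checksum.
  # DISK, NCHUNKS and LOG are assumptions -- adjust for the real slice.
  DISK=/dev/rdsk/c0d0s4
  NCHUNKS=1310720            # 5GB / 4K
  LOG=/var/tmp/chunk.md5

  # write phase: random chunk to disk, md5 of the chunk to the log
  i=0
  : > $LOG
  while [ $i -lt $NCHUNKS ]; do
      dd if=/dev/urandom of=/tmp/chunk bs=4k count=1 2>/dev/null
      print "$i $(digest -a md5 /tmp/chunk)" >> $LOG
      dd if=/tmp/chunk of=$DISK bs=4k oseek=$i count=1 2>/dev/null
      i=$((i + 1))
  done

  # verify phase: re-read each chunk and compare against the recorded md5
  while read i sum; do
      dd if=$DISK of=/tmp/chunk bs=4k iseek=$i count=1 2>/dev/null
      [ "$(digest -a md5 /tmp/chunk)" != "$sum" ] && print "bad chunk at block $i"
  done < $LOG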
On Tue, 2006-12-12 at 22:49 -0800, Patrick P Korsnick wrote:
> I have a machine with a disk that has some sort of defect, and I've
> found that if I partition only half of the disk, the machine will
> still work. I tried to use 'format' to scan the disk and find the bad
> blocks, but it didn't work.

So, Richard Elling will likely have data, but I have anecdotes: I've seen two cases of disk failure where errors only occurred during random I/O; all blocks were readable sequentially. In both cases, this permitted the disk to be replaced without data loss and without resorting to backups by doing a "dd" bit-image copy to the replacement drive.

Dunno how much you value your time or the data which will be stored on this machine, but my reflex would be to replace the disk rather than spend a lot of time working around its failings.

- Bill
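For what it's worth, the bit-image copy Bill describes is a single dd over the raw whole-disk devices; the device names below are only placeholders for the failing disk and its replacement, and conv=noerror,sync is there so the copy keeps going (padding with zeros) if an unreadable block does turn up:

  # c0d0 = failing disk, c1d0 = replacement -- substitute the real devices
  # (p0 is the x86 whole-disk node; s2 is the traditional whole-disk slice)
  dd if=/dev/rdsk/c0d0p0 of=/dev/rdsk/c1d0p0 bs=1024k conv=noerror,sync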
> I have a machine with a disk that has some sort of
> defect, and I've found that if I partition only half
> of the disk, the machine will still work.

"will still work" ... for now. Don't keep using this disk. Chances are that something really bad has happened to it (e.g. the head has scraped the platter) and you're at risk that either (a) the head has sustained damage which may eventually cause problems, or (b) there are little flakes of disk surface floating around waiting to randomly destroy more of the disk.

Anton
Bill Sommerfeld wrote:
> On Tue, 2006-12-12 at 22:49 -0800, Patrick P Korsnick wrote:
>> I have a machine with a disk that has some sort of defect, and I've
>> found that if I partition only half of the disk, the machine will
>> still work. I tried to use 'format' to scan the disk and find the bad
>> blocks, but it didn't work.
>
> So, Richard Elling will likely have data, but I have anecdotes:

I have some data, but without knowing more about the disk, it is difficult to say what to do. In some cases a "low level format" will clear up some errors for a little while for some drives.

> I've seen two cases of disk failure where errors only occurred during
> random I/O; all blocks were readable sequentially; in both cases, this
> permitted the disk to be replaced without data loss and without
> resorting to backups by doing a "dd" bit-image copy to the replacement
> drive.

ouch! This is a new one for my list. I suspect a firmware bug. Note: firmware/software bugs are not considered for general MTBF calculations -- we assume that the wear-out mechanisms are mechanical. Software doesn't wear out.

> Dunno how much you value your time or the data which will be stored on
> this machine, but my reflex would be to replace the disk rather than
> spend a lot of time working around its failings.

Yep, the cost of a drive is often much less than the time and aggravation involved in repairing one. I know I've spent way too many hours poking-and-hoping :-)
 -- richard
Patrick P Korsnick wrote:
> I have a machine with a disk that has some sort of defect, and
> I've found that if I partition only half of the disk, the
> machine will still work. I tried to use 'format' to scan the
> disk and find the bad blocks, but it didn't work.

By "it didn't work" did you mean that it didn't find any bad blocks? I usually see the reverse: too many bad blocks, which leads me to do a "low level format" with the disk utilities from the drive manufacturer.

> So, as I don't know where the bad blocks are but I'd still like
> to use some of the rest of the disk, I thought ZFS might be
> able to help. I partitioned the disk so slices 4, 5, 6 and 7 are
> each 5GB. I thought I'd make one or more zpools on those
> slices and then be able to narrow down where the bad sections
> are.

This should work. I'm thinking something like mirroring slices 0+4, 1+5, 3+6, where the slices are spread across the disk in order. As you fill up and scrub the zpool, you might find the bad areas.

> So my question is: can I declare a zpool that spans multiple
> c0d0sXX but isn't a mirror, and if I can, will ZFS be able
> to detect which c0d0sXX is the problem and not use it? If not,
> I'll have to make 4 different zpools and experiment with storing
> stuff on each to find the approximate location of the bad blocks.

It depends on the nature of the failure. If it is a write failure, then at this time (IIRC) ZFS will panic. If it is a read failure, then you may lose access to some files. There is work underway to improve ZFS RAS, so you will want to use the latest build to avoid running into cases which were not handled as well in previous builds.

Or, get a new disk. I can get used 40GByte disks at the swap meet for about $10.
 -- richard
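A sketch of that layout as commands, assuming slices 0, 1 and 3 exist and match the sizes of slices 4, 5 and 6; the pool name and the exact pairings are illustrative:

  # one pool of three two-way mirrors, each pairing slices from
  # different regions of the disk
  zpool create tank mirror c0d0s0 c0d0s4 \
                    mirror c0d0s1 c0d0s5 \
                    mirror c0d0s3 c0d0s6

  # after filling the pool with data, re-verify every block periodically
  zpool scrub tank
  zpool status -v tank    # per-slice READ/WRITE/CKSUM error counts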
On Wed, 2006-12-13 at 10:24 -0800, Richard Elling wrote:
> > I've seen two cases of disk failure where errors only occurred during
> > random I/O; all blocks were readable sequentially; in both cases, this
> > permitted the disk to be replaced without data loss and without
> > resorting to backups by doing a "dd" bit-image copy to the replacement
> > drive.
>
> ouch! This is a new one for my list. I suspect a firmware bug.

it wouldn't be "a" firmware bug -- two different models of disk in two very different systems (one was a 2.5" laptop drive, the other was a 3.5" normal-size disk in a whitebox 1U rackmount PC). In both cases there were significant numbers of hard read errors during normal operation -- in one case, you couldn't boot without running into them.

In one of the two cases, the system was turned off overnight, then turned on the next afternoon for the sequential copy.
On a recent journey of pain and frustration, I had to recover a UFS filesystem from a broken disk. The disk had many bad blocks, and more were going bad over time. Sadly, there were just a few files that I wanted, but I could not mount the disk without it killing my system. (PATA disks... PITA if you ask me...)

My recovery method, though painful, might be of value in locating the bad regions of the disk. What I did was to kick off a script that used dd, and did something like this...

=========
#! /usr/bin/ksh

# Copy the sick slice one 8KB block at a time. conv=noerror,sync keeps dd
# going on a bad block and pads it with nulls, so the output file stays
# aligned with the source slice. The loop runs past the end of the slice;
# interrupt it once dd starts reporting 0+0 records in.
SEEK=0
while :
do
    dd if=/dev/rdsk/c0d1s7 of=backup.ufs.s7 bs=8192 \
       oseek=${SEEK} iseek=${SEEK} count=1 conv=noerror,sync
    SEEK=$((SEEK + 1))
done
=========
(Or something to that effect.)

Anyhoo - the point is that this hit the disk one block at a time (I chose 8KB, as it was the UFS block size, and 512-byte blocks looked like it would take 3 weeks), and I was ultimately able to get my data back (at least the bits I cared about...) after futzing with fsck and some other novelties.

If you were to do something similar to this, but instead of copying the block, send it to /dev/null and log the result of dd, you could get a complete list of broken blocks (a rough sketch of that variant appears at the end of this message).

A few footnotes:

- Yes. This is slow. WAY slow, and there are thousands of different ways I could have done this better and faster. However, it saved me from having to do anything else, and at the time I did not feel like breaking out a compiler. Due to the massively large number of bad blocks on my disk, the size of the disk (160GB), and the number of retries my system made for each bad block, it took 10 days (!!) to read through the whole disk 8KB at a time.

- If you are happy to throw away larger blocks of disk, you could use a larger block size, which would speed things up.

- If your disk really does have bad blocks that are getting in the way, chances are that it's going to get worse, and pain will ensue. I'd suggest that a new disk might be a better option.

- On the new-disk front, note that many hard disks come with 5-year warranties these days. If the disk is not super old, you might be able to get it replaced under warranty if you send it directly to the manufacturer...

Hope this helps at least provide some ideas. :)

Oh - and.... get a new disk. ;)

Nathan.

Patrick P Korsnick wrote:
> I have a machine with a disk that has some sort of defect, and I've found that if I partition only half of the disk, the machine will still work. I tried to use 'format' to scan the disk and find the bad blocks, but it didn't work.
>
> So, as I don't know where the bad blocks are but I'd still like to use some of the rest of the disk, I thought ZFS might be able to help. I partitioned the disk so slices 4, 5, 6 and 7 are each 5GB. I thought I'd make one or more zpools on those slices and then be able to narrow down where the bad sections are.
>
> So my question is: can I declare a zpool that spans multiple c0d0sXX but isn't a mirror, and if I can, will ZFS be able to detect which c0d0sXX is the problem and not use it? If not, I'll have to make 4 different zpools and experiment with storing stuff on each to find the approximate location of the bad blocks.
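A sketch of the read-only, bad-block-logging variant described above; the device name and block count are placeholders (the block count is the slice size divided by 8192), and the test simply relies on dd exiting non-zero when a block cannot be read:

  #!/usr/bin/ksh
  # Probe each 8KB block of the slice and log the ones dd cannot read.
  # DEVICE and NBLOCKS are placeholders -- adjust for the real slice.
  DEVICE=/dev/rdsk/c0d1s7
  NBLOCKS=655360                 # e.g. a 5GB slice in 8KB blocks
  BADLOG=/var/tmp/badblocks.txt

  SEEK=0
  : > $BADLOG
  while [ $SEEK -lt $NBLOCKS ]
  do
      # without conv=noerror, dd exits non-zero if this block is unreadable
      dd if=$DEVICE of=/dev/null bs=8192 iseek=${SEEK} count=1 2>/dev/null \
          || print "bad block at ${SEEK}" >> $BADLOG
      SEEK=$((SEEK + 1))
  done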