thr3ads.net - zfs discuss - [zfs-discuss] Repairing known bad disk blocks before zfs encounters them [Apr 2008]

David

2008-Apr-15 19:54 UTC

[zfs-discuss] Repairing known bad disk blocks before zfs encounters them

I have some code that implements background media scanning, so I am able to
detect bad blocks well before zfs encounters them. I need a script or something
that will map the known bad block(s) to a logical block so I can force zfs to
repair the bad block from redundant/parity data.

I can't find anything that isn't part of a draconian scanning/repair mechanism.
Granted, the zfs architecture can map physical block X to logical block Y, Z,
and other letters of the alphabet, but I want to go backwards.

Second part of the question: assuming I know /dev/dsk/c0t0d0 has an ECC error
on block n, and I now have the appropriate storage pool info and offset that
corresponds to that block, how do I force the file system to repair the
offending block?

This was easy to address in Linux, assuming the filesystem was built on the
/dev/md driver, because all I had to do was force a read and twiddle with the
parameters to force a non-cached I/O and subsequent repair.

It seems as if zfs is too smart for its own good and won't let me fix something
that I know is bad before zfs has a chance to discover it for itself.   :)
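
For the simplest case, the first step of the backwards mapping asked about above might look like the sketch below. It assumes a single-disk or mirror vdev, where the allocatable space starts after a 4 MiB reserved area at the front of the device (two 256 KiB labels plus a 3.5 MiB boot area). The function name, the 512-byte sector size, and the slice_start_bytes parameter are illustrative assumptions. RAID-Z uses a different layout, and finding which file or object owns the resulting offset is a separate step not shown here.

```python
# Assumption: two 256 KiB vdev labels plus a 3.5 MiB boot area sit at the
# front of the device, so allocatable space begins 4 MiB in.
VDEV_LABEL_START_SIZE = 4 * 1024 * 1024


def lba_to_vdev_offset(lba, sector_size=512, slice_start_bytes=0):
    """Translate a disk LBA into a vdev-relative byte offset.

    Hypothetical helper for a single-disk or mirror vdev only.
    Returns None when the LBA falls before the slice or inside the
    front reserved area. The two trailing labels at the end of the
    device are not checked here.
    """
    byte_offset = lba * sector_size - slice_start_bytes
    if byte_offset < VDEV_LABEL_START_SIZE:
        return None
    return byte_offset - VDEV_LABEL_START_SIZE
```

The returned offset is only the address within the vdev. It says nothing about whether that space is currently allocated.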
 
 
This message posted from opensolaris.org

Richard Elling

2008-Apr-16 17:33 UTC

David wrote:
> I have some code that implements background media scanning, so I am able to
> detect bad blocks well before zfs encounters them. I need a script or
> something that will map the known bad block(s) to a logical block so I can
> force zfs to repair the bad block from redundant/parity data.
>
> I can't find anything that isn't part of a draconian scanning/repair
> mechanism. Granted, the zfs architecture can map physical block X to logical
> block Y, Z, and other letters of the alphabet, but I want to go backwards.
>
> Second part of the question: assuming I know /dev/dsk/c0t0d0 has an ECC
> error on block n, and I now have the appropriate storage pool info and
> offset that corresponds to that block, how do I force the file system to
> repair the offending block?
>
> This was easy to address in Linux, assuming the filesystem was built on the
> /dev/md driver, because all I had to do was force a read and twiddle with
> the parameters to force a non-cached I/O and subsequent repair.
>
Just read it.
 -- richard
> It seems as if zfs is too smart for its own good and won't let me fix
> something that I know is bad before zfs has a chance to discover it for
> itself.   :)

Robert Milkowski

2008-Apr-16 19:43 UTC

Hello Richard,

Wednesday, April 16, 2008, 6:33:05 PM, you wrote:

RE> David wrote:
>> I have some code that implements background media scanning, so I am able
>> to detect bad blocks well before zfs encounters them. I need a script or
>> something that will map the known bad block(s) to a logical block so I can
>> force zfs to repair the bad block from redundant/parity data.
>>
>> I can't find anything that isn't part of a draconian scanning/repair
>> mechanism. Granted, the zfs architecture can map physical block X to
>> logical block Y, Z, and other letters of the alphabet, but I want to go
>> backwards.
>>
>> Second part of the question: assuming I know /dev/dsk/c0t0d0 has an ECC
>> error on block n, and I now have the appropriate storage pool info and
>> offset that corresponds to that block, how do I force the file system to
>> repair the offending block?
>>
>> This was easy to address in Linux, assuming the filesystem was built on
>> the /dev/md driver, because all I had to do was force a read and twiddle
>> with the parameters to force a non-cached I/O and subsequent repair.
>>
RE> Just read it.

or even use zpool scrub in the first place...

-- 
Best regards,
 Robert Milkowski                            mailto:milek at task.gda.pl
                                       http://milek.blogspot.com

Richard Elling

2008-Apr-16 21:46 UTC

David Lethe wrote:
> Read ... What?   All I have is block x on physical device y.   Granted zfs
> recalculates parity, but zfs won't do this unless I can read the
> appropriate storage pool and offset.
>
Read the data (file) which is using the block.
When you scrub a ZFS pool, if a bad block is detected and
it cannot be fixed, then zpool status -v will show which file is affected.
Scrubbing works by reading, which automatically does checksum
validation and stimulates the repair process, as needed.
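
Reading a file end to end, as described above, can be sketched as follows. The function name and chunk size are illustrative assumptions. Note that blocks already held in memory may be served from cache without touching the disk, so this is a sketch of the idea and not a guaranteed way to exercise a specific sector.

```python
def read_whole_file(path, chunk_size=1 << 20):
    """Read every byte of `path` and return the number of bytes read.

    On ZFS, blocks fetched from disk are checksum-verified as they are
    read; with redundancy in the pool, a bad copy can be repaired from a
    good one. An unrecoverable block typically surfaces as an OSError.
    """
    total = 0
    # buffering=0 avoids an extra user-space buffer in Python; it does
    # not bypass any operating system or filesystem cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```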

It seems to me that you are trying to do the same thing, but from
the bottom up.  You might take a look at how bad blocks are
simulated in zdb (-B option).  zdb isn't well documented beyond
the source, though.
 -- richard
> Or are you saying that if I do a raw read on the bad block via dd, for
> instance, zfs will monitor the sense code for the read error and magically
> fix it?
>
> Is there a completely different technique I can use that lets me inform the
> OS that block x, drive y is corrupt, which zfs will take notice of and use
> to initiate a targeted parity rebuild?  Or a way to do a limited bad-block
> scan of a physical drive where I can pass the starting and ending blocks?

Brandon High

2008-Apr-16 22:23 UTC

On Tue, Apr 15, 2008 at 12:54 PM, David <david at santools.com> wrote:
> I have some code that implements background media scanning, so I am able to
> detect bad blocks well before zfs encounters them. I need a script or
> something that will map the known bad block(s) to a logical block so I can
> force zfs to repair the bad block from redundant/parity data.

Why are you doing this background scanning, and how is it better than
the ECC offered by zfs? While your solution is valid for one
environment (md on linux) it may not be applicable, required, or
desirable with zfs.

If there's data on the bad blocks and you're using the disks in a
mirror, raidz or raidz2 pool, any read of those blocks (or performing
a scrub) would correct the problem. Of course, this won't help if
you're not using a mirror or pool with parity, but you mentioned that
you use md on linux now, so I assume you will be.

One advantage that zfs holds over md and other raid solutions is that
the data is checksummed, so you'll always know that the data you got
back was what you had originally written. There's no need to check the
entire mirror or parity volume ahead of time. Running a scrub will
verify the blocks that are actually in use and works with the i/o
scheduler, so should have a lower impact on performance.
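
The checksum-then-repair behaviour described above can be pictured with a toy model. This is not ZFS code: the function names are made up for illustration, the "disks" are in-memory byte arrays, and SHA-256 stands in for whatever checksum a pool is configured to use.

```python
import hashlib


def checksum(data):
    """Stand-in checksum for the toy model (assumption: SHA-256)."""
    return hashlib.sha256(bytes(data)).digest()


def self_healing_read(copies, expected):
    """Return data matching `expected` and overwrite any bad replicas.

    `copies` is a list of bytearray replicas of one block (a toy mirror);
    `expected` is the checksum stored separately from the data.
    """
    good = next((bytes(c) for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("no replica matches the stored checksum")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = bytearray(good)  # repair from the good replica
    return good
```

Because the checksum is stored apart from the data, a read can tell which replica is wrong, which a plain mirror compare cannot do.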

-B

-- 
Brandon High bhigh at freaks.com
"The good is the enemy of the best." - Nietzsche

Bart Smaalders

2008-Apr-17 02:08 UTC

Richard Elling wrote:
> David wrote:
>> I have some code that implements background media scanning, so I am able
>> to detect bad blocks well before zfs encounters them. I need a script or
>> something that will map the known bad block(s) to a logical block so I can
>> force zfs to repair the bad block from redundant/parity data.
>>
>> I can't find anything that isn't part of a draconian scanning/repair
>> mechanism. Granted, the zfs architecture can map physical block X to
>> logical block Y, Z, and other letters of the alphabet, but I want to go
>> backwards.
>>
>> Second part of the question: assuming I know /dev/dsk/c0t0d0 has an ECC
>> error on block n, and I now have the appropriate storage pool info and
>> offset that corresponds to that block, how do I force the file system to
>> repair the offending block?
>>
>> This was easy to address in Linux, assuming the filesystem was built on
>> the /dev/md driver, because all I had to do was force a read and twiddle
>> with the parameters to force a non-cached I/O and subsequent repair.
>>
>
> Just read it.
>  -- richard
>
>> It seems as if zfs is too smart for its own good and won't let me fix
>> something that I know is bad before zfs has a chance to discover it for
>> itself.   :)
>>

I think what the OP was saying is that he somehow knows that an
unallocated block on the disk is bad, and he'd like to tell ZFS about it
ahead of time.

But repair implies there's data to read on the disk; ZFS won't read disk
blocks it didn't write.

- Bart


-- 
Bart Smaalders			Solaris Kernel Performance
barts at cyber.eng.sun.com		http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."

Anton B. Rang

2008-Apr-17 06:01 UTC

If you don't do background scrubbing, you don't know about bad blocks in
advance. If you're running RAID-Z, this means you'll lose data if a block is
unreadable and another device goes bad. This is the point of scrubbing: it
lets you repair the problem while you still have redundancy. :-)

Whether it's better to use 'zpool scrub' vs. reading the whole device depends
on your environment. One issue with zpool scrub is that it effectively
generates small random i/o. This is a good thing if you're in a heavily loaded
environment, since the i/o can be scheduled amidst everything else. It's not a
good thing if you're in a lightly loaded environment, since it's much slower
than just reading the whole disk sequentially. So if, for instance, you have a
"down time" at night (e.g. university environment), reading the whole device
at that time would be more efficient.
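
A sequential read of a device, or of a limited block range as asked about earlier in the thread, could be sketched like this. The function name, the 512-byte block size, and the read size are assumptions. The scan only reports ranges where the read failed; it does not tell ZFS anything, and unless the path is a raw device the reads may be satisfied from the operating system's cache.

```python
import os


def scan_range(path, start_block, end_block, block_size=512,
               blocks_per_read=2048):
    """Sequentially read blocks [start_block, end_block) of `path`.

    Returns a list of (first_block, block_count) ranges whose read
    raised an I/O error. Hypothetical helper; Unix only (os.pread).
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        block = start_block
        while block < end_block:
            count = min(blocks_per_read, end_block - block)
            try:
                data = os.pread(fd, count * block_size, block * block_size)
            except OSError:
                bad.append((block, count))  # record and keep scanning
            else:
                if len(data) < count * block_size:
                    break  # reached the end of the device or file
            block += count
    finally:
        os.close(fd)
    return bad
```

Large sequential reads like this are what makes a whole-device pass faster than a scrub on an idle system, at the cost of also reading blocks that hold no data.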
 
 

Robert Milkowski

2008-Apr-17 08:54 UTC

Hello Brandon,

Wednesday, April 16, 2008, 11:23:48 PM, you wrote:

BH> On Tue, Apr 15, 2008 at 12:54 PM, David <david at santools.com> wrote:
>> I have some code that implements background media scanning, so I am able
>> to detect bad blocks well before zfs encounters them. I need a script or
>> something that will map the known bad block(s) to a logical block so I can
>> force zfs to repair the bad block from redundant/parity data.
BH> Why are you doing this background scanning, and how is it better than
BH> the ECC offered by zfs? While your solution is valid for one

it is not ECC

-- 
Best regards,
 Robert                            mailto:milek at task.gda.pl
                                       http://milek.blogspot.com
