thr3ads.net - freebsd stable - bad sector in gmirror HDD [Aug 2011]

Dan Langille

2011-Aug-19 21:09 UTC

bad sector in gmirror HDD

System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011

After a recent power failure, I'm seeing this in my logs:

Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

And gmirror reports:

# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad0 (100%)
                      ad2

I think the solution is: gmirror rebuild

Comments?



Searching on that error message, I was led to believe that identifying the bad
sector and running dd to read it would cause the HDD to reallocate that bad block.

  http://smartmontools.sourceforge.net/badblockhowto.html

However, since ad2 is one half of a gmirror, I don't think this is the best
approach.

Comments?




More information:

smartd, gpart, dh, diskinfo, and fdisk output at
http://beta.freebsddiary.org/smart-fixing-bad-sector.php

also:

# gmirror list
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 3362720654
Providers:
1. Name: mirror/gm0
   Mediasize: 40027028992 (37G)
   Sectorsize: 512
   Mode: r6w5e14
Consumers:
1. Name: ad0
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 0
   Flags: DIRTY, SYNCHRONIZING
   GenID: 0
   SyncID: 1
   Synchronized: 100%
   ID: 949692477
2. Name: ad2
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY, BROKEN
   GenID: 0
   SyncID: 1
   ID: 3585934016



-- 
Dan Langille - http://langille.org

Chuck Swiger

2011-Aug-19 21:52 UTC

On Aug 19, 2011, at 1:50 PM, Dan Langille wrote:
> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.
>
>  http://smartmontools.sourceforge.net/badblockhowto.html
>
> However, since ad2 is one half of a gmirror, I don't think this is the
> best approach.
>
> Comments?

Reading the underlying failing drive with dd will help identify any other
questionable sectors.  However, your drive temps are too high-- many vendors
call out either 50C or 55C as the point where drive reliability becomes
significantly degraded.

Regards,
-- 
-Chuck

Jeremy Chadwick

2011-Aug-19 23:21 UTC

On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
>
> After a recent power failure, I'm seeing this in my logs:
>
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

I doubt this is related to a power failure.

> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.
>
>   http://smartmontools.sourceforge.net/badblockhowto.html

This is incorrect (meaning you've misunderstood what's written there).

Unreadable LBAs can be a result of the LBA being actually bad (as in
uncorrectable), or the LBA being marked "suspect".  In either case the
LBA will return an I/O error when read.

If the LBAs are marked "suspect", the drive will perform re-analysis of
the LBA (to determine if the LBA can be read and the data re-mapped, or
if it cannot then the LBA is marked uncorrectable) when you **write** to
the LBA.

The above smartd output doesn't tell me much.  Providing actual SMART
attribute data (smartctl -a) for the drive would help.  The brand of the
drive, the firmware version, and the model all matter -- every drive
behaves a little differently.
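
As a rough illustration of the attribute data being asked for here, the
sketch below filters "smartctl -A" output down to the three sector-health
attributes: 5 (reallocated), 197 (current pending) and 198 (offline
uncorrectable).  It assumes the usual ten-column attribute table that
smartctl prints for ATA drives; the function name smart_sector_counts is
made up for this example and is not part of smartmontools.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# NAME=RAW_VALUE for attributes 5, 197 and 198.
# Assumes the standard column layout:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Typical use (needs root and smartmontools installed):
#   smartctl -A /dev/ad2 | smart_sector_counts
```

A non-zero raw value for attribute 197 corresponds to the "Currently
unreadable (pending) sectors" message that smartd logs.  The full
"smartctl -a" output is still worth posting, since it also carries the
model, firmware version and error log.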

Furthermore, if the LBA is re-analysed and determined to be
uncorrectable -- regardless of remapping -- this doesn't actually fix
I/O errors on a filesystem level.  The filesystem itself (and more often
than not in the data section of the file/inode, so things like fsck
can't work around this) can still contain references to the LBA which is
uncorrectable, and will still continue to return I/O errors when read.
There has to be a way to tell the filesystem, when formatted, "avoid use
of this LBA".  How UFS/FFS handles this is unknown to me.  I know of
badsect(8) but I don't know if this works.  "Transparent" remapping I
have never seen work except on SSDs.

If you want me to step you through the procedure of re-testing the LBAs
(assuming they're suspect and not uncorrectable) I can do so, just ask.
Finding the suspect LBAs can be done using a dd loop (I wrote a shell
script for this), or using "smartctl -t select,0-max /dev/XXX" and let
the drive's internal selective test see if it can find them.  From there
it's an issue of submitting a write request to the LBA and seeing what
happens (I do this via dd as well, but the parameters you pass it are
very specific, e.g. don't mix up/misunderstand seek vs. skip).
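
The seek vs. skip distinction mentioned above can be sketched as follows.
This is a dry-run illustration only, not the script referred to in this
message: the two functions print the dd command lines instead of running
them.  The device name and the LBA 12345 are placeholders, and bs=512 is
taken from the Sectorsize shown in the gmirror list output.

```shell
#!/bin/sh
# Dry-run helpers: they only PRINT dd commands, nothing touches the disk.
# Function names, device and LBA are placeholders for illustration.

# Reading a sector: the LBA is an offset into the INPUT file, so skip= applies.
read_cmd() {
    echo "dd if=$1 of=/dev/null bs=512 count=1 skip=$2"
}

# Writing a sector: the LBA is an offset into the OUTPUT file, so seek= applies.
# Running the printed command overwrites that sector with zeros.
write_cmd() {
    echo "dd if=/dev/zero of=$1 bs=512 count=1 seek=$2"
}

read_cmd /dev/ad2 12345
write_cmd /dev/ad2 12345
```

Mixing the two up is the dangerous case: skip= on a write command would
skip within /dev/zero and write to sector 0 of the disk instead of the
intended LBA.  On a disk that is still an active gmirror component, GEOM
may also refuse direct writes to the raw device.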

I've assisted with this time and time again for folks on forums with
varying success.

I've also found some drive models which claim there are suspect LBAs, yet
an internal surface scan passes with no issues.  I have these drives
myself, and the only difference between my drives and the affected
individuals' drives is the firmware, which leads me to believe there is a
firmware bug on some drives in the field.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

Diane Bruce

2011-Aug-20 00:15 UTC

On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
>
> After a recent power failure, I'm seeing this in my logs:
>
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

Personally, I'd replace that drive now.

> Searching on that error message, I was led to believe that identifying the
> bad sector and running dd to read it would cause the HDD to reallocate that
> bad block.

No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
block. This could buy you a few more days or a few more weeks. Personally,
I would not wait. Your call.
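
For the replace-the-drive route, one possible gmirror sequence is sketched
below.  It is a dry run that only prints the commands; the mirror name gm0
and the component ad2 come from the status output earlier in the thread,
while the function name and the assumption that the new disk shows up
under the same device name are illustrative only.

```shell
#!/bin/sh
# Dry-run sketch: prints a possible gmirror component swap, runs nothing.
# replace_plan is a hypothetical helper name.
replace_plan() {
    mirror=$1
    disk=$2
    # Drop the failing component from the mirror before pulling the disk.
    echo "gmirror remove $mirror $disk"
    # After the new disk is installed, add it; gmirror then synchronizes it.
    echo "gmirror insert $mirror $disk"
    # Watch the synchronization progress.
    echo "gmirror status $mirror"
}

replace_plan gm0 ad2
```

If the old disk has already been physically removed, "gmirror forget gm0"
is the usual way to drop the missing component before the insert step.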

> Comments?
> ...
> Dan Langille - http://langille.org

- Diane
-- 
- db@FreeBSD.org db@db.net http://www.db.net/~db
  Why leave money to our children if we don't leave them the Earth?

Dan Langille

2011-Aug-20 18:40 UTC

On Aug 20, 2011, at 2:36 PM, Jeremy Chadwick wrote:
> Dan, I will respond to your reply sometime tomorrow.  I do not have time
> to review the Email today (~7.7KBytes), but will have time tomorrow.

No worries.  Thank you.

-- 
Dan Langille - http://langille.org
