System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011

After a recent power failure, I'm seeing this in my logs:

Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

And gmirror reports:

# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad0 (100%)
                      ad2

I think the solution is: gmirror rebuild

Comments?

Searching on that error message, I was led to believe that identifying the bad sector and
running dd to read it would cause the HDD to reallocate that bad block.

http://smartmontools.sourceforge.net/badblockhowto.html

However, since ad2 is one half of a gmirror, I don't think this is the best approach.

Comments?

More information: smartd, gpart, dh, diskinfo, and fdisk output at
http://beta.freebsddiary.org/smart-fixing-bad-sector.php

also:

# gmirror list
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 3362720654
Providers:
1. Name: mirror/gm0
   Mediasize: 40027028992 (37G)
   Sectorsize: 512
   Mode: r6w5e14
Consumers:
1. Name: ad0
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 0
   Flags: DIRTY, SYNCHRONIZING
   GenID: 0
   SyncID: 1
   Synchronized: 100%
   ID: 949692477
2. Name: ad2
   Mediasize: 40027029504 (37G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY, BROKEN
   GenID: 0
   SyncID: 1
   ID: 3585934016

--
Dan Langille - http://langille.org

On Aug 19, 2011, at 1:50 PM, Dan Langille wrote:
> Searching on that error message, I was led to believe that identifying the bad sector and
> running dd to read it would cause the HDD to reallocate that bad block.
>
> http://smartmontools.sourceforge.net/badblockhowto.html
>
> However, since ad2 is one half of a gmirror, I don't think this is the best approach.
>
> Comments?

Reading the underlying failing drive with dd will help identify any
other questionable sectors.

However, your drive temps are too high -- many vendors call out either
50C or 55C as the point where drive reliability becomes significantly
degraded.

Regards,
--
-Chuck

On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>
> After a recent power failure, I'm seeing this in my logs:
>
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

I doubt this is related to a power failure.

> Searching on that error message, I was led to believe that identifying the bad sector and
> running dd to read it would cause the HDD to reallocate that bad block.
>
> http://smartmontools.sourceforge.net/badblockhowto.html

This is incorrect (meaning you've misunderstood what's written there).

Unreadable LBAs can be a result of the LBA being actually bad (as in
uncorrectable), or the LBA being marked "suspect". In either case the
LBA will return an I/O error when read.

If the LBAs are marked "suspect", the drive will perform re-analysis of
the LBA (to determine if the LBA can be read and the data re-mapped, or
if it cannot then the LBA is marked uncorrectable) when you **write** to
the LBA.

The above smartd output doesn't tell me much. Providing actual SMART
attribute data (smartctl -a) for the drive would help. The brand of the
drive, the firmware version, and the model all matter -- every drive
behaves a little differently.

Furthermore, if the LBA is re-analysed and determined to be
uncorrectable -- regardless of remapping -- this doesn't actually fix
I/O errors on a filesystem level. The filesystem itself (and more often
than not in the data section of the file/inode, so things like fsck
can't work around this) can still contain references to the LBA which is
uncorrectable, and will still continue to return I/O errors when read.
There has to be a way to tell the filesystem, when formatted, "avoid use
of this LBA". How UFS/FFS handles this is unknown to me. I know of
badsect(8) but I don't know if this works.

"Transparent" remapping I have never seen work except on SSDs.

If you want me to step you through the procedure of re-testing the LBAs
(assuming they're suspect and not uncorrectable) I can do so, just ask.
Finding the suspect LBAs can be done using a dd loop (I wrote a shell
script for this), or using "smartctl -t select,0-max /dev/XXX" and
letting the drive's internal selective test see if it can find them.

From there it's an issue of submitting a write request to the LBA and
seeing what happens (I do this via dd as well, but the parameters you
pass it are very specific, e.g. don't mix up/misunderstand seek vs.
skip).

I've assisted with this time and time again for folks on forums with
varying success. I've also found some models of drives which claim
there are suspect LBAs yet an internal surface scan passes with no
issues (and these are drives which I myself have; the only difference
between my drives and the individuals' drives is firmware, which leads
me to believe there is a bug on some drives in the field).

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
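
The seek vs. skip distinction mentioned above can be illustrated with a short sketch. This is not Jeremy's script, which was not posted in the thread; the function names are hypothetical, and the 512-byte sector size is taken from the "Sectorsize: 512" line in the gmirror list output. In dd, skip= offsets the input file and seek= offsets the output file, so reading an LBA uses skip and writing an LBA uses seek. Note that write_lba overwrites the sector with zeros and destroys whatever was stored there.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not from the thread.

SECTOR=512   # assumed sector size, per "Sectorsize: 512" above

# Read one sector and discard it. skip= offsets the INPUT (if=).
# dd exits non-zero if the read fails with an I/O error.
read_lba() {    # usage: read_lba DEVICE LBA
    dd if="$1" of=/dev/null bs="$SECTOR" skip="$2" count=1 2>/dev/null
}

# Overwrite one sector with zeros. seek= offsets the OUTPUT (of=).
# DESTRUCTIVE: the previous contents of that sector are lost.
# conv=notrunc keeps dd from truncating a regular file after the write.
write_lba() {   # usage: write_lba DEVICE LBA
    dd if=/dev/zero of="$1" bs="$SECTOR" seek="$2" count=1 conv=notrunc 2>/dev/null
}

# Walk an inclusive LBA range and print every LBA that fails to read.
scan_lbas() {   # usage: scan_lbas DEVICE FIRST LAST
    lba=$2
    while [ "$lba" -le "$3" ]; do
        read_lba "$1" "$lba" || echo "$lba"
        lba=$((lba + 1))
    done
}
```

As a usage example, "scan_lbas /dev/XXX 1000 2000" would print any LBA in that (made-up) range that returns a read error, and "write_lba /dev/XXX N" would then submit the write request to one of them. Reading sector by sector is slow on a whole disk, so the drive's own selective self-test may be the more practical way to find the LBAs first.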

On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar 3 04:52:04 GMT 2011
>
> After a recent power failure, I'm seeing this in my logs:
>
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors
>

Personally, I'd replace that drive now.

> Searching on that error message, I was led to believe that identifying the bad sector and
> running dd to read it would cause the HDD to reallocate that bad block.

No, as otherwise mentioned (Hi Jeremy!) you need to read and write the
block. This could buy you a few more days or a few more weeks.
Personally, I would not wait. Your call.

> Comments?
...
> Dan Langille - http://langille.org

- Diane
--
- db@FreeBSD.org db@db.net http://www.db.net/~db
  Why leave money to our children if we don't leave them the Earth?

On Aug 20, 2011, at 2:36 PM, Jeremy Chadwick wrote:
> Dan, I will respond to your reply sometime tomorrow. I do not have time
> to review the Email today (~7.7KBytes), but will have time tomorrow.

No worries. Thank you.

--
Dan Langille - http://langille.org