Hi list,

I have a notebook running C7 (1511). It has two 640 GB disks, configured with MD at RAID level 1. Some days ago I noticed some critical slowdowns while opening applications.

First of all I disabled ACPI on the disks.

Then I checked both disks for bad blocks, running badblocks four consecutive times on sda and on sdb, and noticed some strange behaviour.

sdb shows no problems, but on sda:

1) the first run of badblocks reports 28 bad blocks;
2) the second run reports 32;
3) the third reports 102;
4) the last run reports 92.

Running smartctl after the last badblocks check, I noticed that Current_Pending_Sector was 32 (not 92, as badblocks found).

To force sector reallocation I filled the disk to 100%, ran badblocks again, and it found 0 bad blocks. Running smartctl again, Current_Pending_Sector is now 0, but Reallocated_Event_Count is still 0.

Why does each consecutive run of badblocks report different results?
Why does smartctl not update Reallocated_Event_Count?
The bad block count on sda increases and decreases for no clear reason. Could this behaviour be related to the RAID (if one disk has bad blocks, can they be replicated onto the second disk)?

What other tests can I perform to verify the disks?

Thanks in advance.
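For reference, this is roughly what I mean above by "checked for bad blocks" and "running smartctl" (shown for sda, the same for sdb; the options are illustrative):

    badblocks -sv /dev/sda     # read-only surface scan; -s shows progress, -v is verbose
    smartctl -A /dev/sda       # SMART attribute table: Current_Pending_Sector,
                               # Reallocated_Sector_Ct, Reallocated_Event_Count, ...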
Have you run a "long" SMART test on the drive?

    smartctl -t long <device>

I'm not sure what's going on with your drive, but if it were mine, I'd want to replace it. If there are issues, that long SMART check ought to turn up something, and in my experience that's enough for a manufacturer to do a warranty replacement.

On Jan 17, 2016 11:00, "Alessandro Baggi" <alessandro.baggi at gmail.com> wrote:

> sdb shows no problems, but on sda:
>
> 1) the first run of badblocks reports 28 bad blocks;
> 2) the second run reports 32;
> 3) the third reports 102;
> 4) the last run reports 92.
>
> What other tests can I perform to verify the disks?
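For example (sda here is just the suspect device from your mail; a long test on a 640 GB laptop drive typically takes an hour or two):

    smartctl -t long /dev/sda      # start the extended self-test in the background
    smartctl -l selftest /dev/sda  # check the result once the estimated time has passed
    smartctl -H /dev/sda           # overall health verdict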
On Sun, Jan 17, 2016 at 10:05 AM, Matt Garman <matthew.garman at gmail.com> wrote:

> I'm not sure what's going on with your drive, but if it were mine, I'd
> want to replace it. If there are issues, that long SMART check ought to
> turn up something, and in my experience that's enough for a manufacturer
> to do a warranty replacement.

I agree with Matt. Go ahead and run a few of the S.M.A.R.T. tests. Based on your description of the problem, I can almost guarantee they will fail.

badblocks(8) is a very antiquated tool. Almost every hard drive has a few bad sectors from the factory. Very old drives used to have the list of bad sectors printed on the label; when you first created a filesystem you had to enter all of them so that the filesystem wouldn't store data there. Years later, as more bad sectors formed, you could discover them with a tool like badblocks(8) and add them to the filesystem's list.

Today, drives do all of this work automatically. The manufacturer scans the entire surface and writes the bad sectors into a section of the drive's electronics known as the P-list. The controller on the drive automatically remaps these sectors to a small area of spare sectors set aside for this very purpose. If more bad sectors form later, the drive enters each one into a second list, known as the G-list, and remaps it to the same reserved area I mentioned earlier.

So under normal conditions the end user should NEVER see bad sectors. If badblocks(8) is reporting bad sectors, it is very likely that enough of them have formed that the reserved area has run out of replacement sectors. While in theory you could run badblocks(8) and pass the list to the filesystem, I can assure you that the growth of bad sectors has reached a point where it will continue.

I'd stop using that hard drive, pull any important data off it, and then run the S.M.A.R.T. tests so that, if the drive is under warranty, you can have it replaced.

Brandon Vincent
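A quick way to see where the drive stands (attribute names vary a little between vendors; this is only a sketch):

    # spare-pool usage and pending sectors at a glance
    smartctl -A /dev/sda | grep -Ei 'Reallocated|Pending|Uncorrectable'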
On Jan 17, 2016, at 9:59 AM, Alessandro Baggi <alessandro.baggi at gmail.com> wrote:

> sdb shows no problems, but on sda:
>
> 1) the first run of badblocks reports 28 bad blocks;
> 2) the second run reports 32;
> 3) the third reports 102;
> 4) the last run reports 92.

It's dying. Replace it now.

On a modern hard disk you should *never* see bad sectors, because the drive is busy hiding all the bad sectors it does find and telling you everything is fine. Once the drive has swept so many problems under the rug that it is forced to admit to normal user-space programs (e.g. badblocks) that there are bad sectors, it's because the spare sector pool is full. At that point, the only safe remediation is to replace the disk.

> Running smartctl after the last badblocks check, I noticed that
> Current_Pending_Sector was 32 (not 92, as badblocks found).

SMART is allowed to lie to you. That's why there's a RAW_VALUE column yet no explanation in the manual of what that value means: the low-level meanings of these values are defined by the drive manufacturers. "92" is not necessarily a sector count. For all you know, it is reporting that there are currently 92 lemmings in midair off the fjords of Finland.

The only important results here are:

a) the numbers are nonzero
b) the numbers are changing

That is all. A zero value just means it hasn't failed *yet*, and a static nonzero value means the drive has temporarily arrested its failures-in-progress. There is no such thing as a hard drive with zero actual bad sectors, just one that has space left in its spare sector pool. A "working" drive is one that is swapping sectors from the spare pool rarely enough that it is not expected to empty the pool before the warranty expires.

> Why does each consecutive run of badblocks report different results?

Because physics. The highly competitive nature of the HDD business, plus the relentless drive of Moore's Business Law (as it should be called, since it is not a physical law, just an arbitrary fiction the tech industry has bought into as the ground rules for the game), pushes the manufacturers to design right up against the ragged edge of functionality. HDD manufacturers could solve all of this by making drives with 1/4 the capacity at twice the cost and get 10x the reliability. And they do: they're called SAS drives. :)

> Why does smartctl not update Reallocated_Event_Count?

Because SMART lies.

> What other tests can I perform to verify the disks?

Quit poking the tiger to see if it will bite you. Replace the bad disk and resilver that mirror before you lose the other disk, too.
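One way to see whether the numbers are changing is simply to snapshot the attribute table on a schedule and compare the runs; a minimal sketch (the path, schedule and device list are arbitrary):

    #!/bin/sh
    # e.g. /etc/cron.daily/smart-snapshot (illustrative; make it executable)
    for d in /dev/sd[ab]; do
        { date; smartctl -A "$d"; } >> "/var/log/smart-$(basename "$d").log"
    done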
On 1/19/2016 2:24 PM, Warren Young wrote:

> It's dying. Replace it now.

Agreed.

> On a modern hard disk you should *never* see bad sectors, because the
> drive is busy hiding all the bad sectors it does find and telling you
> everything is fine.

That's not actually true. The drive will report a 'bad sector' error if you try to read data that it simply can't read; you wouldn't want it to return bad data and say it's OK. Many (most?) drives won't actually remap a bad sector until you write new data over that block number, since they don't want to copy bad data without any way of telling the OS that the data is invalid. These pending remaps are listed under SMART attribute 197, Current_Pending_Sector.

-- 
john r pierce, recycling bits in santa cruz
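If your hdparm build has the raw sector options, you can see this on an individual LBA: reading a pending sector fails, and only a write lets the drive remap it. A sketch (the LBA is a made-up example, and the write destroys whatever that sector held):

    hdparm --read-sector 123456789 /dev/sda
    # WARNING: overwrites that one sector with zeros
    hdparm --yes-i-know-what-i-am-doing --write-sector 123456789 /dev/sda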
On Tue, Jan 19, 2016 at 3:24 PM, Warren Young <wyml at etr-usa.com> wrote:

> On a modern hard disk you should *never* see bad sectors, because the
> drive is busy hiding all the bad sectors it does find and telling you
> everything is fine.

This is not a given. Misconfiguration can make persistent bad sectors very common, and this misconfiguration is the default situation in RAID setups on Linux, which is why it's so common. This, and user error, are the top causes of RAID 5 implosion on Linux (both mdadm and lvm raid).

The necessary sequence:

1. The drive needs to know the sector is bad.
2. The drive needs to be asked to read that sector.
3. The drive needs to give up trying to read that sector.
4. The drive needs to report the sector's LBA back to the OS.
5. The OS needs to write something back to that same LBA.
6. The drive writes to the sector and, if the write fails, remaps the LBA to a different (reserve) physical sector.

Where this fails on Linux is steps 3 and 4. By default, consumer drives either don't support SCT ERC, as is the case in this thread, or it's disabled. That means the timeout for deep recovery of a bad sector can be very high, 2 or 3 minutes. Usually it's less than that, but it's often more than the kernel's default SCSI command timer. When a command to the drive doesn't complete successfully within the default 30 seconds, the kernel resets the link to the drive, which obliterates the entire command queue and the work the drive was doing to recover the bad sector. Therefore step 4 never happens, nor any step after it. Hence, bad sectors accumulate.

The consequence of this often doesn't get figured out until a user looks at kernel messages, sees a bunch of hard link resets, has a WTF moment, and asks questions. More often they don't see those reset messages, or don't ask about them, and the next consequence is a drive failure. When the failed drive is one other than the drive with bad sectors, you effectively have two bad strips per stripe during reads (including rebuild), and that's when you get total array collapse even though only one drive was actually bad. People use RAID 6 to mask this problem, but it's still a misconfiguration, and it can take down RAID 6 arrays too.

>> Why does smartctl not update Reallocated_Event_Count?
>
> Because SMART lies.

Nope. The drive isn't being asked to write to those bad sectors. If it can't read a sector without error, it won't migrate the data on its own (some drives never do this), so a write to the sector is needed to trigger the remap.

The other thing is that the bad sector count on 512e AF drives is inflated: the count is in 512-byte sector increments, but there is no such thing on an AF drive, so one bad physical sector is reported as 8 bad sectors. To fix it you have to write all 8 of those logical sectors at once, in a single command to the drive. I've had 'dd if=/dev/zero of=/dev/sda seek=blah count=8' fail with a read error because the command was internally reinterpreted as a read-modify-write. Ridiculous but true. So you have to use bs=4096 and count=1, and of course adjust the seek LBA to be based on 4096 bytes instead of 512.

So the simplest fix here is:

    echo 160 > /sys/block/sdX/device/timeout

That's needed for each member drive. Note this is not a persistent setting. And then this:

    echo repair > /sys/block/mdX/md/sync_action

That's run once. You'll see the read errors in dmesg, and md writing back to the drive with the bad sector.
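Putting the pieces together for this particular two-disk mirror, the whole procedure looks roughly like this (sda/sdb are from the original report, md0 stands in for whatever the array is actually called, and 160 seconds is just a comfortably large value):

    # does the drive support SCT ERC at all?
    smartctl -l scterc /dev/sda

    # if not (or it's disabled), give the kernel a command timer longer than
    # the drive's worst-case recovery time (not persistent across reboots)
    for d in sda sdb; do
        echo 160 > /sys/block/$d/device/timeout
    done

    # have md read every sector and rewrite anything unreadable from the
    # good mirror, then watch progress and the corrective writes
    echo repair > /sys/block/md0/md/sync_action
    cat /proc/mdstat
    dmesg | tail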
This problem affects all software RAID, including btrfs raid1. The ideal scenario is to run 'smartctl -l scterc,70,70 /dev/sdX' in a startup script, so that the drive fails reads on marginally bad sectors with an error in at most 7 seconds. The linux-raid@ list is chock full of this as a recurring theme.

-- 
Chris Murphy
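A minimal startup hook along those lines (the rc.local approach, the drive glob, and the fallback timeout are just one way to do it; whether smartctl returns a failing exit status when SCT ERC is unsupported is an assumption worth checking on your version):

    #!/bin/sh
    # e.g. appended to /etc/rc.d/rc.local (must be executable on C7); sketch only
    for d in /dev/sd[a-z]; do
        # ask the drive to give up on an unreadable sector after 7 seconds;
        # if that can't be set, fall back to a long kernel command timer so
        # the link isn't reset in the middle of the drive's own recovery
        if ! smartctl -l scterc,70,70 "$d" >/dev/null 2>&1; then
            echo 180 > "/sys/block/$(basename "$d")/device/timeout"
        fi
    done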
On 01/19/2016 06:46 PM, Chris Murphy wrote:

> Hence, bad sectors accumulate. And the consequence of this often
> doesn't get figured out until a user looks at kernel messages and sees
> a bunch of hard link resets....

The standard Unix way of refreshing the disk contents is badblocks' non-destructive read-write test (badblocks -n, or the -cc option to e2fsck for ext2/3/4 filesystems). The remap happens on the writeback of the contents. It's been this way with enterprise SCSI drives for as long as I can remember there being enterprise-class SCSI drives, and ATA drives caught up with the SCSI ones back in the early 90's. But it's always been true, to the best of my recollection, that the remap happens on a write. The rationale is pretty simple: only on a write does the drive know that it has the valid data in its buffer, and so that's the only safe time to put the data elsewhere.

> This problem affects all software RAID, including btrfs raid1. The
> ideal scenario is to run 'smartctl -l scterc,70,70 /dev/sdX' in a
> startup script, so that the drive fails reads on marginally bad
> sectors with an error in at most 7 seconds.

This is partly why enterprise arrays manage their own per-sector ECC and use 528-byte sector sizes. The drives for these arrays make very poor standalone workstation drives, since the drive is no longer doing all the error recovery itself but relying on the storage processor to do the work. The drive still does some basic ECC on the sector, but the storage processor gets a much better idea of the health of each sector than when the drive's firmware manages the remap. Sophisticated enterprise arrays, like NetApp's, EMC's, and Nimble's, can do some very accurate prediction and proactive hot-sparing when needed. That's part of what you pay for when you buy that sort of array.

The other fact of life with modern consumer-level hard drives is that *errored sectors are expected*, not exceptional. Why else would a drive have a TLER in the two-minute range, like many of the WD Green drives do? With a consumer-level drive, I would be shocked if badblocks reported the same number each time it ran.
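For a single drive or an md member taken out of the array, that test looks roughly like this (device names are examples, and both commands want the filesystem unmounted):

    badblocks -nsv /dev/sdc        # non-destructive read-write pass over the whole device
    e2fsck -fcc /dev/sdc1          # same idea via e2fsck; also records bad blocks in the ext fs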
On 01/19/2016 06:29 PM, J Martin Rushton wrote:

> (Off topic) I also remember seeing engineers determine which memory
> chip was at fault and replacing the chip using a soldering iron. Try
> that on a DIMM!

As long as the DIMM isn't populated with BGA packages, it's about a ten-minute job with a hot-air rework station, which will only cost you around $100 or so if you shop around (and if you have a relatively steady hand and either good eyes or a good magnifier). It's doable in a DIY way even with BGA, but it takes longer and you need a reballing mask for that specific package to make it work right. Any accurately controlled oven is good enough to do the reflow (and baking Xbox boards is essentially doing a reflow...).

Yeah, I prefer tubes, discretes, and through-hole PCBs myself, but at this point I've acquired a hot-air station and am getting up to speed on surface mount, and am finding that it's not really that hard, just different. This is not that different from getting up to speed with something really new and different, like systemd: it just requires being willing to take a different approach to the problem. BGA desoldering/resoldering requires a whole different way of looking at the soldering operation, that's all.
On Wed, Jan 20, 2016, 7:17 AM Lamar Owen <lowen at pari.edu> wrote:

> The standard Unix way of refreshing the disk contents is badblocks'
> non-destructive read-write test (badblocks -n, or the -cc option to
> e2fsck for ext2/3/4 filesystems).

This isn't applicable to RAID, which is what this thread is about. For RAID, use a scrub; that's what it's for.

The badblocks method fixes nothing if the sector is persistently bad and the drive reports a read error. It fixes nothing if the command timeout is reached before the drive either recovers or reports a read error. And even when it works, you're relying on ECC-recovered data rather than reading a likely good copy from the mirror or from parity and writing that back to the bad block. But all of this still requires the proper configuration.

> The remap happens on the writeback of the contents. It's been this way
> with enterprise SCSI drives for as long as I can remember there being
> enterprise-class SCSI drives, and ATA drives caught up with the SCSI
> ones back in the early 90's. But it's always been true, to the best of
> my recollection, that the remap happens on a write.

Properly configured, first a read error happens, which includes the LBA of the bad sector. The md driver needs that LBA to find a good copy of the data on the mirror or from parity. *Then* it writes to the bad LBA. In the misconfigured case, the command timeout expiration and link reset prevent the kernel from ever learning the LBA of the bad sector, so the repair isn't possible.

> The rationale is pretty simple: only on a write does the drive know
> that it has the valid data in its buffer, and so that's the only safe
> time to put the data elsewhere.
>
> This is partly why enterprise arrays manage their own per-sector ECC
> and use 528-byte sector sizes.

Not all enterprise drives have 520/528-byte sectors. Those that do are using T10-PI (formerly DIF), and it requires software support too. It's pretty rare. It's 8000% easier to use ZFS on Linux or Btrfs.

> The other fact of life with modern consumer-level hard drives is that
> *errored sectors are expected*, not exceptional. Why else would a
> drive have a TLER in the two-minute range, like many of the WD Green
> drives do? With a consumer-level drive, I would be shocked if
> badblocks reported the same number each time it ran.

All drives expect bad sectors. A consumer drive that reports a read error risks putting the host OS into an inconsistent state, so the drive tries hard to avoid it: becoming slow is better than implosion. And neither OS X nor Windows resets the link after a mere 30 seconds, either.

Chris Murphy
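For comparison, the checksummed filesystems make that "read the good copy, rewrite the bad one" step explicit; with a btrfs raid1 it's just (the mount point is an example):

    btrfs scrub start -B /mnt      # -B runs in the foreground and prints statistics
    btrfs scrub status /mnt        # corrected vs. uncorrectable error counts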
On 01/20/2016 01:43 PM, Chris Murphy wrote:

> On Wed, Jan 20, 2016, 7:17 AM Lamar Owen <lowen at pari.edu> wrote:
>
>> The standard Unix way of refreshing the disk contents is badblocks'
>> non-destructive read-write test (badblocks -n, or the -cc option to
>> e2fsck for ext2/3/4 filesystems).
>
> This isn't applicable to RAID, which is what this thread is about. For
> RAID, use a scrub; that's what it's for.

The badblocks read/write verification would need to be done on the RAID member devices, not the aggregate md device, for a member-device-level remap. It might need to be done with the md offline; I'm not sure. Scrub? There is a scrub command (and package) in CentOS, but it's meant for secure data erasure and is not a non-destructive thing. Ah, you're talking about what md will do if 'check' or 'repair' is written to the appropriate location in sysfs for the md in question. (This is covered in the md(4) man page.)

> The badblocks method fixes nothing if the sector is persistently bad
> and the drive reports a read error.

The badblocks method will do a one-off read/write verification on a member device; no, it won't do it automatically, true enough.

> It fixes nothing if the command timeout is reached before the drive
> either recovers or reports a read error.

Very true.

> And even when it works, you're relying on ECC-recovered data rather
> than reading a likely good copy from the mirror or from parity and
> writing that back to the bad block.

Yes, for the member drive this is true. Since my storage here is primarily EMC Clariion, I'm not sure what the mdraid equivalent of EMC's background verify would be, since I've never needed that functionality from mdraid. (I really don't like the term 'software RAID', since at some level all RAID is software RAID, whether it runs on a storage processor or in a RAID controller's firmware.) It does appear that triggering a scrub from sysfs for a particular md is similar functionality and would do the remap if inconsistent data is found. This is a bit different from the old Unix way, but these are newer times and the way of doing things has changed.

> But all of this still requires the proper configuration.

Yes, this is very true.
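The md-level "background verify", then, looks roughly like this (md0 stands in for the real array name; if memory serves, the mdadm package on C7 also ships a weekly raid-check cron job that runs the 'check' variant for you):

    echo check > /sys/block/md0/md/sync_action   # read-only scrub; unreadable sectors get
                                                 # rewritten from the good copy
    cat /proc/mdstat                             # progress
    cat /sys/block/md0/md/mismatch_cnt           # inconsistencies found by the last check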
On Thu, Jan 21, 2016 at 9:27 AM, Lamar Owen <lowen at pari.edu> wrote:

> Ah, you're talking about what md will do if 'check' or 'repair' is
> written to the appropriate location in sysfs for the md in question.
> (This is covered in the md(4) man page.)

Correct.

-- 
Chris Murphy