Hi list,

I have a notebook running C7 (1511). It has two 640 GB disks, configured with MD at RAID level 1. Some days ago I noticed some critical slowdowns while opening applications.

First of all I disabled ACPI on the disks.

Then I checked both disks for bad blocks, running badblocks four consecutive times on sda and on sdb, and noticed some strange behaviour.

sdb shows no problems, but on sda:

1) the first run of badblocks reports 28 bad blocks;
2) the second run reports 32;
3) the third reports 102;
4) the last run reports 92.

Running smartctl after the last badblocks check, I noticed that Current_Pending_Sector was 32 (not 92, as badblocks found).

To force sector reallocation I filled the disk to 100%, ran badblocks again, and it found 0 bad blocks. Running smartctl again, Current_Pending_Sector is now 0, but Reallocated_Event_Count is still 0.

Why does each consecutive run of badblocks report different results?
Why does smartctl not update Reallocated_Event_Count?
The bad block count on sda increases and decreases for no clear reason. Could this behaviour be related to the RAID (if one disk has bad blocks, can they be replicated onto the second disk)?

What other tests can I perform to verify the disks?

Thanks in advance.
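For reference, this is roughly what I mean above by "checked for bad blocks" and "running smartctl" (shown for sda, the same for sdb; the options are illustrative):

    badblocks -sv /dev/sda     # read-only surface scan; -s shows progress, -v is verbose
    smartctl -A /dev/sda       # SMART attribute table: Current_Pending_Sector,
                               # Reallocated_Sector_Ct, Reallocated_Event_Count, ...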
Have you run a "long" SMART test on the drive?

    smartctl -t long <device>

I'm not sure what's going on with your drive, but if it were mine, I'd want to replace it. If there are issues, that long SMART check ought to turn up something, and in my experience that's enough for a manufacturer to do a warranty replacement.

On Jan 17, 2016 11:00, "Alessandro Baggi" <alessandro.baggi at gmail.com> wrote:

> sdb shows no problems, but on sda:
>
> 1) the first run of badblocks reports 28 bad blocks;
> 2) the second run reports 32;
> 3) the third reports 102;
> 4) the last run reports 92.
>
> What other tests can I perform to verify the disks?
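For example (sda here is just the suspect device from your mail; a long test on a 640 GB laptop drive typically takes an hour or two):

    smartctl -t long /dev/sda      # start the extended self-test in the background
    smartctl -l selftest /dev/sda  # check the result once the estimated time has passed
    smartctl -H /dev/sda           # overall health verdict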
On Sun, Jan 17, 2016 at 10:05 AM, Matt Garman <matthew.garman at gmail.com> wrote:

> I'm not sure what's going on with your drive, but if it were mine, I'd
> want to replace it. If there are issues, that long SMART check ought to
> turn up something, and in my experience that's enough for a manufacturer
> to do a warranty replacement.

I agree with Matt. Go ahead and run a few of the S.M.A.R.T. tests. Based on your description of the problem, I can almost guarantee they will fail.

badblocks(8) is a very antiquated tool. Almost every hard drive has a few bad sectors from the factory. Very old drives used to have the list of bad sectors printed on the label; when you first created a filesystem you had to enter all of them so that the filesystem wouldn't store data there. Years later, as more bad sectors formed, you could discover them with a tool like badblocks(8) and add them to the filesystem's list.

Today, drives do all of this work automatically. The manufacturer scans the entire surface and writes the bad sectors into a section of the drive's electronics known as the P-list. The controller on the drive automatically remaps these sectors to a small area of spare sectors set aside for this very purpose. If more bad sectors form later, the drive enters each one into a second list, known as the G-list, and remaps it to the same reserved area I mentioned earlier.

So under normal conditions the end user should NEVER see bad sectors. If badblocks(8) is reporting bad sectors, it is very likely that enough of them have formed that the reserved area has run out of replacement sectors. While in theory you could run badblocks(8) and pass the list to the filesystem, I can assure you that the growth of bad sectors has reached a point where it will continue.

I'd stop using that hard drive, pull any important data off it, and then run the S.M.A.R.T. tests so that, if the drive is under warranty, you can have it replaced.

Brandon Vincent
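A quick way to see where the drive stands (attribute names vary a little between vendors; this is only a sketch):

    # spare-pool usage and pending sectors at a glance
    smartctl -A /dev/sda | grep -Ei 'Reallocated|Pending|Uncorrectable'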
On Jan 17, 2016, at 9:59 AM, Alessandro Baggi <alessandro.baggi at gmail.com> wrote:

> sdb shows no problems, but on sda:
>
> 1) the first run of badblocks reports 28 bad blocks;
> 2) the second run reports 32;
> 3) the third reports 102;
> 4) the last run reports 92.

It's dying. Replace it now.

On a modern hard disk you should *never* see bad sectors, because the drive is busy hiding all the bad sectors it does find and telling you everything is fine. Once the drive has swept so many problems under the rug that it is forced to admit to normal user-space programs (e.g. badblocks) that there are bad sectors, it's because the spare sector pool is full. At that point, the only safe remediation is to replace the disk.

> Running smartctl after the last badblocks check, I noticed that
> Current_Pending_Sector was 32 (not 92, as badblocks found).

SMART is allowed to lie to you. That's why there's a RAW_VALUE column yet no explanation in the manual of what that value means: the low-level meanings of these values are defined by the drive manufacturers. "92" is not necessarily a sector count. For all you know, it is reporting that there are currently 92 lemmings in midair off the fjords of Finland.

The only important results here are:

a) the numbers are nonzero
b) the numbers are changing

That is all. A zero value just means it hasn't failed *yet*, and a static nonzero value means the drive has temporarily arrested its failures-in-progress. There is no such thing as a hard drive with zero actual bad sectors, just one that has space left in its spare sector pool. A "working" drive is one that is swapping sectors from the spare pool rarely enough that it is not expected to empty the pool before the warranty expires.

> Why does each consecutive run of badblocks report different results?

Because physics. The highly competitive nature of the HDD business, plus the relentless drive of Moore's Business Law (as it should be called, since it is not a physical law, just an arbitrary fiction the tech industry has bought into as the ground rules for the game), pushes the manufacturers to design right up against the ragged edge of functionality. HDD manufacturers could solve all of this by making drives with 1/4 the capacity at twice the cost and get 10x the reliability. And they do: they're called SAS drives. :)

> Why does smartctl not update Reallocated_Event_Count?

Because SMART lies.

> What other tests can I perform to verify the disks?

Quit poking the tiger to see if it will bite you. Replace the bad disk and resilver that mirror before you lose the other disk, too.
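One way to see whether the numbers are changing is simply to snapshot the attribute table on a schedule and compare the runs; a minimal sketch (the path, schedule and device list are arbitrary):

    #!/bin/sh
    # e.g. /etc/cron.daily/smart-snapshot (illustrative; make it executable)
    for d in /dev/sd[ab]; do
        { date; smartctl -A "$d"; } >> "/var/log/smart-$(basename "$d").log"
    done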
On 1/19/2016 2:24 PM, Warren Young wrote:

> It's dying. Replace it now.

Agreed.

> On a modern hard disk you should *never* see bad sectors, because the
> drive is busy hiding all the bad sectors it does find and telling you
> everything is fine.

That's not actually true. The drive will report a 'bad sector' error if you try to read data that it simply can't read; you wouldn't want it to return bad data and say it's OK. Many (most?) drives won't actually remap a bad sector until you write new data over that block number, since they don't want to copy bad data without any way of telling the OS that the data is invalid. These pending remaps are listed under SMART attribute 197, Current_Pending_Sector.

-- 
john r pierce, recycling bits in santa cruz
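If your hdparm build has the raw sector options, you can see this on an individual LBA: reading a pending sector fails, and only a write lets the drive remap it. A sketch (the LBA is a made-up example, and the write destroys whatever that sector held):

    hdparm --read-sector 123456789 /dev/sda
    # WARNING: overwrites that one sector with zeros
    hdparm --yes-i-know-what-i-am-doing --write-sector 123456789 /dev/sda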
On Tue, Jan 19, 2016 at 3:24 PM, Warren Young <wyml at etr-usa.com> wrote:

> On a modern hard disk you should *never* see bad sectors, because the
> drive is busy hiding all the bad sectors it does find and telling you
> everything is fine.

This is not a given. Misconfiguration can make persistent bad sectors very common, and this misconfiguration is the default situation in RAID setups on Linux, which is why it's so common. This, and user error, are the top causes of RAID 5 implosion on Linux (both mdadm and lvm raid).

The necessary sequence:

1. The drive needs to know the sector is bad.
2. The drive needs to be asked to read that sector.
3. The drive needs to give up trying to read that sector.
4. The drive needs to report the sector's LBA back to the OS.
5. The OS needs to write something back to that same LBA.
6. The drive writes to the sector and, if the write fails, remaps the LBA to a different (reserve) physical sector.

Where this fails on Linux is steps 3 and 4. By default, consumer drives either don't support SCT ERC, as is the case in this thread, or it's disabled. That means the timeout for deep recovery of a bad sector can be very high, 2 or 3 minutes. Usually it's less than that, but it's often more than the kernel's default SCSI command timer. When a command to the drive doesn't complete successfully within the default 30 seconds, the kernel resets the link to the drive, which obliterates the entire command queue and the work the drive was doing to recover the bad sector. Therefore step 4 never happens, nor any step after it. Hence, bad sectors accumulate.

The consequence of this often doesn't get figured out until a user looks at kernel messages, sees a bunch of hard link resets, has a WTF moment, and asks questions. More often they don't see those reset messages, or don't ask about them, and the next consequence is a drive failure. When the failed drive is one other than the drive with bad sectors, you effectively have two bad strips per stripe during reads (including rebuild), and that's when you get total array collapse even though only one drive was actually bad. People use RAID 6 to mask this problem, but it's still a misconfiguration, and it can take down RAID 6 arrays too.

>> Why does smartctl not update Reallocated_Event_Count?
>
> Because SMART lies.

Nope. The drive isn't being asked to write to those bad sectors. If it can't read a sector without error, it won't migrate the data on its own (some drives never do this), so a write to the sector is needed to trigger the remap.

The other thing is that the bad sector count on 512e AF drives is inflated: the count is in 512-byte sector increments, but there is no such thing on an AF drive, so one bad physical sector is reported as 8 bad sectors. To fix it you have to write all 8 of those logical sectors at once, in a single command to the drive. I've had 'dd if=/dev/zero of=/dev/sda seek=blah count=8' fail with a read error because the command was internally reinterpreted as a read-modify-write. Ridiculous but true. So you have to use bs=4096 and count=1, and of course adjust the seek LBA to be based on 4096 bytes instead of 512.

So the simplest fix here is:

    echo 160 > /sys/block/sdX/device/timeout

That's needed for each member drive. Note this is not a persistent setting. And then this:

    echo repair > /sys/block/mdX/md/sync_action

That's run once. You'll see the read errors in dmesg, and md writing back to the drive with the bad sector.
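Putting the pieces together for this particular two-disk mirror, the whole procedure looks roughly like this (sda/sdb are from the original report, md0 stands in for whatever the array is actually called, and 160 seconds is just a comfortably large value):

    # does the drive support SCT ERC at all?
    smartctl -l scterc /dev/sda

    # if not (or it's disabled), give the kernel a command timer longer than
    # the drive's worst-case recovery time (not persistent across reboots)
    for d in sda sdb; do
        echo 160 > /sys/block/$d/device/timeout
    done

    # have md read every sector and rewrite anything unreadable from the
    # good mirror, then watch progress and the corrective writes
    echo repair > /sys/block/md0/md/sync_action
    cat /proc/mdstat
    dmesg | tail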
This problem affects all software RAID, including btrfs raid1. The ideal scenario is to run 'smartctl -l scterc,70,70 /dev/sdX' in a startup script, so that the drive fails reads on marginally bad sectors with an error in at most 7 seconds. The linux-raid@ list is chock full of this as a recurring theme.

-- 
Chris Murphy
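A minimal startup hook along those lines (the rc.local approach, the drive glob, and the fallback timeout are just one way to do it; whether smartctl returns a failing exit status when SCT ERC is unsupported is an assumption worth checking on your version):

    #!/bin/sh
    # e.g. appended to /etc/rc.d/rc.local (must be executable on C7); sketch only
    for d in /dev/sd[a-z]; do
        # ask the drive to give up on an unreadable sector after 7 seconds;
        # if that can't be set, fall back to a long kernel command timer so
        # the link isn't reset in the middle of the drive's own recovery
        if ! smartctl -l scterc,70,70 "$d" >/dev/null 2>&1; then
            echo 180 > "/sys/block/$(basename "$d")/device/timeout"
        fi
    done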
On 01/19/2016 06:46 PM, Chris Murphy wrote:

> Hence, bad sectors accumulate. And the consequence of this often
> doesn't get figured out until a user looks at kernel messages and sees
> a bunch of hard link resets....

The standard Unix way of refreshing the disk contents is badblocks' non-destructive read-write test (badblocks -n, or the -cc option to e2fsck for ext2/3/4 filesystems). The remap happens on the writeback of the contents. It's been this way with enterprise SCSI drives for as long as I can remember there being enterprise-class SCSI drives, and ATA drives caught up with the SCSI ones back in the early 90's. But it's always been true, to the best of my recollection, that the remap happens on a write. The rationale is pretty simple: only on a write does the drive know that it has the valid data in its buffer, and so that's the only safe time to put the data elsewhere.

> This problem affects all software RAID, including btrfs raid1. The
> ideal scenario is to run 'smartctl -l scterc,70,70 /dev/sdX' in a
> startup script, so that the drive fails reads on marginally bad
> sectors with an error in at most 7 seconds.

This is partly why enterprise arrays manage their own per-sector ECC and use 528-byte sector sizes. The drives for these arrays make very poor standalone workstation drives, since the drive is no longer doing all the error recovery itself but relying on the storage processor to do the work. The drive still does some basic ECC on the sector, but the storage processor gets a much better idea of the health of each sector than when the drive's firmware manages the remap. Sophisticated enterprise arrays, like NetApp's, EMC's, and Nimble's, can do some very accurate prediction and proactive hot-sparing when needed. That's part of what you pay for when you buy that sort of array.

The other fact of life with modern consumer-level hard drives is that *errored sectors are expected*, not exceptional. Why else would a drive have a TLER in the two-minute range, like many of the WD Green drives do? With a consumer-level drive, I would be shocked if badblocks reported the same number each time it ran.
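For a single drive or an md member taken out of the array, that test looks roughly like this (device names are examples, and both commands want the filesystem unmounted):

    badblocks -nsv /dev/sdc        # non-destructive read-write pass over the whole device
    e2fsck -fcc /dev/sdc1          # same idea via e2fsck; also records bad blocks in the ext fs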
On 01/19/2016 06:29 PM, J Martin Rushton wrote:

> (Off topic) I also remember seeing engineers determine which memory
> chip was at fault and replacing the chip using a soldering iron. Try
> that on a DIMM!

As long as the DIMM isn't populated with BGA packages, it's about a ten-minute job with a hot-air rework station, which will only cost you around $100 or so if you shop around (and if you have a relatively steady hand and either good eyes or a good magnifier). It's doable in a DIY way even with BGA, but it takes longer and you need a reballing mask for that specific package to make it work right. Any accurately controlled oven is good enough to do the reflow (and baking Xbox boards is essentially doing a reflow...).

Yeah, I prefer tubes, discretes, and through-hole PCBs myself, but at this point I've acquired a hot-air station and am getting up to speed on surface mount, and am finding that it's not really that hard, just different. This is not that different from getting up to speed with something really new and different, like systemd: it just requires being willing to take a different approach to the problem. BGA desoldering/resoldering requires a whole different way of looking at the soldering operation, that's all.
On Wed, Jan 20, 2016, 7:17 AM Lamar Owen <lowen at pari.edu> wrote:

> The standard Unix way of refreshing the disk contents is badblocks'
> non-destructive read-write test (badblocks -n, or the -cc option to
> e2fsck for ext2/3/4 filesystems).

This isn't applicable to RAID, which is what this thread is about. For RAID, use a scrub; that's what it's for.

The badblocks method fixes nothing if the sector is persistently bad and the drive reports a read error. It fixes nothing if the command timeout is reached before the drive either recovers or reports a read error. And even when it works, you're relying on ECC-recovered data rather than reading a likely good copy from the mirror or from parity and writing that back to the bad block. But all of this still requires the proper configuration.

> The remap happens on the writeback of the contents. It's been this way
> with enterprise SCSI drives for as long as I can remember there being
> enterprise-class SCSI drives, and ATA drives caught up with the SCSI
> ones back in the early 90's. But it's always been true, to the best of
> my recollection, that the remap happens on a write.

Properly configured, first a read error happens, which includes the LBA of the bad sector. The md driver needs that LBA to find a good copy of the data on the mirror or from parity. *Then* it writes to the bad LBA. In the misconfigured case, the command timeout expiration and link reset prevent the kernel from ever learning the LBA of the bad sector, so the repair isn't possible.

> The rationale is pretty simple: only on a write does the drive know
> that it has the valid data in its buffer, and so that's the only safe
> time to put the data elsewhere.
>
> This is partly why enterprise arrays manage their own per-sector ECC
> and use 528-byte sector sizes.

Not all enterprise drives have 520/528-byte sectors. Those that do are using T10-PI (formerly DIF), and it requires software support too. It's pretty rare. It's 8000% easier to use ZFS on Linux or Btrfs.

> The other fact of life with modern consumer-level hard drives is that
> *errored sectors are expected*, not exceptional. Why else would a
> drive have a TLER in the two-minute range, like many of the WD Green
> drives do? With a consumer-level drive, I would be shocked if
> badblocks reported the same number each time it ran.

All drives expect bad sectors. A consumer drive that reports a read error risks putting the host OS into an inconsistent state, so the drive tries hard to avoid it: becoming slow is better than implosion. And neither OS X nor Windows resets the link after a mere 30 seconds, either.

Chris Murphy
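For comparison, the checksummed filesystems make that "read the good copy, rewrite the bad one" step explicit; with a btrfs raid1 it's just (the mount point is an example):

    btrfs scrub start -B /mnt      # -B runs in the foreground and prints statistics
    btrfs scrub status /mnt        # corrected vs. uncorrectable error counts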
On 01/20/2016 01:43 PM, Chris Murphy wrote:

> On Wed, Jan 20, 2016, 7:17 AM Lamar Owen <lowen at pari.edu> wrote:
>
>> The standard Unix way of refreshing the disk contents is badblocks'
>> non-destructive read-write test (badblocks -n, or the -cc option to
>> e2fsck for ext2/3/4 filesystems).
>
> This isn't applicable to RAID, which is what this thread is about. For
> RAID, use a scrub; that's what it's for.

The badblocks read/write verification would need to be done on the RAID member devices, not the aggregate md device, for a member-device-level remap. It might need to be done with the md offline; I'm not sure. Scrub? There is a scrub command (and package) in CentOS, but it's meant for secure data erasure and is not a non-destructive thing. Ah, you're talking about what md will do if 'check' or 'repair' is written to the appropriate location in sysfs for the md in question. (This is covered in the md(4) man page.)

> The badblocks method fixes nothing if the sector is persistently bad
> and the drive reports a read error.

The badblocks method will do a one-off read/write verification on a member device; no, it won't do it automatically, true enough.

> It fixes nothing if the command timeout is reached before the drive
> either recovers or reports a read error.

Very true.

> And even when it works, you're relying on ECC-recovered data rather
> than reading a likely good copy from the mirror or from parity and
> writing that back to the bad block.

Yes, for the member drive this is true. Since my storage here is primarily EMC Clariion, I'm not sure what the mdraid equivalent of EMC's background verify would be, since I've never needed that functionality from mdraid. (I really don't like the term 'software RAID', since at some level all RAID is software RAID, whether it runs on a storage processor or in a RAID controller's firmware.) It does appear that triggering a scrub from sysfs for a particular md is similar functionality and would do the remap if inconsistent data is found. This is a bit different from the old Unix way, but these are newer times and the way of doing things has changed.

> But all of this still requires the proper configuration.

Yes, this is very true.
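The md-level "background verify", then, looks roughly like this (md0 stands in for the real array name; if memory serves, the mdadm package on C7 also ships a weekly raid-check cron job that runs the 'check' variant for you):

    echo check > /sys/block/md0/md/sync_action   # read-only scrub; unreadable sectors get
                                                 # rewritten from the good copy
    cat /proc/mdstat                             # progress
    cat /sys/block/md0/md/mismatch_cnt           # inconsistencies found by the last check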
On Thu, Jan 21, 2016 at 9:27 AM, Lamar Owen <lowen at pari.edu> wrote:

> Ah, you're talking about what md will do if 'check' or 'repair' is
> written to the appropriate location in sysfs for the md in question.
> (This is covered in the md(4) man page.)

Correct.

-- 
Chris Murphy