Lanny Marcus
2009-May-24 22:10 UTC
[CentOS] OT: SMART warning on hard drive, same warning for 2 1/2 years
My wife's box has a very intermittent problem when booting from the Maxtor IDE hard drive. This has been going on for about 2 1/2 years.... The box is a Compaq EVO D300v for the Enterprise. When it boots, there is a SMART advisory from the BIOS that says failure is imminent. Occasionally, it will not boot, because the BIOS does not see the hard drive. I replaced the EIDE cable, but the problem of sometimes not seeing the hard drive on boot continues. I suspect it has to do with something loose in the electronics of the drive, because if I press on both ends of the EIDE cable, the problem goes away and then it will boot OK.

The box is currently M$ Windows only. I just booted it from my Knoppix Live CD and ran smartctl on it; the results are below. I ran the Maxtor Diagnostics on the hard drive 3 times. Each time, the quick 90-second SMART test said that I should run the Read test, which takes about one hour, and each time the Read test passed OK.

Should I suggest to my wife that she let me replace the hard drive now, at her convenience, before it fails completely? Is there anything in the smartctl results that indicates that is not the appropriate thing to do, considering how long this problem has existed? I did not run the Maxtor Burn In test or low-level format, because I do not want to reinstall everything on this hard drive. The smartctl results certainly seem to indicate something badly awry, which the Maxtor Diagnostics Read-only test did not pick up.

TIA!
Lanny

root@Knoppix:~# smartctl -d ata -H /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29

root@Knoppix:~# smartctl -d ata -a /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax D540X-4D
Device Model:     Maxtor 4D080H4
Serial Number:    D40SBVYE
Firmware Version: DAH017K0
User Capacity:    81,964,302,336 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 0
Local Time is:    Sun May 24 17:44:28 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  64) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                 (  30) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  51) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   202   199   063    Pre-fail  Always       -       18883
  4 Start_Stop_Count        0x0032   252   252   000    Old_age   Always       -       2809
  5 Reallocated_Sector_Ct   0x0033   239   239   063    Pre-fail  Always       -       37
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   250   243   187    Pre-fail  Always       -       47480
  9 Power_On_Minutes        0x0032   253   250   000    Old_age   Always       -       0h+18m
 10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   249   249   000    Old_age   Always       -       1722
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0008   251   251   000    Old_age   Offline      -       2
197 Current_Pending_Sector  0x0008   253   249   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   251   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   239   235   000    Old_age   Always       -       13
208 Spin_Buzz               0x002a   245   242   000    Old_age   Always       -       8
209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
Warning: ATA error count 3379 inconsistent with error log pointer 5

ATA Error Count: 3379 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3379 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 01 a5 5a a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  08 d6 01 01 a5 5a a0 02      03:14:14.480  DEVICE RESET
  b0 d6 01 9f 4f c2 a0 00      03:12:29.984  SMART WRITE LOG
  b0 d5 01 9f 4f c2 a0 00      03:12:29.968  SMART READ LOG
  b0 d6 01 50 4f c2 a0 00      03:12:26.512  SMART WRITE LOG
  b0 d9 01 00 4f c2 a0 00      03:12:26.480  SMART DISABLE OPERATIONS

Error 3378 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 0b 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 9f 4f c2 a0 00      03:12:29.984  SMART WRITE LOG
  b0 d5 01 9f 4f c2 a0 00      03:12:29.968  SMART READ LOG
  b0 d6 01 50 4f c2 a0 00      03:12:26.512  SMART WRITE LOG
  b0 d9 01 00 4f c2 a0 00      03:12:26.480  SMART DISABLE OPERATIONS
  b0 d6 01 50 4f c2 a0 00      03:12:26.416  SMART WRITE LOG

Error 3377 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 0b 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d5 01 9f 4f c2 a0 00      03:12:29.968  SMART READ LOG
  b0 d6 01 50 4f c2 a0 00      03:12:26.512  SMART WRITE LOG
  b0 d9 01 00 4f c2 a0 00      03:12:26.480  SMART DISABLE OPERATIONS
  b0 d6 01 50 4f c2 a0 00      03:12:26.416  SMART WRITE LOG
  41 ff 00 00 b9 8a e9 00      03:12:26.416  READ VERIFY SECTOR(S) [OBS-5]

Error 3376 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 0b 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 50 4f c2 a0 00      03:12:26.512  SMART WRITE LOG
  b0 d9 01 00 4f c2 a0 00      03:12:26.480  SMART DISABLE OPERATIONS
  b0 d6 01 50 4f c2 a0 00      03:12:26.416  SMART WRITE LOG
  41 ff 00 00 b9 8a e9 00      03:12:26.416  READ VERIFY SECTOR(S) [OBS-5]
  41 ff 00 00 b8 8a e9 00      03:12:26.400  READ VERIFY SECTOR(S) [OBS-5]

Error 3375 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 0b 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 50 4f c2 a0 00      03:12:26.416  SMART WRITE LOG
  41 ff 00 00 b9 8a e9 00      03:12:26.416  READ VERIFY SECTOR(S) [OBS-5]
  41 ff 00 00 b8 8a e9 00      03:12:26.400  READ VERIFY SECTOR(S) [OBS-5]
  41 ff 00 00 b7 8a e9 00      03:12:26.400  READ VERIFY SECTOR(S) [OBS-5]
  41 ff 00 00 b6 8a e9 00      03:12:26.400  READ VERIFY SECTOR(S) [OBS-5]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                      Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure       00%         0          -
# 2  Short offline       Completed: unknown failure       00%         0          -
# 3  Short offline       Completed: unknown failure       00%       997          -
# 4  Short offline       Completed without error          00%       905          -
# 5  Short offline       Completed without error          00%       664          -
# 6  Short offline       Completed without error          00%       664          -
# 7  Short offline       Completed: unknown failure       00%         1          -
# 8  Short offline       Completed: unknown failure       00%         9          -
# 9  Short offline       Completed: unknown failure       00%       215          -
#10  Short offline       Completed without error          00%       215          -
#11  Extended offline    Completed without error          00%       213          -
#12  Short offline       Completed: read failure          60%       187          80417451
#13  Extended offline    Completed: read failure          20%       184          80417451
#14  Short offline       Completed without error          00%       181          -
#15  Extended offline    Completed: read failure          20%       151          80417451
#16  Short offline       Completed without error          00%       151          -
#17  Short offline       Completed without error          00%       139          -
#18  Short offline       Completed: read failure          60%        45          208052
#19  Short offline       Completed without error          00%         5          -
#20  Extended offline    Completed without error          00%         4          -
#21  Short offline       Completed without error          00%         3          -

Device does not support Selective Self Tests/Logging
root@Knoppix:~#
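The error-log legend in the output above says Powered_Up_Time "wraps" after 49.710 days. That figure matches an unsigned 32-bit counter of milliseconds, since 2^32 ms divided by 86,400,000 ms per day is about 49.710 days. The sketch below checks that arithmetic and formats a millisecond count in the DDd+hh:mm:SS.sss style. The helper names are hypothetical, and dropping the day field when it is zero is an assumption chosen only to match the `03:14:14.480` timestamps shown in the log.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def wrap_days(bits: int = 32) -> float:
    """Days before an unsigned millisecond counter of `bits` width rolls over."""
    return (1 << bits) / MS_PER_DAY


def format_powered_up(ms: int) -> str:
    """Format milliseconds as DDd+hh:mm:SS.sss (assumption: day field omitted when zero)."""
    days, rem = divmod(ms, MS_PER_DAY)
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    clock = f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
    return f"{days:02d}d+{clock}" if days else clock


print(f"{wrap_days():.3f}")                                      # 49.710
print(format_powered_up(3 * 3_600_000 + 14 * 60_000 + 14_480))   # 03:14:14.480
```

A counter like this says nothing about total drive age once it has wrapped, so the timestamps in the error log are only useful relative to each other.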
MHR
2009-May-24 22:29 UTC
On Sun, May 24, 2009 at 3:10 PM, Lanny Marcus <lmmailinglists at gmail.com> wrote:
> My wife's box has a very intermittent problem,

Sounds kinda personal to me....

> when booting from the
> Maxtor IDE hard drive. This has been going on for about 2 1/2

Oh. Never mind (bad humor).

> years.... The box is a Compaq EVO D300v for the Enterprise. When it
> boots, there is a SMART advisory from the BIOS that says failure is
> imminent. Occasionally, it will not boot, because the BIOS does not
> see the hard drive.
<snip>

Seriously, now.... If there's a hardware problem with the drive, like loose connections, I'd get rid of it. I would have suggested a warranty swap-out, which you might be able to do if you still have the receipt or you registered the purchase (and sometimes even if not; check with Maxtor's web site), but it sounds like the drive may be out of warranty. Like I said, though, check with Maxtor: some of their older drives had 5-year warranties, and they have a general warranty period for all of their disks that yours might fall within.

I don't always trust SMART advisories, mainly because I have a 1+ year old Seagate SATA drive that gets a smartd error every 30 minutes, each time the checks are performed. I have done numerous tests, including the ones supplied by Seagate in their Linux seatools package, and they all say that the drive is fine. (The error is suspicious anyway: it claims that there are 4294967295 unreadable or offline sectors, which is way more than the drive could possibly have, but that also just happens to be 0xFFFFFFFF.... Since I'm running an AMD 64x2, I'd bet that it's a 32/64-bit compatibility issue with the drive itself. Neither of my other two disks, which are also SATA, gets any errors, and they're both WDs.)

That said, your results look much more serious.

Conclusion: you are showing too many points of failure to warrant keeping the drive. Try to get a warranty replacement if you can, and if that doesn't work, just get a new drive. Since warranty replacements are usually refurbished disks, and the replacement's warranty ends when the original one would have, you'd probably be better off with a new disk.

Them's my $0.04 (inflation, y'know).

HTH
mhr
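mhr's observation that 4294967295 is 0xFFFFFFFF is easy to check. The sketch below shows that this is the all-ones 32-bit pattern: the largest unsigned 32-bit value, and also -1 when the same bits are read as a signed 32-bit integer. A count like that can point to a sentinel value or a signed/unsigned mix-up rather than a real sector count, but the sketch does not establish the cause on his drive; `as_signed32` is a hypothetical helper written for illustration.

```python
def as_signed32(u: int) -> int:
    """Reinterpret the low 32 bits of u as a two's-complement signed integer."""
    u &= 0xFFFFFFFF                                  # keep only 32 bits
    return u - (1 << 32) if u & 0x80000000 else u    # top bit set -> negative value


REPORTED = 4294967295  # unreadable/offline sector count from the smartd warning

print(hex(REPORTED))            # 0xffffffff
print(REPORTED == 2**32 - 1)    # True: the largest unsigned 32-bit value
print(as_signed32(REPORTED))    # -1 when the same bits are read as signed
```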
Rainer Duffner
2009-May-24 22:35 UTC
On 25.05.2009, at 00:10, Lanny Marcus wrote:
> My wife's box has a very intermittent problem, when booting from the
> Maxtor IDE hard drive. This has been going on for about 2 1/2
> years....

What stopped you from replacing it 2.5 years ago, BTW?

What is Maxtor's (now Seagate's?) warranty policy for OEM drives? There are drives, and there are OEM drives...

Rainer
Robert Nichols
2009-May-25 04:39 UTC
Lanny Marcus wrote:
> My wife's box has a very intermittent problem, when booting from the
> Maxtor IDE hard drive. This has been going on for about 2 1/2
> years.... The box is a Compaq EVO D300v for the Enterprise. When it
> boots, there is a SMART advisory from the BIOS that says failure is
> imminent. Occasionally, it will not boot, because the BIOS does not
> see the hard drive. I replaced the EIDE cable, but the problem of
> sometimes not seeing the hard drive on boot continues. I suspect it
> has to do with something loose in the electronics of the drive,
> because if I press on both ends of the EIDE cable, the problem goes
> away and then it will boot OK.
[SNIP]
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: FAILED!
> Drive failure expected in less than 24 hours. SAVE ALL DATA.
> Failed Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>  10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29

A spin-up failure could be caused by a weak power supply or a power connector that is not making good contact. It is not likely to be a problem with the EIDE cable. Note that even if you do correct the problem, the SMART advisory will likely remain due to the accumulated failure count, but the boot failures should stop.

-- 
Bob Nichols         "NOSPAM" is really part of my email address.
                    Do NOT delete it.
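Bob's point that the advisory will likely remain comes down to how SMART compares normalized values against thresholds. The sketch below is a minimal illustration, assuming smartmontools' usual rule: an attribute is failing when its normalized VALUE is at or below a nonzero THRESH, and failed "in the past" when only WORST has dropped that far. It parses a few rows copied from Lanny's output and shows Spin_Retry_Count sitting one point below its threshold (222 vs. 223) while its raw count stands at 29. The class and function names are hypothetical, not part of smartmontools.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    ident: int   # ID#
    name: str    # ATTRIBUTE_NAME
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # vendor failure threshold


def parse_attr_row(line: str) -> Attr:
    """Parse one row of the attribute table.

    Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    f = line.split()
    return Attr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]))


def when_failed(a: Attr) -> str:
    """Assumed rule: compare normalized values with a nonzero threshold."""
    if a.thresh == 0:
        return "-"              # zero threshold: never trips
    if a.value <= a.thresh:
        return "FAILING_NOW"
    if a.worst <= a.thresh:
        return "In_the_past"
    return "-"


# Rows copied from the smartctl -a output in Lanny's post.
ROWS = [
    "  3 Spin_Up_Time            0x0027   202   199   063    Pre-fail  Always       -       18883",
    "  5 Reallocated_Sector_Ct   0x0033   239   239   063    Pre-fail  Always       -       37",
    " 10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29",
    " 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0",
]

for row in ROWS:
    a = parse_attr_row(row)
    # margin = how far the normalized value sits above (+) or below (-) its threshold
    print(f"{a.ident:3d} {a.name:<24} margin={a.value - a.thresh:+4d} {when_failed(a)}")
```

How (or whether) a drive recomputes the normalized value after the underlying fault is fixed depends on the vendor's firmware, which is consistent with Bob's expectation that the advisory may persist even after the boot failures stop.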