Lanny Marcus
2009-May-24 22:10 UTC
[CentOS] OT: SMART warning on hard drive, same warning for 2 1/2 years
My wife's box has a very intermittent problem, when booting from the
Maxtor IDE hard drive. This has been going on for about 2 1/2
years.... The box is a Compaq EVO D300v for the Enterprise. When it
boots, there is a SMART advisory from the BIOS that says failure is
imminent. Occasionally, it will not boot, because the BIOS does not
see the hard drive. I replaced the EIDE cable, but the problem of
sometimes not seeing the hard drive on boot continues. I suspect it
has to do with something loose in the electronics of the drive,
because if I press on both ends of the EIDE cable, the problem goes
away and then it will boot OK. The box is currently M$ Windows only. I
just booted it from my Knoppix Live CD and ran smartctl on it. Below
are the results. When I ran the Maxtor Diagnostics on the hard drive,
3 times, each time the quick 90 second SMART test said that I should
run the Read test, which takes about one hour. Each time I ran the
Read test, it passed OK. 3 times. Should I suggest to my wife that
she let me replace the hard drive, now, at her convenience, before it
fails completely? Is there anything in the smartctl results that
indicates that is not the appropriate thing to do, considering the
length of time this problem has existed? I did not run the Maxtor
Burn In test or low level format, because I do not want to reinstall
everything on this hard drive. The smartctl results certainly seem to
indicate something badly awry, which the Maxtor Diagnostics, on the
Read only test, did not pick up. TIA! Lanny
root@Knoppix:~# smartctl -d ata -H /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29
root@Knoppix:~# smartctl -d ata -a /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax D540X-4D
Device Model: Maxtor 4D080H4
Serial Number: D40SBVYE
Firmware Version: DAH017K0
User Capacity: 81,964,302,336 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 0
Local Time is: Sun May 24 17:44:28 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 64) The previous self-test completed having
a test element that failed and the test
element that failed is not known.
Total time to complete Offline
data collection: ( 30) seconds.
Offline data collection
capabilities: (0x1b) SMART execute Offline immediate.
Auto Offline data collection
on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 51) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   202   199   063    Pre-fail  Always   -           18883
  4 Start_Stop_Count        0x0032   252   252   000    Old_age   Always   -           2809
  5 Reallocated_Sector_Ct   0x0033   239   239   063    Pre-fail  Always   -           37
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline  -           0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always   -           0
  8 Seek_Time_Performance   0x0027   250   243   187    Pre-fail  Always   -           47480
  9 Power_On_Minutes        0x0032   253   250   000    Old_age   Always   -           0h+18m
 10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   249   249   000    Old_age   Always   -           1722
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always   -           0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always   -           0
194 Unknown_Attribute       0x0032   253   253   000    Old_age   Always   -           0
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always   -           24
196 Reallocated_Event_Count 0x0008   251   251   000    Old_age   Offline  -           2
197 Current_Pending_Sector  0x0008   253   249   000    Old_age   Offline  -           0
198 Offline_Uncorrectable   0x0008   253   252   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline  -           0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always   -           0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always   -           0
202 TA_Increase_Count       0x000a   253   251   000    Old_age   Always   -           0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always   -           0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always   -           0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always   -           0
207 Spin_High_Current       0x002a   239   235   000    Old_age   Always   -           13
208 Spin_Buzz               0x002a   245   242   000    Old_age   Always   -           8
209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline  -           0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline  -           0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline  -           0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline  -           0
SMART Error Log Version: 1
Warning: ATA error count 3379 inconsistent with error log pointer 5
ATA Error Count: 3379 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 3379 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in
an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 01 01 a5 5a a0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
08 d6 01 01 a5 5a a0 02 03:14:14.480 DEVICE RESET
b0 d6 01 9f 4f c2 a0 00 03:12:29.984 SMART WRITE LOG
b0 d5 01 9f 4f c2 a0 00 03:12:29.968 SMART READ LOG
b0 d6 01 50 4f c2 a0 00 03:12:26.512 SMART WRITE LOG
b0 d9 01 00 4f c2 a0 00 03:12:26.480 SMART DISABLE OPERATIONS
Error 3378 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in
an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 01 0b 4f c2 a0 Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
b0 d6 01 9f 4f c2 a0 00 03:12:29.984 SMART WRITE LOG
b0 d5 01 9f 4f c2 a0 00 03:12:29.968 SMART READ LOG
b0 d6 01 50 4f c2 a0 00 03:12:26.512 SMART WRITE LOG
b0 d9 01 00 4f c2 a0 00 03:12:26.480 SMART DISABLE OPERATIONS
b0 d6 01 50 4f c2 a0 00 03:12:26.416 SMART WRITE LOG
Error 3377 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in
an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 01 0b 4f c2 a0 Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
b0 d5 01 9f 4f c2 a0 00 03:12:29.968 SMART READ LOG
b0 d6 01 50 4f c2 a0 00 03:12:26.512 SMART WRITE LOG
b0 d9 01 00 4f c2 a0 00 03:12:26.480 SMART DISABLE OPERATIONS
b0 d6 01 50 4f c2 a0 00 03:12:26.416 SMART WRITE LOG
41 ff 00 00 b9 8a e9 00 03:12:26.416 READ VERIFY SECTOR(S) [OBS-5]
Error 3376 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in
an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 01 0b 4f c2 a0 Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
b0 d6 01 50 4f c2 a0 00 03:12:26.512 SMART WRITE LOG
b0 d9 01 00 4f c2 a0 00 03:12:26.480 SMART DISABLE OPERATIONS
b0 d6 01 50 4f c2 a0 00 03:12:26.416 SMART WRITE LOG
41 ff 00 00 b9 8a e9 00 03:12:26.416 READ VERIFY SECTOR(S) [OBS-5]
41 ff 00 00 b8 8a e9 00 03:12:26.400 READ VERIFY SECTOR(S) [OBS-5]
Error 3375 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in
an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 01 0b 4f c2 a0 Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
b0 d6 01 50 4f c2 a0 00 03:12:26.416 SMART WRITE LOG
41 ff 00 00 b9 8a e9 00 03:12:26.416 READ VERIFY SECTOR(S) [OBS-5]
41 ff 00 00 b8 8a e9 00 03:12:26.400 READ VERIFY SECTOR(S) [OBS-5]
41 ff 00 00 b7 8a e9 00 03:12:26.400 READ VERIFY SECTOR(S) [OBS-5]
41 ff 00 00 b6 8a e9 00 03:12:26.400 READ VERIFY SECTOR(S) [OBS-5]
SMART Self-test log structure revision number 1
Num  Test_Description    Status                      Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure  00%        0                -
# 2  Short offline       Completed: unknown failure  00%        0                -
# 3  Short offline       Completed: unknown failure  00%        997              -
# 4  Short offline       Completed without error     00%        905              -
# 5  Short offline       Completed without error     00%        664              -
# 6  Short offline       Completed without error     00%        664              -
# 7  Short offline       Completed: unknown failure  00%        1                -
# 8  Short offline       Completed: unknown failure  00%        9                -
# 9  Short offline       Completed: unknown failure  00%        215              -
#10  Short offline       Completed without error     00%        215              -
#11  Extended offline    Completed without error     00%        213              -
#12  Short offline       Completed: read failure     60%        187              80417451
#13  Extended offline    Completed: read failure     20%        184              80417451
#14  Short offline       Completed without error     00%        181              -
#15  Extended offline    Completed: read failure     20%        151              80417451
#16  Short offline       Completed without error     00%        151              -
#17  Short offline       Completed without error     00%        139              -
#18  Short offline       Completed: read failure     60%        45               208052
#19  Short offline       Completed without error     00%        5                -
#20  Extended offline    Completed without error     00%        4                -
#21  Short offline       Completed without error     00%        3                -
Device does not support Selective Self Tests/Logging
root@Knoppix:~#
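A note on reading the attribute table above: smartctl marks an attribute as failed when its normalized VALUE is less than or equal to THRESH, and reports "In_the_past" when only WORST has reached the threshold. The following is a minimal sketch of that comparison, not part of smartmontools; the function name `failing_attributes` and the whitespace-split parsing are assumptions made for illustration, and the sample rows are copied from the output in this post.

```python
def failing_attributes(rows):
    """Return (name, status) pairs for attributes at or below threshold.

    Each row is one line of the smartctl attribute table, with fields:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    flagged = []
    for line in rows:
        fields = line.split()
        # Skip headers and anything that is not a full attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value, worst, thresh = (int(f) for f in fields[3:6])
        if value <= thresh:
            flagged.append((name, "FAILING_NOW"))
        elif worst <= thresh:
            flagged.append((name, "In_the_past"))
    return flagged


# Sample rows taken from the table in this post.
rows = [
    "3 Spin_Up_Time 0x0027 202 199 063 Pre-fail Always - 18883",
    "5 Reallocated_Sector_Ct 0x0033 239 239 063 Pre-fail Always - 37",
    "10 Spin_Retry_Count 0x002b 222 215 223 Pre-fail Always FAILING_NOW 29",
    "11 Calibration_Retry_Count 0x002b 253 252 223 Pre-fail Always - 0",
]

print(failing_attributes(rows))
```

With these rows, only Spin_Retry_Count is flagged: its VALUE of 222 sits one point under the threshold of 223, which is why it is the single entry under "Failed Attributes" even though Reallocated_Sector_Ct has a nonzero raw count.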
MHR
2009-May-24 22:29 UTC
[CentOS] OT: SMART warning on hard drive, same warning for 2 1/2 years
On Sun, May 24, 2009 at 3:10 PM, Lanny Marcus <lmmailinglists at gmail.com> wrote:
> My wife's box has a very intermittent problem,

Sounds kinda personal to me....

> when booting from the
> Maxtor IDE hard drive. This has been going on for about 2 1/2

Oh. Never mind (bad humor).

> years.... The box is a Compaq EVO D300v for the Enterprise. When it
> boots, there is a SMART advisory from the BIOS that says failure is
> imminent. Occasionally, it will not boot, because the BIOS does not
> see the hard drive.
<snip>

Seriously, now....

If there's a hardware problem with the drive, like loose connections, I'd get rid of it. I would have suggested a warranty swap-out, which you might be able to do if you still have the receipt or you registered the purchase (and sometimes even if not - check with Maxtor's web site), but it sounds like the drive may be out of warranty. Like I said, though, check with Maxtor - some of their older drives had 5 year warranties, and they have a general warranty period for all of their disks that yours might fall within.

I don't trust the SMART advisories all the time, mainly because I have a 1+ year old Seagate SATA drive that gets a smartd error every 30 minutes when the checks are performed. I have done numerous tests, including the ones supplied by Seagate in their Linux seatools package, and they all say that the drive is fine. (The error is suspicious anyway - it claims that there are 4294967295 unreadable or offline sectors, which is way more than the drive could possibly have, but that also just happens to be 0xFFFFFFFF.... Since I'm running an AMD 64x2, I'd bet that it's a 32-64 bit compatibility issue with the drive itself. Neither of my other two disks, which are also SATA, gets any errors, and they're both WDs.)

That said, yours looks much more serious.

Conclusion: you are showing too many points of failure to warrant keeping the drive. Try to get a warranty replacement if you can, and if that doesn't work, just get a new drive.

Since warranty replacements are usually refurbished disks, and the new warranty ends at the same time, you'd probably be better off with a new disk.

Them's my $0.04 (inflation, y'know).

HTH

mhr
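The arithmetic behind the 4294967295 remark can be checked in a few lines. This is only a sketch of the number itself, not a diagnosis of that Seagate drive; the 512-byte sector size is an assumption, and the variable names are made up for illustration.

```python
import struct

reported = 4294967295  # sector count quoted in the reply above

# The value is exactly the largest unsigned 32-bit integer (all bits set).
is_all_ones = reported == 0xFFFFFFFF == 2**32 - 1

# The same four bytes, read back as a signed 32-bit integer, give -1,
# a value often used as an "invalid" or "not available" marker.
as_signed = struct.unpack("<i", struct.pack("<I", reported))[0]

# Assumption: 512-byte sectors. This is how much disk space that many
# sectors would represent if the count were taken literally.
SECTOR_BYTES = 512
implied_bytes = reported * SECTOR_BYTES

print(is_all_ones, as_signed, implied_bytes)
```

An all-ones 32-bit field is consistent with the suspicion that the figure is a placeholder or a misread field and not a real sector count, though the snippet cannot say which.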
Rainer Duffner
2009-May-24 22:35 UTC
[CentOS] OT: SMART warning on hard drive, same warning for 2 1/2 years
On 25.05.2009 at 00:10, Lanny Marcus wrote:
> My wife's box has a very intermittent problem, when booting from the
> Maxtor IDE hard drive. This has been going on for about 2 1/2
> years....

What stopped you from replacing it 2.5 years ago, BTW?

What's the warranty policy for Maxtor (now Seagate?) OEM drives? There are drives and there are OEM drives...

Rainer
Robert Nichols
2009-May-25 04:39 UTC
[CentOS] OT: SMART warning on hard drive, same warning for 2 1/2 years
Lanny Marcus wrote:
> My wife's box has a very intermittent problem, when booting from the
> Maxtor IDE hard drive. This has been going on for about 2 1/2
> years.... The box is a Compaq EVO D300v for the Enterprise. When it
> boots, there is a SMART advisory from the BIOS that says failure is
> imminent. Occasionally, it will not boot, because the BIOS does not
> see the hard drive. I replaced the EIDE cable, but the problem of
> sometimes not seeing the hard drive on boot continues. I suspect it
> has to do with something loose in the electronics of the drive,
> because if I press on both ends of the EIDE cable, the problem goes
> away and then it will boot OK.
[SNIP]
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: FAILED!
> Drive failure expected in less than 24 hours. SAVE ALL DATA.
> Failed Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>  10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29

A spin-up failure could be caused by a weak power supply or a power connector that is not making good contact. It is not likely to be a problem with the EIDE cable. Note that even if you do correct the problem, the SMART advisory will likely remain due to the accumulated failure count, but the boot failures should stop.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.