I have a 4-port eSATA PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb 1st.

siis0@pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00
    vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device     = 'PCI-X to Serial ATA Controller (SiI 3124)'
    class      = mass storage
    subclass   = RAID
    bar   [10] = type Memory, range 64, base 0xb4408000, size 128, enabled
    bar   [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled
    bar   [20] = type I/O Port, range 32, base 0x3000, size 16, enabled
    cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions
    cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message

siis0: <SiI3124 SATA controller> port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5
siis0: [ITHREAD]
siisch0: <SIIS channel> at channel 0 on siis0
siisch0: [ITHREAD]
siisch1: <SIIS channel> at channel 1 on siis0
siisch1: [ITHREAD]
siisch2: <SIIS channel> at channel 2 on siis0
siisch2: [ITHREAD]
siisch3: <SIIS channel> at channel 3 on siis0
siisch3: [ITHREAD]

# camcontrol devlist
<WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 1 lun 0 (pass1,ada1)
<WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 2 lun 0 (pass2,ada2)
<WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 3 lun 0 (pass3,ada3)
<Port Multiplier 47261095 1f06>    at scbus0 target 15 lun 0 (pass4,pmp1)
<WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 0 lun 0 (pass5,ada4)
<WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 1 lun 0 (pass6,ada5)
<WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 2 lun 0 (pass7,ada6)
<WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 3 lun 0 (pass8,ada7)
<WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 4 lun 0 (pass9,ada8)
<Port Multiplier 37261095 1706>    at scbus1 target 15 lun 0 (pass10,pmp0)
<Areca usrvar R001>                at scbus4 target 0 lun 0 (pass11,da0)
<Areca backup1 R001>               at scbus4 target 0 lun 1 (pass12,da1)
<Areca RAID controller R001>       at scbus4 target 16 lun 0 (pass13)
<AMCC 9650SE-2LP DISK 4.10>        at scbus5 target 0 lun 0 (pass14,da2)
<ST31000333AS SD35>                at scbus6 target 0 lun 0 (pass15,ada9)
<ST31000528AS CC35>                at scbus7 target 0 lun 0 (pass16,ada10)
<ST31000340AS SD1A>                at scbus8 target 0 lun 0 (pass17,ada11)
<WDC WD1002FAEX-00Z3A0 05.01D05>   at scbus11 target 0 lun 0 (pass18,ada12)

Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with the odd slot timeout error.

Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000
Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26
Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000
Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30
Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000
Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25
Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 01000000
Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24
Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 01:33:52 backup3 last message repeated 2 times
Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 01:50:31 backup3 last message repeated 2 times
Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT

smartctl doesn't show any issues on the drives other than one that has some historical errors from a while ago. What are these errors and do I need to worry about them? The "READ LOG EXT" ones are new.

This is the only drive with anything in its logs, so not sure if this is causing the driver to complain.

smartctl -a /dev/ada9
smartctl 5.41 2011-06-09 r3365 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST31000333AS
Serial Number:    9TE14SRV
LU WWN Device Id: 5 000c50 010a39664
Firmware Version: SD35
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Wed Feb 8 15:49:12 2012 EST

==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207957

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 617) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 203) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f 111   099   006    Pre-fail Always  -           41201023
  3 Spin_Up_Time            0x0003 093   092   000    Pre-fail Always  -           0
  4 Start_Stop_Count        0x0032 100   100   020    Old_age  Always  -           68
  5 Reallocated_Sector_Ct   0x0033 100   100   036    Pre-fail Always  -           2
  7 Seek_Error_Rate         0x000f 088   060   030    Pre-fail Always  -           791743293
  9 Power_On_Hours          0x0032 075   075   000    Old_age  Always  -           22755
 10 Spin_Retry_Count        0x0013 100   100   097    Pre-fail Always  -           2
 12 Power_Cycle_Count       0x0032 100   100   020    Old_age  Always  -           68
184 End-to-End_Error        0x0032 100   100   099    Old_age  Always  -           0
187 Reported_Uncorrect      0x0032 095   095   000    Old_age  Always  -           5
188 Command_Timeout         0x0032 100   100   000    Old_age  Always  -           0
189 High_Fly_Writes         0x003a 001   001   000    Old_age  Always  -           961
190 Airflow_Temperature_Cel 0x0022 065   056   045    Old_age  Always  -           35 (Min/Max 33/37)
194 Temperature_Celsius     0x0022 035   044   000    Old_age  Always  -           35 (0 25 0 0)
195 Hardware_ECC_Recovered  0x001a 049   030   000    Old_age  Always  -           41201023
197 Current_Pending_Sector  0x0012 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0010 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x003e 200   200   000    Old_age  Always  -           0

SMART Error Log Version: 1
ATA Error Count: 5
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 1a ff ff ff 4f 00  11d+02:29:18.542  READ FPDMA QUEUED
  60 00 1a ff ff ff 4f 00  11d+02:29:18.542  READ FPDMA QUEUED
  60 00 1b ff ff ff 4f 00  11d+02:29:18.541  READ FPDMA QUEUED
  60 00 19 ff ff ff 4f 00  11d+02:29:18.541  READ FPDMA QUEUED
  60 00 1c ff ff ff 4f 00  11d+02:29:18.541  READ FPDMA QUEUED

Error 4 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 1a ff ff ff 4f 00  11d+02:29:15.783  READ FPDMA QUEUED
  60 00 1a ff ff ff 4f 00  11d+02:29:15.780  READ FPDMA QUEUED
  60 00 1b ff ff ff 4f 00  11d+02:29:15.732  READ FPDMA QUEUED
  60 00 19 ff ff ff 4f 00  11d+02:29:15.732  READ FPDMA QUEUED
  60 00 1c ff ff ff 4f 00  11d+02:29:15.731  READ FPDMA QUEUED

Error 3 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 1b ff ff ff 4f 00  11d+02:29:12.889  READ FPDMA QUEUED
  60 00 19 ff ff ff 4f 00  11d+02:29:12.889  READ FPDMA QUEUED
  60 00 1c ff ff ff 4f 00  11d+02:29:12.888  READ FPDMA QUEUED
  60 00 1c ff ff ff 4f 00  11d+02:29:12.888  READ FPDMA QUEUED
  60 00 1a ff ff ff 4f 00  11d+02:29:12.888  READ FPDMA QUEUED

Error 2 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 1b ff ff ff 4f 00  11d+02:29:10.011  READ FPDMA QUEUED
  60 00 19 ff ff ff 4f 00  11d+02:29:10.011  READ FPDMA QUEUED
  60 00 1c ff ff ff 4f 00  11d+02:29:10.010  READ FPDMA QUEUED
  60 00 1c ff ff ff 4f 00  11d+02:29:10.010  READ FPDMA QUEUED
  60 00 1a ff ff ff 4f 00  11d+02:29:10.010  READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 1b ff ff ff 4f 00  11d+02:29:07.148  READ FPDMA QUEUED
  60 00 19 ff ff ff 4f 00  11d+02:29:07.140  READ FPDMA QUEUED
  60 00 1c ff ff ff 4f 00  11d+02:29:07.131  READ FPDMA QUEUED
  60 00 1c ff ff ff 4f 00  11d+02:29:07.117  READ FPDMA QUEUED
  60 00 35 ff ff ff 4f 00  11d+02:29:07.111  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

    ---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/

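The "waiting for slots" values in the kernel log above look like bitmasks of command slots that are still outstanding. That reading is an assumption, drawn from how the value shrinks after each "Timeout on slot N" line rather than from driver documentation. A minimal Python sketch of the decoding, with a made-up helper name:

```python
from typing import List


def slots_from_mask(mask: int) -> List[int]:
    """Return the slot numbers whose bits are set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


# "waiting for slots" values in the order they were logged on siisch1.
logged_masks = [0x47000000, 0x43000000, 0x03000000, 0x01000000]
pending = [slots_from_mask(m) for m in logged_masks]

# Slots that vanish between consecutive masks; under the bitmask assumption
# these should line up with the "Timeout on slot N" lines in between.
dropped = [set(a) - set(b) for a, b in zip(pending, pending[1:])]

for mask, slots in zip(logged_masks, pending):
    print(f"{mask:08x} -> slots {slots}")
print("dropped between masks:", dropped)
```

Read this way, the four timeouts are the same batch of queued commands on channel 1 expiring one after another, not four unrelated events.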
On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote:> I have a 4 port eSata PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb1st. > > siis0@pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00 > vendor = 'Silicon Image Inc (Was: CMD Technology Inc)' > device = 'PCI-X to Serial ATA Controller (SiI 3124)' > class = mass storage > subclass = RAID > bar [10] = type Memory, range 64, base 0xb4408000, size 128, enabled > bar [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled > bar [20] = type I/O Port, range 32, base 0x3000, size 16, enabled > cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0 > cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions > cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message > > siis0: <SiI3124 SATA controller> port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5 > siis0: [ITHREAD] > siisch0: <SIIS channel> at channel 0 on siis0 > siisch0: [ITHREAD] > siisch1: <SIIS channel> at channel 1 on siis0 > siisch1: [ITHREAD] > siisch2: <SIIS channel> at channel 2 on siis0 > siisch2: [ITHREAD] > siisch3: <SIIS channel> at channel 3 on siis0 > siisch3: [ITHREAD] > > # camcontrol devlist > <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 0 lun 0 (pass0,ada0) > <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 1 lun 0 (pass1,ada1) > <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 2 lun 0 (pass2,ada2) > <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 3 lun 0 (pass3,ada3) > <Port Multiplier 47261095 1f06> at scbus0 target 15 lun 0 (pass4,pmp1) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 0 lun 0 (pass5,ada4) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 1 lun 0 (pass6,ada5) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 2 lun 0 (pass7,ada6) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 3 lun 0 (pass8,ada7) > <WDC WD2002FAEX-007BA0 05.01D05> at 
scbus1 target 4 lun 0 (pass9,ada8) > <Port Multiplier 37261095 1706> at scbus1 target 15 lun 0 (pass10,pmp0) > <Areca usrvar R001> at scbus4 target 0 lun 0 (pass11,da0) > <Areca backup1 R001> at scbus4 target 0 lun 1 (pass12,da1) > <Areca RAID controller R001> at scbus4 target 16 lun 0 (pass13) > <AMCC 9650SE-2LP DISK 4.10> at scbus5 target 0 lun 0 (pass14,da2) > <ST31000333AS SD35> at scbus6 target 0 lun 0 (pass15,ada9) > <ST31000528AS CC35> at scbus7 target 0 lun 0 (pass16,ada10) > <ST31000340AS SD1A> at scbus8 target 0 lun 0 (pass17,ada11) > <WDC WD1002FAEX-00Z3A0 05.01D05> at scbus11 target 0 lun 0 (pass18,ada12) > > > Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with a the odd slot timeout error. > > > Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000 > Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26 > Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000 > Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30 > Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000 > Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25 > Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:34 backup3 kernel: siisch1: ... 
waiting for slots 01000000
> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24
> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000

This indicates the controller on channel 1 (siisch1) is "stalled" waiting for underlying communication with the device attached to it.

> Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 01:33:52 backup3 last message repeated 2 times
> Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 01:50:31 backup3 last message repeated 2 times
> Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT
> Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT

This indicates the underlying device was handed a READ LOG EXT ATA command (command 0x2f) and the device did not respond promptly (resulting in the timeout messages you see).

> smartctl doesnt show any issues on the drives other than one that has some historical errors from a while ago. What are these errors and do I need to worry about them ? The "READ LOG EXT" ones are new.
>
> {snipping SMART stats}

You're focused heavily on the READ LOG EXT command. READ LOG EXT is intended for accessing the GP Log section of a drive. EXT stands for "Extended". "GP Log" means "General Purpose Log", and is where all sorts of logging information regarding drive performance is stored. It's usually stored within a reserved section of the platters, or in the HPA area. It's not within a "standard" user-accessible LBA/sector region. This is a completely separate log from that of SMART logs.

You can review the different types of "logs" on a device by reviewing the ATA8-ACS specification here. See Annex A, section A.1, page 362:

http://www.t13.org/documents/UploadedDocuments/docs2007/D1699r4a-ATA8-ACS.pdf

This is almost certainly a lower level problem with the disk that cannot be addressed/solved via normal means. Thus, my recommendation is to replace the disk.

If you would rather not replace the disk, I can try to step you through looking at the GPLog sections of the disk to see if you can trigger the problem -- and I have a feeling you'll be able to, but I won't necessarily be able to tell you where the actual problem lies hardware-wise, nor will I be able to solve the problem.

Regarding the repeated errors at semi-regular (but not entirely) intervals: are you using smartd? Do you have a cronjob that issues smartctl -a or smartctl -x commands at intervals? I imagine any of these could be tickling something lower level.

Also, please upgrade your smartmontools to 5.42. It does provide some further enhancements that are useful.

--
| Jeremy Chadwick                           jdc@parodius.com |
| Parodius Networking              http://www.parodius.com/ |
| UNIX Systems Administrator          Mountain View, CA, US |
| Making life hard for others since 1977.      PGP 4BD6C0CB |

On 08.02.2012 23:27, Jeremy Chadwick wrote:> On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote: >> I have a 4 port eSata PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb1st. >> >> siis0@pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00 >> vendor = 'Silicon Image Inc (Was: CMD Technology Inc)' >> device = 'PCI-X to Serial ATA Controller (SiI 3124)' >> class = mass storage >> subclass = RAID >> bar [10] = type Memory, range 64, base 0xb4408000, size 128, enabled >> bar [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled >> bar [20] = type I/O Port, range 32, base 0x3000, size 16, enabled >> cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0 >> cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions >> cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message >> >> siis0:<SiI3124 SATA controller> port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5 >> siis0: [ITHREAD] >> siisch0:<SIIS channel> at channel 0 on siis0 >> siisch0: [ITHREAD] >> siisch1:<SIIS channel> at channel 1 on siis0 >> siisch1: [ITHREAD] >> siisch2:<SIIS channel> at channel 2 on siis0 >> siisch2: [ITHREAD] >> siisch3:<SIIS channel> at channel 3 on siis0 >> siisch3: [ITHREAD] >> >> # camcontrol devlist >> <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 0 lun 0 (pass0,ada0) >> <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 1 lun 0 (pass1,ada1) >> <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 2 lun 0 (pass2,ada2) >> <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 3 lun 0 (pass3,ada3) >> <Port Multiplier 47261095 1f06> at scbus0 target 15 lun 0 (pass4,pmp1) >> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 0 lun 0 (pass5,ada4) >> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 1 lun 0 (pass6,ada5) >> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 2 lun 0 (pass7,ada6) >> <WDC WD2002FAEX-007BA0 05.01D05> 
at scbus1 target 3 lun 0 (pass8,ada7) >> <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 4 lun 0 (pass9,ada8) >> <Port Multiplier 37261095 1706> at scbus1 target 15 lun 0 (pass10,pmp0) >> <Areca usrvar R001> at scbus4 target 0 lun 0 (pass11,da0) >> <Areca backup1 R001> at scbus4 target 0 lun 1 (pass12,da1) >> <Areca RAID controller R001> at scbus4 target 16 lun 0 (pass13) >> <AMCC 9650SE-2LP DISK 4.10> at scbus5 target 0 lun 0 (pass14,da2) >> <ST31000333AS SD35> at scbus6 target 0 lun 0 (pass15,ada9) >> <ST31000528AS CC35> at scbus7 target 0 lun 0 (pass16,ada10) >> <ST31000340AS SD1A> at scbus8 target 0 lun 0 (pass17,ada11) >> <WDC WD1002FAEX-00Z3A0 05.01D05> at scbus11 target 0 lun 0 (pass18,ada12) >> >> >> Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with a the odd slot timeout error. >> >> >> Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000 >> Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26 >> Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 >> Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000 >> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30 >> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 >> Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000 >> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25 >> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 >> Feb 7 23:49:34 backup3 kernel: siisch1: ... 
waiting for slots 01000000
>> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24
>> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>
> This indicates the controller on channel 1 (siisch1) is "stalled"
> waiting for underlying communication with the device attached to it.
>
>> Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 01:33:52 backup3 last message repeated 2 times
>> Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 01:50:31 backup3 last message repeated 2 times
>> Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT
>
> This indicates the underlying device was handed a READ LOG EXT ATA
> command (command 0x2f) and the device did not respond promptly
> (resulting in the timeout messages you see).

There are hours between the timeouts and the READ LOG EXT errors. They are not directly related, but may have the same cause.

>> smartctl doesnt show any issues on the drives other than one that has some historical errors from a while ago. What are these errors and do I need to worry about them ? The "READ LOG EXT" ones are new.
>>
>> {snipping SMART stats}
>
> You're focused heavily on the READ LOG EXT command. READ LOG EXT is
> intended for accessing the GP Log section of a drive. EXT stands for
> "Extended". "GP Log" means "General Purpose Log", and is where all
> sorts of logging information regarding drive performance is stored.
> It's usually stored within a reserved section of the platters, or in the
> HPA area. It's not within a "standard" user-accessible LBA/sector
> region. This is a completely separate log from that of SMART logs.

The READ LOG EXT commands here are used to fetch the status of some failed NCQ commands. That is the normal (and only) way to get detailed error status in that case. An error on the READ LOG EXT command itself may mean that this is not a regular media error, but a problem with communication, firmware or something else.

> You can review the different types of "logs" on a device by reviewing
> the ATA8-ACS specification here. See Annex A, section A.1, page 362:
>
> http://www.t13.org/documents/UploadedDocuments/docs2007/D1699r4a-ATA8-ACS.pdf
>
> This is almost certainly a lower level problem with the disk that cannot
> be addressed/solved via normal means. Thus, my recommendation is to
> replace the disk.
>
> If you would rather not replace the disk, I can try to step you through
> looking at the GPLog sections of the disk to see if you can trigger the
> problem -- and I have a feeling you'll be able to, but I won't
> necessarily be able to tell you where the actual problem lies
> hardware-wise, nor will I be able to solve the problem.
>
> Regarding the repeated errors at semi-regular (but not entirely)
> intervals: are you using smartd? Do you have a cronjob that issues
> smartctl -a or smartctl -x commands at intervals? I imagine any of
> these could be tickling something lower level.
>
> Also, please upgrade your smartmontools to 5.42. It does provide some
> further enhancements that are useful.

--
Alexander Motin

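The spacing between the last slot timeout and the READ LOG EXT errors can be read directly off the timestamps in the quoted log. A small Python sketch of that arithmetic follows. The year 2012 is taken from the smartctl output elsewhere in the thread, the helper name is made up, and the "last message repeated" lines are left out because they carry no per-event timestamp.

```python
from datetime import datetime


def parse_syslog(stamp: str, year: int = 2012) -> datetime:
    """Parse a 'Feb 7 23:49:34' style stamp; syslog omits the year, so it is supplied."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Last "Timeout on slot" line logged for siisch1.
last_timeout = parse_syslog("Feb 7 23:49:34")

# Every explicitly timestamped "Error while READ LOG EXT" line, in log order.
read_log_errors = [parse_syslog(s) for s in (
    "Feb 7 23:57:59", "Feb 8 00:13:36", "Feb 8 00:21:53", "Feb 8 00:22:16",
    "Feb 8 00:39:13", "Feb 8 01:24:25", "Feb 8 01:43:45", "Feb 8 01:55:20",
    "Feb 8 02:26:26", "Feb 8 02:27:24", "Feb 8 03:16:28", "Feb 8 03:36:20",
    "Feb 8 04:04:05",
)]

# Gap from the last timeout to the first error, then between consecutive errors.
first_gap = read_log_errors[0] - last_timeout
gaps = [later - earlier for earlier, later in zip(read_log_errors, read_log_errors[1:])]

print("last timeout -> first READ LOG EXT error:", first_gap)
for when, gap in zip(read_log_errors[1:], gaps):
    print(when.strftime("%b %d %H:%M:%S"), "after", gap)
```

Printing the gaps makes it easier to judge whether the errors follow a schedule (for example a cron job) or track I/O load.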
On 2/8/2012 4:27 PM, Jeremy Chadwick wrote:
>
> This indicates the controller on channel 1 (siisch1) is "stalled"
> waiting for underlying communication with the device attached to it.

Hi,
    But which device? The PM itself, or the disks behind it? And which disk?

>
> This is almost certainly a lower level problem with the disk that cannot
> be addressed/solved via normal means. Thus, my recommendation is to
> replace the disk.

I would gladly replace it if I knew which one :)

>
> Regarding the repeated errors at semi-regular (but not entirely)
> intervals: are you using smartd? Do you have a cronjob that issues
> smartctl -a or smartctl -x commands at intervals? I imagine any of
> these could be tickling something lower level.

I don't have smartd running. The box takes a lot of backups as well as a constant stream of netflow data, so there are a lot of writes to it.

>
> Also, please upgrade your smartmontools to 5.42. It does provide some
> further enhancements that are useful.
>

Done.

# smartctl -x /dev/ada9
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST31000333AS
Serial Number:    9TE14SRV
LU WWN Device Id: 5 000c50 010a39664
Firmware Version: SD35
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Wed Feb 8 20:00:47 2012 EST

==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207957

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 617) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 203) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS  VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-- 112   099   006    -    44490692
  3 Spin_Up_Time            PO---- 093   092   000    -    0
  4 Start_Stop_Count        -O--CK 100   100   020    -    68
  5 Reallocated_Sector_Ct   PO--CK 100   100   036    -    2
  7 Seek_Error_Rate         POSR-- 088   060   030    -    791764702
  9 Power_On_Hours          -O--CK 075   075   000    -    22759
 10 Spin_Retry_Count        PO--C- 100   100   097    -    2
 12 Power_Cycle_Count       -O--CK 100   100   020    -    68
184 End-to-End_Error        -O--CK 100   100   099    -    0
187 Reported_Uncorrect      -O--CK 095   095   000    -    5
188 Command_Timeout         -O--CK 100   100   000    -    0
189 High_Fly_Writes         -O-RCK 001   001   000    -    961
190 Airflow_Temperature_Cel -O---K 066   056   045    -    34 (Min/Max 33/37)
194 Temperature_Celsius     -O---K 034   044   000    -    34 (0 25 0 0 0)
195 Hardware_ECC_Recovered  -O-RC- 050   030   000    -    44490692
197 Current_Pending_Sector  -O--C- 100   100   000    -    0
198 Offline_Uncorrectable   ----C- 100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK 200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
GP/S Log at address 0x00 has     1 sectors [Log Directory]
GP/S Log at address 0x01 has     1 sectors [Summary SMART error log]
GP/S Log at address 0x02 has     5 sectors [Comprehensive SMART error log]
GP/S Log at address 0x03 has     5 sectors [Ext. Comprehensive SMART error log]
GP/S Log at address 0x06 has     1 sectors [SMART self-test log]
GP/S Log at address 0x07 has     1 sectors [Extended self-test log]
GP/S Log at address 0x09 has     1 sectors [Selective self-test log]
GP/S Log at address 0x10 has     1 sectors [NCQ Command Error log]
GP/S Log at address 0x11 has     1 sectors [SATA Phy Event Counters]
GP/S Log at address 0x21 has     1 sectors [Write stream error log]
GP/S Log at address 0x22 has     1 sectors [Read stream error log]
GP/S Log at address 0x80 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x81 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x82 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x83 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x84 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x85 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x86 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x87 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x88 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x89 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x8a has    16 sectors [Host vendor specific log]
GP/S Log at address 0x8b has    16 sectors [Host vendor specific log]
GP/S Log at address 0x8c has    16 sectors [Host vendor specific log]
GP/S Log at address 0x8d has    16 sectors [Host vendor specific log]
GP/S Log at address 0x8e has    16 sectors [Host vendor specific log]
GP/S Log at address 0x8f has    16 sectors [Host vendor specific log]
GP/S Log at address 0x90 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x91 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x92 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x93 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x94 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x95 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x96 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x97 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x98 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x99 has    16 sectors [Host vendor specific log]
GP/S Log at address 0x9a has    16 sectors [Host vendor specific log]
GP/S Log at address 0x9b has    16 sectors [Host vendor specific log]
GP/S Log at address 0x9c has    16 sectors [Host vendor specific log]
GP/S Log at address 0x9d has    16 sectors [Host vendor specific log]
GP/S Log at address 0x9e has    16 sectors [Host vendor specific log]
GP/S Log at address 0x9f has    16 sectors [Host vendor specific log]
GP/S Log at address 0xa1 has    20 sectors [Device vendor specific log]
GP   Log at address 0xa2 has  2248 sectors [Device vendor specific log]
GP/S Log at address 0xa8 has    20 sectors [Device vendor specific log]
GP/S Log at address 0xa9 has     1 sectors [Device vendor specific log]
GP   Log at address 0xb0 has  2819 sectors [Device vendor specific log]
GP   Log at address 0xbe has 65535 sectors [Device vendor specific log]
GP   Log at address 0xbf has 65535 sectors [Device vendor specific log]
GP/S Log at address 0xe0 has     1 sectors [SCT Command/Status]
GP/S Log at address 0xe1 has     1 sectors [SCT Data Transfer]

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 5
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 5 [4] occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 00 00 00 4d 08 00 db 10 00 00 Error: UNC at LBA = 0x4d0800db10 = 330846755600 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 60 00 00 00 1a 00 19 72 00 2f a2 40 00 11d+02:29:18.542 READ FPDMA QUEUED 60 00 00 00 1a 00 19 3e 00 2d 83 40 00 11d+02:29:18.542 READ FPDMA QUEUED 60 00 00 00 1b 00 19 38 00 2a 07 40 00 11d+02:29:18.541 READ FPDMA QUEUED 60 00 00 00 19 00 19 35 00 2a 2b 40 00 11d+02:29:18.541 READ FPDMA QUEUED 60 00 00 00 1c 00 19 b4 00 28 ec 40 00 11d+02:29:18.541 READ FPDMA QUEUED Error 4 [3] occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 00 00 00 4d 08 00 db 10 00 00 Error: UNC at LBA = 0x4d0800db10 = 330846755600 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 60 00 00 00 1a 00 19 72 00 2f a2 40 00 11d+02:29:15.783 READ FPDMA QUEUED 60 00 00 00 1a 00 19 3e 00 2d 83 40 00 11d+02:29:15.780 READ FPDMA QUEUED 60 00 00 00 1b 00 19 38 00 2a 07 40 00 11d+02:29:15.732 READ FPDMA QUEUED 60 00 00 00 19 00 19 35 00 2a 2b 40 00 11d+02:29:15.732 READ FPDMA QUEUED 60 00 00 00 1c 00 19 b4 00 28 ec 40 00 11d+02:29:15.731 READ FPDMA QUEUED Error 3 [2] occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 00 00 00 4d 08 00 db 10 00 00 Error: UNC at LBA = 0x4d0800db10 = 330846755600 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 60 00 00 00 1b 00 19 38 00 2a 07 40 00 11d+02:29:12.889 READ FPDMA QUEUED 60 00 00 00 19 00 19 35 00 2a 2b 40 00 11d+02:29:12.889 READ FPDMA QUEUED 60 00 00 00 1c 00 19 b4 00 28 ec 40 00 11d+02:29:12.888 READ FPDMA QUEUED 60 00 00 00 1c 00 4d 4f 00 e3 b5 40 00 11d+02:29:12.888 READ FPDMA QUEUED 60 00 00 00 1a 00 4d 07 00 db fc 40 00 11d+02:29:12.888 READ FPDMA QUEUED Error 2 [1] occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours) When the command that caused the error occurred, the device was active or idle. 
After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 00 00 00 4d 08 00 db 10 00 00 Error: UNC at LBA = 0x4d0800db10 = 330846755600 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 60 00 00 00 1b 00 19 38 00 2a 07 40 00 11d+02:29:10.011 READ FPDMA QUEUED 60 00 00 00 19 00 19 35 00 2a 2b 40 00 11d+02:29:10.011 READ FPDMA QUEUED 60 00 00 00 1c 00 19 b4 00 28 ec 40 00 11d+02:29:10.010 READ FPDMA QUEUED 60 00 00 00 1c 00 4d 4f 00 e3 b5 40 00 11d+02:29:10.010 READ FPDMA QUEUED 60 00 00 00 1a 00 4d 07 00 db fc 40 00 11d+02:29:10.010 READ FPDMA QUEUED Error 1 [0] occurred at disk power-on lifetime: 18292 hours (762 days + 4 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER -- ST COUNT LBA_48 LH LM LL DV DC -- -- -- == -- == == == -- -- -- -- -- 40 -- 51 00 00 00 4d 08 00 db 10 00 00 Error: UNC at LBA = 0x4d0800db10 = 330846755600 Commands leading to the command that caused the error were: CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 60 00 00 00 1b 00 19 38 00 2a 07 40 00 11d+02:29:07.148 READ FPDMA QUEUED 60 00 00 00 19 00 19 35 00 2a 2b 40 00 11d+02:29:07.140 READ FPDMA QUEUED 60 00 00 00 1c 00 19 b4 00 28 ec 40 00 11d+02:29:07.131 READ FPDMA QUEUED 60 00 00 00 1c 00 4d 4f 00 e3 b5 40 00 11d+02:29:07.117 READ FPDMA QUEUED 60 00 00 00 35 00 4d c0 00 e0 09 40 00 11d+02:29:07.111 READ FPDMA QUEUED SMART Extended Self-test Log Version: 1 (1 sectors) No self-tests have been logged. 
[To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. SCT Status Version: 3 SCT Version (vendor specific): 522 (0x020a) SCT Support Level: 1 Device State: Active (0) Current Temperature: 34 Celsius Power Cycle Min/Max Temperature: 33/37 Celsius Lifetime Min/Max Temperature: 25/44 Celsius Under/Over Temperature Limit Count: 0/10423 SCT Temperature History Version: 2 Temperature Sampling Period: 1 minute Temperature Logging Interval: 1 minute Min/Max recommended Temperature: 0/ 0 Celsius Min/Max Temperature Limit: 0/ 0 Celsius Temperature History Size (Index): 128 (24) Index Estimated Time Temperature Celsius 25 2012-02-08 17:53 35 **************** ... ..( 53 skipped). .. **************** 79 2012-02-08 18:47 35 **************** 80 2012-02-08 18:48 34 *************** ... ..( 71 skipped). .. 
*************** 24 2012-02-08 20:00 34 *************** SCT Error Recovery Control: Read: Disabled Write: Disabled SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x000a 2 12 Device-to-host register FISes sent due to a COMRESET 0x0001 2 0 Command failed due to ICRC error 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS smartctl -l gplog,0x10 /dev/ada9 smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net General Purpose Log 0x10 [NCQ Command Error log], Page 0-0 (of 1) 0000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000100: 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 |................| 0000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0000190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00001a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00001b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00001c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00001d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00001e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0(backup3)# smartctl -l sataphy /dev/ada9 smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x000a 2 12 Device-to-host register FISes sent due to a COMRESET 0x0001 2 0 Command failed due to ICRC error 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS 0(backup3)# smartctl -l sataphy /dev/ada10 smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build) Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net SATA Phy Event Counters (GP Log 
0x11)
ID      Size     Value  Description
0x000a  2           11  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

0(backup3)# smartctl -l sataphy /dev/ada11
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2           12  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

0(backup3)# smartctl -l sataphy /dev/ada12
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x8000  4       625971  Vendor specific
0(backup3)#

---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
The drive has reallocated some sectors, and a drive should normally never reallocate sectors unless it has trouble reading or writing them. Also, that drive has known firmware problems, so it sounds like the drive needs replacing.

On Feb 9, 2012 10:38 AM, "Mike Tancsa" <mike@sentex.net> wrote:
> On 2/9/2012 11:34 AM, Jeremy Chadwick wrote:
> >
> > You will probably need to "track these drives" on a regular basis. That
> > is to say, set up some cronjob or similar that logs the above output to
> > a file (appends data to it), specifically output from smartctl -A (not
> > -a and not -x) and smartctl -l sataphy on a per-disk basis. smartd can
> > track SMART attribute changes, but does not track GPLog changes. Make
> > sure to put timestamps in your logs.
>
> Thanks very much for having a look, and the suggestions. I think this
> is the way to go to see which drive may have errors incrementing.
> Alexander, is there a better way you can suggest?
>
> > As for fixing the problem: I have no idea how you would go about this.
> > Use of port multipliers involves additional cables, possibly of shoddy
> > quality, or other components which may not be decent/reliable.
>
> Possibly. Cables are one of those things I am happy to "pay extra for
> better quality", but how does one assess the quality of such parts?
>
> > Overall, this is just one of the many reasons why I avoid PMs, as well
> > as avoid eSATA (especially eSATA).
>
> Yeah, at some point it doesn't really work with too many PMs, especially
> if you can't query the thing to find out where things are "bad". I think
> for the next version of this box I will use the newer generation 3ware
> SAS/SATA controller.
>
> ---Mike
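For reference, the cron-based tracking Jeremy describes could be sketched roughly as below. This is only an illustration: the log directory, the one-file-per-disk naming, and the SMARTCTL/LOGDIR override variables are assumptions, not something anyone in the thread posted. Only the two commands, smartctl -A and smartctl -l sataphy, come from the suggestion itself.

```shell
#!/bin/sh
# Sketch: append a timestamped snapshot of SMART attributes and SATA Phy
# event counters to one log file per disk.
# Disk names (e.g. ada4 ada5) are passed as arguments.
# LOGDIR and the file naming are assumptions made for this example.

SMARTCTL=${SMARTCTL:-smartctl}
LOGDIR=${LOGDIR:-/var/log/smart}

snapshot_disk() {
    disk=$1
    mkdir -p "$LOGDIR" || return 1
    {
        # Timestamp header so later snapshots can be told apart
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') $disk ==="
        "$SMARTCTL" -A "/dev/$disk"
        "$SMARTCTL" -l sataphy "/dev/$disk"
    } >> "$LOGDIR/$disk.log" 2>&1
}

# With no arguments this loop does nothing.
for disk in "$@"; do
    snapshot_disk "$disk"
done
```

Run it from cron with the disk names as arguments, for example hourly with ada4 ada5 ada6 ada7; where the script lives and how often it runs are up to you.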
On 2/9/2012 1:37 PM, Mike Tancsa wrote:
> On 2/9/2012 11:34 AM, Jeremy Chadwick wrote:
>>
>> You will probably need to "track these drives" on a regular basis. That
>> is to say, set up some cronjob or similar that logs the above output to
>> a file (appends data to it), specifically output from smartctl -A (not
>> -a and not -x) and smartctl -l sataphy on a per-disk basis. smartd can
>> track SMART attribute changes, but does not track GPLog changes. Make
>> sure to put timestamps in your logs.
>
> Thanks very much for having a look, and the suggestions. I think this
> is the way to go to see which drive may have errors incrementing.
> Alexander, is there a better way you can suggest?

Got a few more of the READ LOG EXT errors, and I took a snapshot of all the disks after the errors to compare with the snapshots from cron this morning. Unfortunately, some of the deltas were on the one new port multiplier and some were on the motherboard SATA.

Feb 9 04:34:55 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:05:53 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:06:53 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:07:06 backup3 last message repeated 3 times
Feb 10 16:18:24 backup3 last message repeated 16 times
Feb 10 16:18:24 backup3 kernel:
Feb 10 16:18:39 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:19:10 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:20:27 backup3 last message repeated 4 times
Feb 10 16:20:27 backup3 kernel:
Feb 10 16:20:30 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:21:33 backup3 kernel: siisch1: Error while READ LOG EXT
Feb 10 16:23:23 backup3 last message repeated 8 times

On ada4,

-199 UDMA_CRC_Error_Count -O--CK 200 199 000 - 13
+199 UDMA_CRC_Error_Count -O--CK 200 199 000 - 32
 SATA Phy Event Counters (GP Log 0x11)
 ID Size Value Description
-0x0001 2 13 Command failed due to ICRC error
-0x0002 2 13 R_ERR response for data FIS
-0x0003 2 13 R_ERR response for device-to-host data FIS
+0x0001 2 32 Command failed due to ICRC error
+0x0002 2 32 R_ERR response for data FIS
+0x0003 2 32 R_ERR response for device-to-host data FIS
 0x0004 2 0 R_ERR response for host-to-device data FIS
-0x0005 2 0 R_ERR response for non-data FIS
-0x0006 2 0 R_ERR response for device-to-host non-data FIS
+0x0005 2 1 R_ERR response for non-data FIS
+0x0006 2 1 R_ERR response for device-to-host non-data FIS
 0x0007 2 0 R_ERR response for host-to-device non-data FIS
 0x000a 2 0 Device-to-host register FISes sent due to a COMRESET
 0x000b 2 0 CRC errors within host-to-device FIS
-0x8000 4 744462 Vendor specific
+0x8000 4 785195 Vendor specific
 General Purpose Log 0x10 [NCQ Command Error log], Page 0-0 (of 1)
-0000000: 05 00 41 84 04 9a 53 40 00 00 00 00 00 00 00 00 |..A...S@........|
+0000000: 06 00 41 84 f2 39 6d 40 2d 00 00 00 00 00 00 00 |..A..9m@-.......|
-00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa |................|
+00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 25 |...............%|

ada5

-199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 11
+199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 22
-0x0001 2 11 Command failed due to ICRC error
-0x0002 2 11 R_ERR response for data FIS
-0x0003 2 11 R_ERR response for device-to-host data FIS
+0x0001 2 22 Command failed due to ICRC error
+0x0002 2 22 R_ERR response for data FIS
+0x0003 2 22 R_ERR response for device-to-host data FIS

ada6

-199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 8
+199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 25
 SATA Phy Event Counters (GP Log 0x11)
 ID Size Value Description
-0x0001 2 8 Command failed due to ICRC error
-0x0002 2 8 R_ERR response for data FIS
-0x0003 2 8 R_ERR response for device-to-host data FIS
+0x0001 2 25 Command failed due to ICRC error
+0x0002 2 25 R_ERR response for data FIS
+0x0003 2 25 R_ERR response for device-to-host data FIS
 0x0004 2 0 R_ERR response for host-to-device data FIS
 0x0005 2 0 R_ERR response for non-data FIS
 0x0006 2 0 R_ERR response for device-to-host non-data FIS
 0x0007 2 0 R_ERR response for host-to-device non-data FIS
 0x000a 2 0 Device-to-host register FISes sent due to a COMRESET
 0x000b 2 0 CRC errors within host-to-device FIS
-0x8000 4 744462 Vendor specific
+0x8000 4 785195 Vendor specific

ada7

-199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 13
+199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 30
 SATA Phy Event Counters (GP Log 0x11)
 ID Size Value Description
-0x0001 2 13 Command failed due to ICRC error
-0x0002 2 13 R_ERR response for data FIS
-0x0003 2 13 R_ERR response for device-to-host data FIS
+0x0001 2 30 Command failed due to ICRC error
+0x0002 2 31 R_ERR response for data FIS
+0x0003 2 31 R_ERR response for device-to-host data FIS
 0x0004 2 0 R_ERR response for host-to-device data FIS
 0x0005 2 1 R_ERR response for non-data FIS
 0x0006 2 1 R_ERR response for device-to-host non-data FIS
 0x0007 2 0 R_ERR response for host-to-device non-data FIS
 0x000a 2 0 Device-to-host register FISes sent due to a COMRESET
 0x000b 2 0 CRC errors within host-to-device FIS
-0x8000 4 744460 Vendor specific
+0x8000 4 785193 Vendor specific
 General Purpose Log 0x10 [NCQ Command Error log], Page 0-0 (of 1)
-0000000: 19 00 41 84 74 3d 4a 40 29 00 00 00 00 00 00 00 |..A.t=J@).......|
+0000000: 15 00 41 84 d7 03 1f 40 2d 00 00 00 00 00 00 00 |..A....@-.......|
 0000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 0000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 0000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
@@ -238,5 +244,5 @@
 00001c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00001d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00001e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
-00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b3 |................|
+00001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b5 |................|

ada9

 ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
-  1 Raw_Read_Error_Rate POSR-- 115 099 006 - 91821743
+  1 Raw_Read_Error_Rate POSR-- 117 099 006 - 155365055
   3 Spin_Up_Time PO---- 093 092 000 - 0
   4 Start_Stop_Count -O--CK 100 100 020 - 68
   5 Reallocated_Sector_Ct PO--CK 100 100 036 - 2
-  7 Seek_Error_Rate POSR-- 088 060 030 - 792342525
-  9 Power_On_Hours -O--CK 074 074 000 - 22792
+  7 Seek_Error_Rate POSR-- 088 060 030 - 792482445
+  9 Power_On_Hours -O--CK 074 074 000 - 22803
  10 Spin_Retry_Count PO--C- 100 100 097 - 2
  12 Power_Cycle_Count -O--CK 100 100 020 - 68
 184 End-to-End_Error -O--CK 100 100 099 - 0
 187 Reported_Uncorrect -O--CK 095 095 000 - 5
 188 Command_Timeout -O--CK 100 100 000 - 0
 189 High_Fly_Writes -O-RCK 001 001 000 - 961
-190 Airflow_Temperature_Cel -O---K 064 056 045 - 36 (Min/Max 33/37)
-194 Temperature_Celsius -O---K 036 044 000 - 36 (0 25 0 0 0)
-195 Hardware_ECC_Recovered -O-RC- 050 030 000 - 91821743
+190 Airflow_Temperature_Cel -O---K 066 056 045 - 34 (Min/Max 33/37)
+194 Temperature_Celsius -O---K 034 044 000 - 34 (0 25 0 0 0)
+195 Hardware_ECC_Recovered -O-RC- 050 030 000 - 155365055
 197 Current_Pending_Sector -O--C- 100 100 000 - 0
 198 Offline_Uncorrectable ----C- 100 100 000 - 0
 199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0

ada10

 SMART Attributes Data Structure revision number: 10
 Vendor Specific SMART Attributes with Thresholds:
 ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
-  1 Raw_Read_Error_Rate POSR-- 118 099 006 - 196445860
+  1 Raw_Read_Error_Rate POSR-- 107 099 006 - 13128068
   3 Spin_Up_Time PO---- 095 095 000 - 0
   4 Start_Stop_Count -O--CK 100 100 020 - 216
   5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
-  7 Seek_Error_Rate POSR-- 087 060 030 - 586360650
-  9 Power_On_Hours -O--CK 077 077 000 - 20319
+  7 Seek_Error_Rate POSR-- 087 060 030 - 586495516
+  9 Power_On_Hours -O--CK 077 077 000 - 20330
  10 Spin_Retry_Count PO--C- 100 100 097 - 0
  12 Power_Cycle_Count -O--CK 100 100 020 - 113
 183 Runtime_Bad_Block -O--CK 100 100 000 - 0
@@ -69,15 +69,15 @@
 187 Reported_Uncorrect -O--CK 100 100 000 - 0
 188 Command_Timeout -O--CK 100 100 000 - 0
 189 High_Fly_Writes -O-RCK 099 099 000 - 1
-190 Airflow_Temperature_Cel -O---K 067 062 045 - 33 (Min/Max 31/34)
-194 Temperature_Celsius -O---K 033 040 000 - 33 (0 22 0 0 0)
-195 Hardware_ECC_Recovered -O-RC- 040 018 000 - 196445860
+190 Airflow_Temperature_Cel -O---K 068 062 045 - 32 (Min/Max 31/34)
+194 Temperature_Celsius -O---K 032 040 000 - 32 (0 22 0 0 0)
+195 Hardware_ECC_Recovered -O-RC- 028 018 000 - 13128068
 197 Current_Pending_Sector -O--C- 100 100 000 - 0
 198 Offline_Uncorrectable ----C- 100 100 000 - 0
 199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0
-240 Head_Flying_Hours ------ 100 253 000 - 205935091929084
-241 Total_LBAs_Written ------ 100 253 000 - 1286405353
-242 Total_LBAs_Read ------ 100 253 000 - 708601879
+240 Head_Flying_Hours ------ 100 253 000 - 221530118180872
+241 Total_LBAs_Written ------ 100 253 000 - 3323838357
+242 Total_LBAs_Read ------ 100 253 000 - 1778396343

ada11

 ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
-  1 Raw_Read_Error_Rate POSR-- 120 097 006 - 242285977
+  1 Raw_Read_Error_Rate POSR-- 113 097 006 - 58229866
   3 Spin_Up_Time PO---- 092 091 000 - 0
   4 Start_Stop_Count -O--CK 100 100 020 - 69
   5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
-  7 Seek_Error_Rate POSR-- 073 060 030 - 133894632808
-  9 Power_On_Hours -O--CK 072 072 000 - 25283
+  7 Seek_Error_Rate POSR-- 073 060 030 - 133894764364
+  9 Power_On_Hours -O--CK 072 072 000 - 25294
  10 Spin_Retry_Count PO--C- 100 100 097 - 3
  12 Power_Cycle_Count -O--CK 100 100 020 - 82
 184 End-to-End_Error -O--CK 100 100 099 - 0
 187 Reported_Uncorrect -O--CK 100 100 000 - 0
 188 Command_Timeout -O--CK 100 089 000 - 124555952157
 189 High_Fly_Writes -O-RCK 080 080 000 - 20
-190 Airflow_Temperature_Cel -O---K 059 050 045 - 41 (Min/Max 38/42)
-194 Temperature_Celsius -O---K 041 050 000 - 41 (0 22 0 0 0)
-195 Hardware_ECC_Recovered -O-RC- 051 032 000 - 242285977
+190 Airflow_Temperature_Cel -O---K 061 050 045 - 39 (Min/Max 38/42)
+194 Temperature_Celsius -O---K 039 050 000 - 39 (0 22 0 0 0)
+195 Hardware_ECC_Recovered -O-RC- 050 032 000 - 58229866
 197 Current_Pending_Sector -O--C- 100 100 000 - 0
 198 Offline_Uncorrectable ----C- 100 100 000 - 0
 199 UDMA_CRC_Error_Count -OSRCK 200 200 000 - 0

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
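The snapshot comparison above was done with diff. As a rough alternative, the sketch below prints only the SATA Phy event counters whose value changed between two saved "smartctl -l sataphy" outputs, together with the delta. The function name phy_delta and the idea of keeping each snapshot in its own file are assumptions for illustration; the counter row layout (ID, size, value, description) is taken from the output shown in this thread.

```shell
#!/bin/sh
# Sketch: phy_delta OLD NEW
# Prints each SATA Phy event counter whose value differs between two saved
# "smartctl -l sataphy" outputs, as: ID old -> new (delta) description

phy_delta() {
    awk '
        # Counter rows look like: 0x0001  2  13  Command failed due to ICRC error
        $1 ~ /^0x[0-9a-f]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            # First file: remember the old value for this counter ID
            if (NR == FNR) { old[$1] = $3; next }
            # Second file: report counters that changed
            if (($1 in old) && $3 != old[$1]) {
                desc = $0
                # Strip the ID, size and value columns, keep the description
                sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", desc)
                printf "%s %d -> %d (%+d) %s\n", $1, old[$1], $3, $3 - old[$1], desc
            }
        }
    ' "$1" "$2"
}

# Only runs when called with exactly two snapshot files.
if [ $# -eq 2 ]; then
    phy_delta "$1" "$2"
fi
```

Fed the two ada4 snapshots quoted above, this would list 0x0001, 0x0002 and 0x0003 going from 13 to 32, plus the smaller changes in 0x0005, 0x0006 and the vendor-specific 0x8000 counter. It does not look at the SMART attribute table or the NCQ error log, so diff is still needed for those.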