On my ZFS server:
(info on: http://www.tegenbosch28.nl/FreeBSD/systems/ZFS/ )

+ahcich4: Timeout on slot 23 port 0
+ahcich4: is 00000000 cs 00800000 ss 00000000 rs 00800000 tfd c0 serr 00000000 cmd 0004d717
+(ada3:ahcich4:0:0:0): lost device
+(ada3:ahcich4:0:0:0): removing device entry
+ada3 at ahcich4 bus 0 scbus5 target 0 lun 0
+ada3: <ST31500341AS CC1H> ATA-8 SATA 2.x device
+ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
+ada3: Command Queueing enabled
+ada3: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)

The reconnect occurs immediately after the disconnect.

I had some discussions with Jeremy Chadwick, so below are the smartctl
stats.

The system was not particularly busy at that moment.
Is this a disk failure, or why else did it disconnect?

--WjW

Smartctl output:

[~wjw] root@zfs.digiware.nl# smartctl -A /dev/ada3
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       167678942
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       35
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       13
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       523895462
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13172
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       35
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   071   071   000    Old_age   Always       -       29
190 Airflow_Temperature_Cel 0x0022   067   047   045    Old_age   Always       -       33 (Min/Max 32/33)
194 Temperature_Celsius     0x0022   033   053   000    Old_age   Always       -       33 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   048   035   000    Old_age   Always       -       167678942
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       95747705942900
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       656030587
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2585367414

[~wjw] root@zfs.digiware.nl# smartctl -l devstat /dev/ada3
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Device Statistics (GP Log 0x04) not supported

[~wjw] root@zfs.digiware.nl# smartctl -l sataphy /dev/ada3
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-STABLE amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

Hi,

On Fri, Mar 02, 2012 at 09:49:59AM +0100, Willem Jan Withagen wrote:
> On my ZFS server:
> (info on: http://www.tegenbosch28.nl/FreeBSD/systems/ZFS/ )
>
> +ahcich4: Timeout on slot 23 port 0
> +ahcich4: is 00000000 cs 00800000 ss 00000000 rs 00800000 tfd c0 serr
> 00000000 cmd 0004d717
> +(ada3:ahcich4:0:0:0): lost device
> +(ada3:ahcich4:0:0:0): removing device entry
> +ada3 at ahcich4 bus 0 scbus5 target 0 lun 0
> +ada3: <ST31500341AS CC1H> ATA-8 SATA 2.x device
> +ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> +ada3: Command Queueing enabled
> +ada3: 1430799MB (2930277168 512 byte sectors: 16H 63S/T 16383C)
>
> The reconnect occurs immediately after the disconnect.
>
> I had some discussions with Jeremy Chadwick, so below are the smartctl
> stats.
>
> The system was not particularly busy at that moment.
> Is this a disk failure, or why else did it disconnect?

I suggest changing the disk:

>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail
> Always       -       13

I guess this will grow soon and fast ...

> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age
> Always       -       3
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age
> Offline      -       3

Doesn't look too promising.

- Oliver

--
| Oliver Brandmueller          http://sysadm.in/         ob@sysadm.in |
|                 I am the Internet. As truly as I help God.         |
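
The check applied in the reply above (looking at the raw values of attributes 5, 197 and 198) can be sketched as a small script. This is only an illustration: the function names `parse_smart_attributes` and `suspicious` are made up for this example, the sample rows are copied from the `smartctl -A` output in this thread, and treating any nonzero raw value as a warning is an assumption of the sketch, not a rule from smartmontools or the drive vendor.

```python
# Attribute IDs singled out in the reply. Flagging any nonzero raw value
# is an assumption of this sketch, not a vendor-defined threshold.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# A few rows copied from the smartctl -A output posted in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       13
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   047   045    Old_age   Always       -       33 (Min/Max 32/33)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text):
    """Return {id: (name, raw_value_string)} from `smartctl -A` style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # banner, header and blank lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = (fields[1], fields[9])
    return attrs


def suspicious(attrs):
    """Return (id, name, raw) for watched attributes with a nonzero raw value."""
    hits = []
    for attr_id in sorted(WATCHED):
        if attr_id not in attrs:
            continue
        name, raw = attrs[attr_id]
        if raw.isdigit() and int(raw) > 0:
            hits.append((attr_id, name, int(raw)))
    return hits


if __name__ == "__main__":
    for attr_id, name, raw in suspicious(parse_smart_attributes(SAMPLE)):
        print(f"{attr_id:3d} {name}: raw value {raw}")
```

On a live system the text would come from running `smartctl -A /dev/ada3` rather than from `SAMPLE`. A single reading says little about the trend, so comparing the counts across several runs is what shows whether they are growing.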