Peter Eriksson
2009-Sep-09 14:24 UTC
[zfs-discuss] [fm-discuss] self test failure on Intel X25-E SSD
Done some more testing, and I think my X4240/mpt/X25-problems must be something else. Attempting to read (with smartctl) the self test log on the 8850-firmware X25-E gives better results than with the old firmware:

X25-E running firmware 8850 on an X4240 with mpt controller:

# smartctl -d scsi -l selftest /dev/rdsk/c1t15d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

No self-tests have been logged
Long (extended) Self Test duration: 120 seconds [2.0 minutes]

X25-M running firmware 8820 on an X4240 with mpt controller:

# smartctl -d scsi -l selftest /dev/rdsk/c1t14d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

scsiPrintSelfTest Failed [I/O error]

Plus SCSI errors on the console:

Sep 9 16:11:44 merope scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci1000,3150@0/sd@e,0 (sd31):
Sep 9 16:11:44 merope   SCSI transport failed: reason '<unknown reason>': giving up
Sep 9 16:12:01 merope scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci1000,3150@0/sd@e,0 (sd31):
Sep 9 16:12:01 merope   Error for Command: write   Error Level: Retryable
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice]   Requested Block: 42976   Error Block: 42976
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice]   Vendor: ATA   Serial Number: CVEM8465006B
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

X25-E running firmware 8621 in an X4500 (marvell controller):

# /ifm/bin/smartctl -d scsi -l selftest /dev/rdsk/c5t7d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

SMART Self-test log
Num  Test         Status                     segment  LifeTime  LBA_first_err  [SK ASC ASQ]
     Description                             number   (hours)
# 1  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
# 2  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
# 3  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
# 4  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
# 5  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
# 6  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
# 7  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
# 8  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
# 9  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#10  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#11  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#12  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#13  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#14  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#15  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#16  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#17  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#18  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#19  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]
#20  Default      Interrupted (bus reset ?)        -      9216   154621181988  [0xb 0x40 0x82]

And SCSI errors on the console:

Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.warning] WARNING: /pci@2,0/pci1022,7458@8/pci11ab,11ab@1/disk@7,0 (sd47):
Sep 9 16:14:27 andromeda   Error for Command: mode sense   Error Level: Informational
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.notice]   Requested Block: 0   Error Block: 0
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.notice]   Vendor: ATA   Serial Number:
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.notice]   Sense Key: Illegal Request
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.notice]   ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0

This drive is generating a lot of FMA events too regarding the failed "selftests". Going to replace it with one running 8850 as soon as possible. (Requires some fiddling since I can't "zpool remove" the SLOG device - *sigh*)

I haven't had time to test with an X25-E with the old firmware on an X4240 with the mpt controller but I think things are looking good with the 8850 firmware.

--
This message posted from opensolaris.org
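
For anyone comparing console errors like these across several hosts, here is a rough Python sketch that pulls the device, sense key and ASC/ASCQ out of sd warning lines. The function name `parse_sense_events` and the regular expressions are my own assumptions based only on the log format quoted in this message; they are not part of smartmontools or Solaris, and other driver versions may format these lines differently.

```python
import re

# Hypothetical patterns, derived only from the console lines quoted above.
DEVICE_RE = re.compile(r"WARNING:\s*(\S+)\s*\((sd\d+)\):")
SENSE_KEY_RE = re.compile(r"Sense Key:\s*(.+?)\s*$")
ASC_RE = re.compile(
    r"ASC:\s*(0x[0-9a-fA-F]+)\s*\((.*?)\),\s*ASCQ:\s*(0x[0-9a-fA-F]+)"
)


def parse_sense_events(lines):
    """Return one dict per 'Sense Key' / 'ASC' pair found in the lines."""
    events = []
    device = None   # most recent sdNN instance seen on a WARNING line
    current = None  # event under construction (sense key seen, ASC pending)
    for line in lines:
        m = DEVICE_RE.search(line)
        if m:
            device = m.group(2)
            continue
        m = SENSE_KEY_RE.search(line)
        if m:
            current = {"device": device, "sense_key": m.group(1)}
            continue
        m = ASC_RE.search(line)
        if m and current is not None:
            current["asc"] = int(m.group(1), 16)
            current["asc_text"] = m.group(2)
            current["ascq"] = int(m.group(3), 16)
            events.append(current)
            current = None
    return events


# Sample input copied from the merope console output in this message.
SAMPLE = [
    "Sep 9 16:12:01 merope scsi: [ID 107833 kern.warning] WARNING: "
    "/pci@0,0/pci10de,375@f/pci1000,3150@0/sd@e,0 (sd31):",
    "Sep 9 16:12:01 merope   Error for Command: write   Error Level: Retryable",
    "Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention",
    "Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice]   "
    "ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0",
]

events = parse_sense_events(SAMPLE)
for event in events:
    print(event)
```

Feeding it /var/adm/messages line by line should make it easier to see whether a given drive mostly reports Unit Attention (resets) or Illegal Request (unsupported commands), which is the distinction between the two hosts above.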