Peter Eriksson
2009-Sep-09  14:24 UTC
[zfs-discuss] [fm-discuss] self test failure on Intel X25-E SSD
I've done some more testing, and I think my X4240/mpt/X25 problems must be caused by something else.
Reading the self-test log with smartctl on the X25-E running firmware 8850 gives better results than with the old firmware:
X25-E running firmware 8850 on an X4240 with mpt controller:
# smartctl -d scsi -l selftest /dev/rdsk/c1t15d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
No self-tests have been logged
Long (extended) Self Test duration: 120 seconds [2.0 minutes]
X25-M running firmware 8820 on an X4240 with mpt controller:
# smartctl -d scsi -l selftest /dev/rdsk/c1t14d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
scsiPrintSelfTest Failed [I/O error]
Plus SCSI errors on the console:
Sep  9 16:11:44 merope scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci1000,3150@0/sd@e,0 (sd31):
Sep  9 16:11:44 merope  SCSI transport failed: reason '<unknown reason>': giving up
Sep  9 16:12:01 merope scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci1000,3150@0/sd@e,0 (sd31):
Sep  9 16:12:01 merope  Error for Command: write                   Error Level: Retryable
Sep  9 16:12:01 merope scsi: [ID 107833 kern.notice]    Requested Block: 42976                     Error Block: 42976
Sep  9 16:12:01 merope scsi: [ID 107833 kern.notice]    Vendor: ATA                                Serial Number: CVEM8465006B
Sep  9 16:12:01 merope scsi: [ID 107833 kern.notice]    Sense Key: Unit Attention
Sep  9 16:12:01 merope scsi: [ID 107833 kern.notice]    ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
X25-E running firmware 8621 in an X4500 (marvell controller):
# /ifm/bin/smartctl -d scsi -l selftest /dev/rdsk/c5t7d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
# 2  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
# 3  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
# 4  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
# 5  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
# 6  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
# 7  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
# 8  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
# 9  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#10  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#11  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#12  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#13  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#14  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#15  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#16  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#17  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#18  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#19  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
#20  Default           Interrupted (bus reset ?)   -    9216      154621181988 [0xb 0x40 0x82]
And SCSI errors on the console: 
Sep  9 16:14:27 andromeda scsi: [ID 107833 kern.warning] WARNING: /pci@2,0/pci1022,7458@8/pci11ab,11ab@1/disk@7,0 (sd47):
Sep  9 16:14:27 andromeda       Error for Command: mode sense              Error Level: Informational
Sep  9 16:14:27 andromeda scsi: [ID 107833 kern.notice]         Requested Block: 0                         Error Block: 0
Sep  9 16:14:27 andromeda scsi: [ID 107833 kern.notice]         Vendor: ATA                                Serial Number:
Sep  9 16:14:27 andromeda scsi: [ID 107833 kern.notice]         Sense Key: Illegal Request
Sep  9 16:14:27 andromeda scsi: [ID 107833 kern.notice]         ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
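The bracketed [SK ASC ASQ] values in the X4500 self-test log and the sense data in the console messages are standard SCSI sense codes. Below is a minimal sh sketch that decodes only the triples appearing in this post. The function name decode_sense is hypothetical and is not part of smartctl. The meanings come from the SCSI (SPC) sense key and ASC/ASCQ tables, and any other code is printed back undecoded.

```sh
#!/bin/sh
# Hypothetical helper: decode_sense SK ASC ASCQ (hex values such as 0xb 0x40 0x82).
# Covers only the sense codes seen in this post.
decode_sense() {
    # Convert the hex arguments to decimal so comparisons ignore 0x0 vs 0x00 spelling.
    sk=$(printf '%d' "$1"); asc=$(printf '%d' "$2"); ascq=$(printf '%d' "$3")
    case $sk in
        5)  key="ILLEGAL REQUEST" ;;
        6)  key="UNIT ATTENTION" ;;
        11) key="ABORTED COMMAND" ;;          # sense key 0xb
        *)  key="sense key $1" ;;
    esac
    if [ "$asc" -eq 36 ] && [ "$ascq" -eq 0 ]; then        # 0x24/0x00
        what="INVALID FIELD IN CDB"
    elif [ "$asc" -eq 41 ] && [ "$ascq" -eq 0 ]; then      # 0x29/0x00
        what="POWER ON, RESET, OR BUS DEVICE RESET OCCURRED"
    elif [ "$asc" -eq 64 ] && [ "$ascq" -ge 128 ]; then    # 0x40/0x80-0xFF
        what=$(printf 'DIAGNOSTIC FAILURE ON COMPONENT %02XH' "$ascq")
    else
        what="ASC $2 ASCQ $3"
    fi
    echo "$key: $what"
}
```

For example, `decode_sense 0xb 0x40 0x82` prints `ABORTED COMMAND: DIAGNOSTIC FAILURE ON COMPONENT 82H`, which is the sense data the 8621 drive records for each of its "Default" self-test entries.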
This drive is also generating a lot of FMA events about the failed self-tests. I'm going to replace it with one running firmware 8850 as soon as possible. (That requires some fiddling, since I can't "zpool remove" the SLOG device. *sigh*)
I haven't had time to test an X25-E with the old firmware on an X4240 with the mpt controller, but I think things are looking good with the 8850 firmware.
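To repeat this comparison on other drives, here is a rough sh sketch. It classifies the output of `smartctl -d scsi -l selftest` (piped on stdin) into the three outcomes seen above: an empty log, a failed log read, or a list of entries. The function name selftest_status is hypothetical. It only matches the message strings that smartctl 5.38 printed in this post, and other smartctl versions may word them differently.

```sh
#!/bin/sh
# Hypothetical helper: classify "smartctl -d scsi -l selftest" output read from stdin.
selftest_status() {
    out=$(cat)
    case $out in
        *"No self-tests have been logged"*) echo "empty log" ;;
        *"scsiPrintSelfTest Failed"*)       echo "log read failed" ;;
        *)
            # Log entries start with "# 1", "#10", etc.
            n=$(printf '%s\n' "$out" | grep -cE '^# ?[0-9]+ ')
            if [ "$n" -gt 0 ]; then
                intr=$(printf '%s\n' "$out" | grep -E '^# ?[0-9]+ ' | grep -c 'Interrupted')
                echo "$n entries, $intr interrupted"
            else
                echo "unrecognized output"
            fi ;;
    esac
}
```

Usage: `smartctl -d scsi -l selftest /dev/rdsk/c1t15d0 | selftest_status`. For the 8850 drive above, this would print `empty log`.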
-- 
This message posted from opensolaris.org