Peter Eriksson
2009-Sep-09 14:24 UTC
[zfs-discuss] [fm-discuss] self test failure on Intel X25-E SSD
I've done some more testing, and I think my X4240/mpt/X25 problems must be
caused by something else.
Reading the self-test log with smartctl on the X25-E with the 8850 firmware
gives better results than with the old firmware:
X25-E running firmware 8850 on an X4240 with mpt controller:
# smartctl -d scsi -l selftest /dev/rdsk/c1t15d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
No self-tests have been logged
Long (extended) Self Test duration: 120 seconds [2.0 minutes]
X25-M running firmware 8820 on an X4240 with mpt controller:
# smartctl -d scsi -l selftest /dev/rdsk/c1t14d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
scsiPrintSelfTest Failed [I/O error]
Plus SCSI errors on the console:
Sep 9 16:11:44 merope scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci1000,3150@0/sd@e,0 (sd31):
Sep 9 16:11:44 merope SCSI transport failed: reason '<unknown reason>': giving up
Sep 9 16:12:01 merope scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci1000,3150@0/sd@e,0 (sd31):
Sep 9 16:12:01 merope Error for Command: write Error Level: Retryable
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice] Requested Block: 42976 Error Block: 42976
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: CVEM8465006B
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
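For comparing console errors like these between machines, it can help to pull out just the sense key and ASC/ASCQ. The following is only a rough sketch: the function name `parse_sd_warning` and the small lookup table are my own illustration, not part of any Solaris tool, and the pattern is written against the message layout shown in this post only.

```python
import re

# Sense key codes for the names seen in this thread (not a complete table).
SENSE_KEY_CODES = {"Illegal Request": 0x5, "Unit Attention": 0x6, "Aborted Command": 0xB}

# A syslog timestamp such as "Sep 9 16:12:01" marks the start of the next record.
NEXT_RECORD = r"\s+[A-Z][a-z]{2}\s+\d{1,2}\s+\d\d:\d\d:\d\d"


def parse_sd_warning(text):
    """Return sense key, ASC and ASCQ from an sd console warning, or None."""
    # Undo mail line wrapping so fields split across lines can be matched.
    flat = " ".join(text.split())
    key = re.search(r"Sense Key:\s*(.+?)(?=" + NEXT_RECORD + r"|$)", flat)
    asc = re.search(
        r"ASC:\s*(0x[0-9a-fA-F]+)\s*\((.*?)\),\s*ASCQ:\s*(0x[0-9a-fA-F]+)", flat
    )
    if not key or not asc:
        return None
    name = key.group(1)
    return {
        "sense_key": name,
        "sense_key_code": SENSE_KEY_CODES.get(name),  # None if not in the table
        "asc": int(asc.group(1), 16),
        "ascq": int(asc.group(3), 16),
        "asc_text": asc.group(2),
    }


# Sample copied from the console output above, including the line wraps.
SAMPLE = """\
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice] Sense Key: Unit
Attention
Sep 9 16:12:01 merope scsi: [ID 107833 kern.notice] ASC: 0x29 (power on,
reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
"""

if __name__ == "__main__":
    print(parse_sd_warning(SAMPLE))
```

Feeding it the saved console output from each host gives one small record per error, which is easier to compare than the raw log.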
X25-E running firmware 8621 in an X4500 (marvell controller):
# /ifm/bin/smartctl -d scsi -l selftest /dev/rdsk/c5t7d0
smartctl version 5.38 [i386-pc-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
SMART Self-test log
Num  Test         Status                     segment  LifeTime  LBA_first_err  [SK ASC ASQ]
     Description                             number   (hours)
# 1  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
# 2  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
# 3  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
# 4  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
# 5  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
# 6  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
# 7  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
# 8  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
# 9  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#10  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#11  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#12  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#13  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#14  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#15  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#16  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#17  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#18  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#19  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
#20  Default      Interrupted (bus reset ?)  -        9216      154621181988   [0xb 0x40 0x82]
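All twenty entries above are identical, which is easier to see (and to check on other drives) with a few lines of script. This is a sketch under my own assumptions: `parse_selftest_log` and `distinct_entries` are hypothetical helper names, and the pattern only covers rows shaped like the ones in this post (single-word test name, numeric LBA, hex sense triple). One thing it makes visible: in hex the lifetime is 0x2400 and the LBA is 0x2400240024, a repeating byte pattern. That might mean the log contents are filler rather than real test results, but that is my reading and not something I have confirmed.

```python
import re
from collections import Counter

# One log row, after whitespace has been collapsed. Covers only the row shape
# seen in this post; rows with "-" in the LBA or sense fields are not matched.
ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+(?P<test>\S+)\s+(?P<status>.+?)\s+(?P<segment>-|\d+)\s+"
    r"(?P<hours>\d+)\s+(?P<lba>\d+)\s+"
    r"\[(?P<sk>0x[0-9a-fA-F]+)\s+(?P<asc>0x[0-9a-fA-F]+)\s+(?P<ascq>0x[0-9a-fA-F]+)\]"
)


def parse_selftest_log(text):
    """Parse 'smartctl -d scsi -l selftest' rows into a list of dicts."""
    flat = " ".join(text.split())  # undo line wrapping in pasted output
    rows = []
    for m in ROW.finditer(flat):
        rows.append({
            "num": int(m.group("num")),
            "status": m.group("status"),
            "hours": int(m.group("hours")),
            "lba": int(m.group("lba")),
            "sense": (int(m.group("sk"), 16),
                      int(m.group("asc"), 16),
                      int(m.group("ascq"), 16)),
        })
    return rows


def distinct_entries(rows):
    """Count how many rows share the same status, hours, LBA and sense data."""
    return Counter((r["status"], r["hours"], r["lba"], r["sense"]) for r in rows)


# Two rows copied from the output above, wrapped as they were pasted.
SAMPLE = """\
# 1 Default Interrupted (bus reset ?) - 9216 154621181988
[0xb 0x40 0x82]
# 2 Default Interrupted (bus reset ?) - 9216 154621181988
[0xb 0x40 0x82]
"""

if __name__ == "__main__":
    rows = parse_selftest_log(SAMPLE)
    for entry, count in distinct_entries(rows).items():
        status, hours, lba, sense = entry
        print(count, status, hex(hours), hex(lba), [hex(b) for b in sense])
```

For reference, sense key 0xb is Aborted Command, and ASC 0x40 with an ASCQ in the range 0x80-0xff is "diagnostic failure on component NN" in the SCSI tables.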
And SCSI errors on the console:
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.warning] WARNING: /pci@2,0/pci1022,7458@8/pci11ab,11ab@1/disk@7,0 (sd47):
Sep 9 16:14:27 andromeda Error for Command: mode sense Error Level: Informational
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number:
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.notice] Sense Key: Illegal Request
Sep 9 16:14:27 andromeda scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
This drive is also generating a lot of FMA events about the failed
"self-tests". I'm going to replace it with one running 8850 as soon as
possible. (That requires some fiddling, since I can't "zpool remove" the SLOG
device - *sigh*.)
I haven't had time to test an X25-E with the old firmware on an X4240 with the
mpt controller, but I think things are looking good with the 8850 firmware.
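
If anyone wants to repeat the comparison on more drives, the three kinds of result seen above (empty log, read failure, log with entries) can be told apart mechanically. A minimal sketch, assuming smartctl is on the PATH and is invoked with the same options as in this post; the function names and the result labels are made up for illustration.

```python
import subprocess


def classify_selftest_output(output):
    """Map smartctl self-test log output to one of the cases seen in this post."""
    if "scsiPrintSelfTest Failed" in output:
        return "read-failed"   # e.g. the 8820-firmware drive on the X4240
    if "No self-tests have been logged" in output:
        return "empty-log"     # e.g. the 8850-firmware drive on the X4240
    if "SMART Self-test log" in output:
        return "has-entries"   # e.g. the 8621-firmware drive on the X4500
    return "unknown"


def read_selftest_status(device, smartctl="smartctl"):
    """Run 'smartctl -d scsi -l selftest <device>' and classify the result."""
    # smartctl can exit non-zero even when it prints useful output,
    # so the exit status is not treated as an error here.
    proc = subprocess.run(
        [smartctl, "-d", "scsi", "-l", "selftest", device],
        capture_output=True,
        text=True,
    )
    return classify_selftest_output(proc.stdout + proc.stderr)


if __name__ == "__main__":
    # Device names taken from this post; adjust for the local system.
    for dev in ("/dev/rdsk/c1t15d0", "/dev/rdsk/c1t14d0"):
        print(dev, read_selftest_status(dev))
```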
--
This message posted from opensolaris.org