Ding Honghui
2011-Aug-16 06:17 UTC
[zfs-discuss] solaris 10u8 hangs with message Disconnected command timeout for Target 0
Hi,

My Solaris storage server hangs. When I look at the console, the messages in [1] are displayed.
I can't log in to the console, and it seems that I/O is completely blocked.

The system is Solaris 10u8 on a Dell R710 with a Dell MD3000 disk array. Two HBA cables connect the server to the MD3000.
The hang occurs at random.

I would very much appreciate any help.

Regards,
Ding

[1]
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):
Aug 16 13:14:16 nas-hz-02 Disconnected command timeout for Target 0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002a44b8f0ded (sd47):
Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073 Error Block: 1380679073
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa180000029e4b8f0d61 (sd41):
Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002a24b8f0dc5 (sd45):
Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073 Error Block: 1380679073
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa180000029c4b8f0d35 (sd39):
Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002984b8f0cd2 (sd35):
Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Andrew Gabriel
2011-Aug-16 06:41 UTC
[zfs-discuss] solaris 10u8 hangs with message Disconnected command timeout for Target 0
Ding Honghui wrote:
> Hi,
>
> My Solaris storage server hangs. When I look at the console, the messages in [1] are displayed.
> I can't log in to the console, and it seems that I/O is completely blocked.
>
> The system is Solaris 10u8 on a Dell R710 with a Dell MD3000 disk array. Two HBA cables connect the server to the MD3000.
> The hang occurs at random.
>
> I would very much appreciate any help.

The SCSI target you are talking to is being reset. "Unit Attention" means it has forgotten the operating parameters that were negotiated with the system. It is a warning that the device might have been changed without the system knowing, and it is telling you this happened because of a "device internal reset". That sort of thing can happen if the firmware in the SCSI target crashes and restarts, if the power supply blips, or if the device was swapped.

I don't know anything about the Dell MD3000, but given that this happened on lots of disks at the same moment following a timeout, it looks like the array power cycled or the array firmware (if any) rebooted. (Not sure if a SCSI bus reset can do this or not.)

> [1]
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):
> Aug 16 13:14:16 nas-hz-02 Disconnected command timeout for Target 0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002a44b8f0ded (sd47):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073 Error Block: 1380679073
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa180000029e4b8f0d61 (sd41):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002a24b8f0dc5 (sd45):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073 Error Block: 1380679073
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa180000029c4b8f0d35 (sd39):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002984b8f0cd2 (sd35):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0

--
Andrew Gabriel
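
For anyone reading the sense data in the log quoted above: the ASC/ASCQ pair narrows down which kind of reset the device reported. The following is only a minimal sketch, with the qualifier names taken from the additional sense code table in the SCSI Primary Commands (SPC) standard. The function name describe_reset is made up for illustration. In that table a SCSI bus reset has its own qualifier (0x02), although a device is not guaranteed to report the most specific code available.

```python
# Qualifiers (ASCQ) defined for ASC 0x29 in the SPC additional sense code table.
ASC_29_QUALIFIERS = {
    0x00: "power on, reset, or bus device reset occurred",
    0x01: "power on occurred",
    0x02: "SCSI bus reset occurred",
    0x03: "bus device reset function occurred",
    0x04: "device internal reset",
    0x05: "transceiver mode changed to single-ended",
    0x06: "transceiver mode changed to LVD",
    0x07: "I_T nexus loss occurred",
}


def describe_reset(asc: int, ascq: int) -> str:
    """Return a short description for an ASC 0x29 sense code (hypothetical helper)."""
    if asc != 0x29:
        return f"ASC 0x{asc:02x} is not a reset notification"
    return ASC_29_QUALIFIERS.get(ascq, f"unlisted qualifier 0x{ascq:02x}")


# The pair reported by every disk in the log above.
print(describe_reset(0x29, 0x04))
```

The log in this thread reports ASCQ 0x4 on every disk, which is the "device internal reset" entry.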
Richard Elling
2011-Aug-18 00:20 UTC
[zfs-discuss] solaris 10u8 hangs with message Disconnected command timeout for Target 0
On Aug 15, 2011, at 11:17 PM, Ding Honghui wrote:
> My Solaris storage server hangs. When I look at the console, the messages in [1] are displayed.
> I can't log in to the console, and it seems that I/O is completely blocked.
>
> The system is Solaris 10u8 on a Dell R710 with a Dell MD3000 disk array. Two HBA cables connect the server to the MD3000.
> The hang occurs at random.

This symptom is consistent with a broken SATA disk behind a SAS expander. Unfortunately, the mpt driver is closed source, so we can only infer what the code does by using the open source mpt_sas driver as (hopefully) a derivative.

>
> I would very much appreciate any help.
>
> Regards,
> Ding
>
> [1]
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):
> Aug 16 13:14:16 nas-hz-02 Disconnected command timeout for Target 0

A command did not complete and the mpt driver reset the target. If that target is an expander, then everything behind the expander can reset, resulting in aborts of any in-flight commands, as follows...

> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002a44b8f0ded (sd47):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073 Error Block: 1380679073
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa180000029e4b8f0d61 (sd41):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002a24b8f0dc5 (sd45):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679073 Error Block: 1380679073
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa180000029c4b8f0d35 (sd39):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
> Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002984b8f0cd2 (sd35):
> Aug 16 13:14:16 nas-hz-02 Error for Command: write(10) Error Level: Retryable
> Aug 16 13:14:16 nas-hz-02 scsi: Requested Block: 1380679072 Error Block: 1380679072
> Aug 16 13:14:16 nas-hz-02 scsi: Vendor: DELL Serial Number:
> Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
> Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0

You will be happiest if you do not use SATA disks directly connected to SAS expanders.

 -- richard
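
The pattern discussed in this thread (one timeout followed by reset notifications from several disks in the same second) can be spotted in a longer log with a small script. The following is a rough sketch, not a tested tool: the regular expressions assume the syslog layout shown in this thread, the SAMPLE text is abbreviated from the log above (two of the five disks, with the intermediate lines dropped), and the function name resets_by_timestamp is hypothetical.

```python
import re
from collections import defaultdict

# Abbreviated excerpt of the log from this thread (two of the five disks).
SAMPLE = """\
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):
Aug 16 13:14:16 nas-hz-02 Disconnected command timeout for Target 0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18000002a44b8f0ded (sd47):
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa180000029e4b8f0d61 (sd41):
Aug 16 13:14:16 nas-hz-02 scsi: Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi: ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
"""

# "<timestamp> <host> scsi: WARNING: <device path> (<instance>):"
WARNING_RE = re.compile(r"^(\w{3}\s+\d+ [\d:]{8}) \S+ scsi: WARNING: \S+ \((\w+)\):")
# "ASC: 0x29 (...), ASCQ: 0x4"
ASC_RE = re.compile(r"ASC: (0x[0-9a-fA-F]+) .*ASCQ: (0x[0-9a-fA-F]+)")


def resets_by_timestamp(text):
    """Map each timestamp to the set of driver instances that reported ASC 0x29."""
    resets = defaultdict(set)
    current = None  # (timestamp, instance) of the most recent WARNING line
    for line in text.splitlines():
        warning = WARNING_RE.match(line)
        if warning:
            current = (warning.group(1), warning.group(2))
            continue
        sense = ASC_RE.search(line)
        if sense and current and int(sense.group(1), 16) == 0x29:
            resets[current[0]].add(current[1])
    return dict(resets)


for stamp, instances in resets_by_timestamp(SAMPLE).items():
    print(stamp, sorted(instances))
```

A timestamp that maps to many instances at once points at something shared (array controller, expander, or power) rather than a single failing LUN. The mpt1 warning is not counted because no ASC line follows it.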