Rob Nelson
2009-Dec-05 14:26 UTC
[zfs-discuss] How can we help fix MPT driver post build 129
How can we help with the problem outlined below? I can reproduce it at will, so if anyone at Sun would like an environment to test this situation, let me know. What is the best information to gather for you folks to help here?

Thanks - nola

This is in regard to these threads:
http://www.opensolaris.org/jive/thread.jspa?messageID=421400
http://www.opensolaris.org/jive/thread.jspa?threadID=118947&tstart=0
http://www.opensolaris.org/jive/thread.jspa?threadID=117702&tstart=1
http://www.opensolaris.org/jive/thread.jspa?messageID=437031&tstart=0

And bug IDs:
6894775 mpt driver timeouts and bus resets under load
6900767 Server hang with LSI 1068E based SAS controller under load

Exec summary: Those using the LSI 1068 chipset with the LSI SAS2x IC expander get I/O errors under load from about build 118 through build 129 (the last build I tested). At build 111b, it worked. If you take the same hardware and load-test scripts, you're OK under 111b; from build 118 on, you suffer errors such as the following:

Dec 5 08:17:04 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:17:04 gb2000-007 Log info 0x31111000 received for target 79.
Dec 5 08:17:04 gb2000-007 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 5 08:17:07 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:17:07 gb2000-007 SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
Dec 5 08:18:09 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:18:09 gb2000-007 Disconnected command timeout for Target 79
Dec 5 08:18:14 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:18:14 gb2000-007 Log info 0x31130000 received for target 79.
Dec 5 08:18:14 gb2000-007 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:18:17 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
Dec 5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:18:17 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
Dec 5 08:18:19 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:18:19 gb2000-007 Log info 0x31111000 received for target 79.
Dec 5 08:18:19 gb2000-007 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 5 08:18:22 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:18:22 gb2000-007 SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
Dec 5 08:19:24 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:19:24 gb2000-007 Disconnected command timeout for Target 79
Dec 5 08:19:29 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:19:29 gb2000-007 Log info 0x31130000 received for target 79.
Dec 5 08:19:29 gb2000-007 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:19:32 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
Dec 5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:19:32 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
Dec 5 08:19:34 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:19:34 gb2000-007 Log info 0x31111000 received for target 79.
Dec 5 08:19:34 gb2000-007 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 5 08:19:37 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:19:37 gb2000-007 SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
Dec 5 08:20:39 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:20:39 gb2000-007 Disconnected command timeout for Target 79
Dec 5 08:20:39 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:20:39 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
Dec 5 08:20:44 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:20:44 gb2000-007 Log info 0x31130000 received for target 79.
Dec 5 08:20:44 gb2000-007 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 5 08:20:44 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:20:44 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
Dec 5 08:20:46 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:20:46 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
Dec 5 08:20:46 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:20:46 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
Dec 5 08:20:48 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:20:48 gb2000-007 Log info 0x31111000 received for target 79.
Dec 5 08:20:48 gb2000-007 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 5 08:20:51 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:20:51 gb2000-007 SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
Dec 5 08:21:54 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:21:54 gb2000-007 Disconnected command timeout for Target 79
Dec 5 08:21:54 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:21:54 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
Dec 5 08:21:58 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:21:58 gb2000-007 Log info 0x31130000 received for target 79.
Dec 5 08:21:58 gb2000-007 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 5 08:21:58 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:21:58 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
Dec 5 08:22:01 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:22:01 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
Dec 5 08:22:01 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:22:01 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
Dec 5 08:22:01 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:22:01 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
Dec 5 08:22:01 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:22:01 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
Dec 5 08:22:03 gb2000-007 scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:22:03 gb2000-007 Log info 0x31111000 received for target 79.
Dec 5 08:22:03 gb2000-007 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 5 08:22:03 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0/sd@4f,0 (sd96):
Dec 5 08:22:03 gb2000-007 drive offline
Dec 5 08:22:06 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,340e@7/pci1000,30a0@0 (mpt1):
Dec 5 08:22:06 gb2000-007 SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|

-- This message posted from opensolaris.org
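One way to condense a log like the one above before passing it along is to tally the recurring mpt messages per target. What follows is a minimal sketch in Python, assuming the messages sit in a syslog file such as /var/adm/messages with one entry per line. The `summarize` helper, the `PATTERNS` table and the script name are hypothetical and only cover the message types shown in this excerpt:

```python
import re
import sys
from collections import Counter

# Hypothetical patterns covering only the mpt/sd messages quoted in this
# thread; real logs may contain other message types that these do not match.
PATTERNS = {
    "timeout": re.compile(r"Disconnected command timeout for Target (\d+)"),
    "log_info": re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)"),
    "discovery_error": re.compile(r"SAS Discovery Error on port (\d+)"),
    "drive_offline": re.compile(r"drive offline"),
}


def summarize(lines):
    """Count matching events, keyed by (event name, *captured fields)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(name,) + match.groups()] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python mpt_summary.py /var/adm/messages
    with open(sys.argv[1], errors="replace") as log:
        for key, count in sorted(summarize(log).items()):
            print(count, " ".join(key))
```

Matching on the message text with `re.search`, rather than parsing the syslog prefix, keeps the sketch independent of the timestamp and hostname format; the trade-off is that it only recognizes the handful of messages listed in `PATTERNS`.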
Travis Tabbal
2009-Dec-07 21:32 UTC
[zfs-discuss] How can we help fix MPT driver post build 129
To be fair, I think it's obvious that Sun people are looking into it and that users are willing to help diagnose and test. There were requests for particular data in the threads you linked to; have you sent yours? It might help them find a pattern in the errors. I understand the frustration that it hasn't been fixed in the couple of builds since they became aware of it, but it could be a very tricky problem. It also sounds like it's not reproducible on Sun hardware, so they have to get cards and such as well. It's also less urgent now that they have identified a workaround that works for most of us. While disabling MSIs is not optimal, it does help a lot.