Hi all,

Recently I upgraded from snv_118 to snv_125, and suddenly I started to see these messages in /var/adm/messages:

Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci10de,376@a/pci1000,30a0@0 (mpt0):
Oct 22 12:54:37 SAN02     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci10de,376@a/pci1000,30a0@0 (mpt0):
Oct 22 12:56:47 SAN02     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci10de,376@a/pci1000,30a0@0 (mpt0):
Oct 22 12:56:47 SAN02     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci10de,376@a/pci1000,30a0@0 (mpt0):
Oct 22 12:56:50 SAN02     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci10de,376@a/pci1000,30a0@0 (mpt0):
Oct 22 12:56:50 SAN02     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3112011a

Is this a symptom of a disk error, or was some change made in the driver so that I now see information that didn't appear in the past?

Thanks,
Bruno

I'm using an LSI Logic SAS1068E B3, and within lsiutil I see this behaviour:

1 MPT Port found

     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  mpt0              LSI Logic SAS1068E B3     105      011a0000     0

Select a device:  [1-1 or 0 to quit] 1

  1.  Identify firmware, BIOS, and/or FCode
  2.  Download firmware (update the FLASH)
  4.  Download/erase BIOS and/or FCode (update the FLASH)
  8.  Scan for devices
 10.  Change IOC settings (interrupt coalescing)
 13.  Change SAS IO Unit settings
 16.  Display attached devices
 20.  Diagnostics
 21.  RAID actions
 22.  Reset bus
 23.  Reset target
 42.  Display operating system names for devices
 45.  Concatenate SAS firmware and NVDATA files
 59.  Dump PCI config space
 60.  Show non-default settings
 61.  Restore default settings
 66.  Show SAS discovery errors
 69.  Show board manufacturing information
 97.  Reset SAS link, HARD RESET
 98.  Reset SAS link
 99.  Reset port
  e   Enable expert mode in menus
  p   Enable paged mode
  w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

  1.  Inquiry Test
  2.  WriteBuffer/ReadBuffer/Compare Test
  3.  Read Test
  4.  Write/Read/Compare Test
  8.  Read Capacity / Read Block Limits Test
 12.  Display phy counters
 13.  Clear phy counters
 14.  SATA SMART Read Test
 15.  SEP (SCSI Enclosure Processor) Test
 18.  Report LUNs Test
 19.  Drive firmware download
 20.  Expander firmware download
 21.  Read Logical Blocks
 99.  Reset port
  e   Enable expert mode in menus
  p   Enable paged mode
  w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12
Adapter Phy 0:  Link Down, No Errors
Adapter Phy 1:  Link Down, No Errors
Adapter Phy 2:  Link Down, No Errors
Adapter Phy 3:  Link Down, No Errors
Adapter Phy 4:  Link Up, No Errors
Adapter Phy 5:  Link Up, No Errors
Adapter Phy 6:  Link Up, No Errors
Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up
  Invalid DWord Count              79,967,229
  Running Disparity Error Count    63,036,893
  Loss of DWord Synch Count               113
  Phy Reset Problem Count                   0

Expander (Handle 0009) Phy 1:  Link Up
  Invalid DWord Count              79,967,207
  Running Disparity Error Count    78,339,626
  Loss of DWord Synch Count               113
  Phy Reset Problem Count                   0

Expander (Handle 0009) Phy 2:  Link Up
  Invalid DWord Count              76,717,646
  Running Disparity Error Count    73,334,563
  Loss of DWord Synch Count               113
  Phy Reset Problem Count                   0

Expander (Handle 0009) Phy 3:  Link Up
  Invalid DWord Count              79,896,409
  Running Disparity Error Count    76,199,329
  Loss of DWord Synch Count               113
  Phy Reset Problem Count                   0

Expander (Handle 0009) Phy 4:  Link Up, No Errors
Expander (Handle 0009) Phy 5:  Link Up, No Errors
Expander (Handle 0009) Phy 6:  Link Up, No Errors
Expander (Handle 0009) Phy 7:  Link Up, No Errors
Expander (Handle 0009) Phy 8:  Link Up, No Errors
Expander (Handle 0009) Phy 9:  Link Up, No Errors
Expander (Handle 0009) Phy 10:  Link Up, No Errors
Expander (Handle 0009) Phy 11:  Link Up, No Errors
Expander (Handle 0009) Phy 12:  Link Up, No Errors
Expander (Handle 0009) Phy 13:  Link Up, No Errors
Expander (Handle 0009) Phy 14:  Link Up, No Errors
Expander (Handle 0009) Phy 15:  Link Up, No Errors
Expander (Handle 0009) Phy 16:  Link Up, No Errors
Expander (Handle 0009) Phy 17:  Link Up, No Errors
Expander (Handle 0009) Phy 18:  Link Up, No Errors
Expander (Handle 0009) Phy 19:  Link Up, No Errors
Expander (Handle 0009) Phy 20:  Link Down, No Errors
Expander (Handle 0009) Phy 21:  Link Down, No Errors

Expander (Handle 0009) Phy 22:  Link Up
  Invalid DWord Count                 743,980
  Running Disparity Error Count        38,796
  Loss of DWord Synch Count                 1
  Phy Reset Problem Count                   0

Expander (Handle 0009) Phy 23:  Link Down, No Errors
Expander (Handle 0009) Phy 24:  Link Down, No Errors

Expander (Handle 0009) Phy 25:  Link Down
  Invalid DWord Count                   1,755
  Running Disparity Error Count           408
  Loss of DWord Synch Count                 0
  Phy Reset Problem Count                   0

Expander (Handle 0009) Phy 26:  Link Down
  Invalid DWord Count                   1,127
  Running Disparity Error Count         1,022
  Loss of DWord Synch Count                 0
  Phy Reset Problem Count                   0

Expander (Handle 0009) Phy 27:  Link Down, No Errors
Expander (Handle 0009) Phy 28:  Link Down, No Errors
Expander (Handle 0009) Phy 29:  Link Down, No Errors
Expander (Handle 0009) Phy 30:  Link Down, No Errors
Expander (Handle 0009) Phy 31:  Link Down, No Errors
Expander (Handle 0009) Phy 32:  Link Down, No Errors
Expander (Handle 0009) Phy 33:  Link Down, No Errors
Expander (Handle 0009) Phy 34:  Link Down, No Errors
Expander (Handle 0009) Phy 35:  Link Down, No Errors
Expander (Handle 0009) Phy 36:  Link Up, No Errors
Expander (Handle 0009) Phy 37:  Link Down, No Errors
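A note on the counters above: the dump is a point-in-time snapshot, so the more useful question is whether the counts are still climbing. A minimal sketch for checking that, assuming this build of lsiutil will accept its menu selections on stdin and that the menu numbers match the session shown above (1 = port, 20 = Diagnostics, 12 = Display phy counters); verify interactively first:

    # take two snapshots ten minutes apart and diff them; any counter
    # that differs is still accumulating errors on a live link
    printf '1\n20\n12\n0\n0\n' | lsiutil > /tmp/phy.1
    sleep 600
    printf '1\n20\n12\n0\n0\n' | lsiutil > /tmp/phy.2
    diff /tmp/phy.1 /tmp/phy.2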
Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 42

mpt0 is /dev/cfg/c5

 B___T___L  Type        Operating System Device Name
ScsiIo to Bus 0 Target 8 failed, IOCStatus = 004b (IOC Terminated)
 0  10   0  Disk        /dev/rdsk/c5t10d0s2
 0  11   0  Disk        /dev/rdsk/c5t11d0s2
 0  12   0  Disk        /dev/rdsk/c5t12d0s2
 0  13   0  Disk        /dev/rdsk/c5t13d0s2
 0  14   0  Disk        /dev/rdsk/c5t14d0s2
 0  15   0  Disk        /dev/rdsk/c5t15d0s2
 0  16   0  Disk        /dev/rdsk/c5t16d0s2
 0  17   0  Disk        /dev/rdsk/c5t17d0s2
 0  18   0  Disk        /dev/rdsk/c5t18d0s2
 0  19   0  Disk        /dev/rdsk/c5t19d0s2
 0  20   0  Disk        /dev/rdsk/c5t20d0s2
 0  21   0  Disk        /dev/rdsk/c5t21d0s2
 0  22   0  Disk        /dev/rdsk/c5t22d0s2
 0  23   0  Disk        /dev/rdsk/c5t23d0s2
 0  24   0  Disk        /dev/rdsk/c5t24d0s2
 0  25   0  Disk        /dev/rdsk/c5t25d0s2
 0  26   0  Disk        /dev/rdsk/c5t26d0s2

iostat -En gives:

c4t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: SEAGATE ST32500N Revision: 3AZQ Serial No:
Size: 250.06GB <250056000000 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 281 Predictive Failure Analysis: 0
c4t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: SEAGATE ST32502N Revision: SU0D Serial No:
Size: 250.06GB <250056000000 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 285 Predictive Failure Analysis: 0
c3t0d0 Soft Errors: 0 Hard Errors: 9 Transport Errors: 0
Vendor: TEAC Product: DV-28E-V Revision: 1.AC Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 9 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0
c5t10d0 Soft Errors: 18 Hard Errors: 1 Transport Errors: 9
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 18 Illegal Request: 8 Predictive Failure Analysis: 0
c5t11d0 Soft Errors: 18 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 18 Illegal Request: 8 Predictive Failure Analysis: 0
c5t12d0 Soft Errors: 18 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 18 Illegal Request: 8 Predictive Failure Analysis: 0
c5t13d0 Soft Errors: 18 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 18 Illegal Request: 8 Predictive Failure Analysis: 0
c5t14d0 Soft Errors: 16 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 16 Illegal Request: 8 Predictive Failure Analysis: 0
c5t15d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t16d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t17d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t18d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t19d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t20d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t21d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t22d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t23d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t24d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t25d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0
c5t26d0 Soft Errors: 12 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST31500341AS Revision: CC1H Serial No:
Size: 1500.30GB <1500301910016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 12 Illegal Request: 6 Predictive Failure Analysis: 0

Thank you,
Bruno
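With this many disks, the handful of interesting entries in iostat -En output like the above are easy to miss. A small filter, assuming the standard -En layout where the first line of each entry carries the soft/hard/transport counts, prints only the devices reporting hard or transport errors:

    # $7 = hard error count, $10 = transport error count
    iostat -En | awk '/Soft Errors:/ { if ($7 > 0 || $10 > 0) print $1, "hard=" $7, "transport=" $10 }'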
Hi Bruno,

I see some bugs associated with these messages (6694909) that point to an LSI firmware upgrade that causes these harmless errors to display.

According to the 6694909 comments, this issue is documented in the release notes.

As they are harmless, I wouldn't worry about them.

Maybe someone from the driver group can comment further.

Cindy

On 10/22/09 05:40, Bruno Sousa wrote:
> Hi all,
>
> Recently I upgraded from snv_118 to snv_125, and suddenly I started to
> see these messages in /var/adm/messages:
> ...
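Even if the messages are harmless, their frequency is worth tracking: a code that fires thousands of times under load tells a different story than an occasional one-off. A quick tally per IOCLogInfo code, using only standard tools against messages like those shown above:

    grep mpt_handle_event /var/adm/messages |
        sed -n 's/.*IOCLogInfo=\(0x[0-9a-f]*\).*/\1/p' |
        sort | uniq -c | sort -rn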
Cindy: How can I view the bug report you referenced? Standard methods show me the bug number is valid (6694909) but no content or notes. We are having similar messages appear with snv_118 with a busy LSI controller, especially during scrubbing, and I'd be interested to see what they mentioned in that report. Also, the LSI firmware updates for the LSISAS3081E (the controller we use) don't usually come with release notes indicating what has changed in each firmware revision, so I'm not sure where they got that idea from.
Adam Cheal wrote:
> Cindy: How can I view the bug report you referenced? Standard methods
> show me the bug number is valid (6694909) but no content or notes.
> ...

Hi Adam,
unfortunately, you can't see that bug from outside. The evaluation from LSI is very clear that this is a firmware issue rather than a driver issue, and it is claimed to be fixed in LSI BIOS v6.26.00 / FW 1.27.02 (aka "Phase 15").

cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp    http://www.jmcp.homeunix.com/blog
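To check whether a given box is already on the fixed firmware: lsiutil prints its port listing, including the Firmware Rev column, before the first prompt, so feeding it a single "0" (quit) is enough to read the running version without changing anything:

    echo 0 | lsiutil
    # Bruno's "Firmware Rev 011a0000" reads as 1.26.00.00 if the usual
    # major.minor.unit.dev byte encoding applies; Phase 15 is 1.27.02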
James: We are running Phase 16 on our LSISAS3801E's, and have also tried the recently released Phase 17, but it didn't help. All firmware NVRAM settings are default. Basically, when we put the disks behind this controller under load (e.g. scrubbing, or a recursive ls on a large ZFS filesystem) we get this series of log entries at random intervals:

scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0/sd@34,0 (sd49):
    incomplete read- retrying
scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    Log info 0x31110b00 received for target 40.
    scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    Log info 0x31110b00 received for target 40.
    scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    Log info 0x31110b00 received for target 40.
    scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    Log info 0x31110b00 received for target 40.
    scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0/sd@2d,0 (sd42):
    incomplete read- retrying
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    mpt0 supports power management.
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
    mpt0: IOC Operational.

It seems to be timing out accessing a disk, retrying, giving up, and then doing a bus reset?

This is happening with random disks behind the controller, and on multiple systems with the same hardware config. We are running snv_118 right now and were hoping this was some magic mpt-related "bug" that was going to be fixed in snv_125, but it doesn't look like it. The LSISAS3801E is driving 2 x 23-disk JBODs which, albeit a dense solution, it should be able to handle. We are also using wide raidz2 vdevs (22 disks each, one per JBOD), which is admittedly slower performance-wise, but the goal here is density, not performance. I would have hoped that the system would just slow down if there was IO contention, not experience things like bus resets.

Your thoughts?
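One place to look for corroborating detail: the FMA error telemetry usually records more about these retries and resets than the console messages do, and listing the recent ereports shows which class (transport, device, or driver) the failures are being charged to:

    fmdump -e                 # one line per error event: time and class
    fmdump -eV | tail -100    # full payload of the most recent events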
Adam Cheal wrote:
> James: We are running Phase 16 on our LSISAS3801E's, and have also tried
> the recently released Phase 17, but it didn't help. All firmware NVRAM
> settings are default. Basically, when we put the disks behind this
> controller under load (e.g. scrubbing, or a recursive ls on a large ZFS
> filesystem) we get this series of log entries at random intervals:
> ...
> It seems to be timing out accessing a disk, retrying, giving up, and then
> doing a bus reset?
> ...
> Your thoughts?

ugh. New bug time - bugs.opensolaris.org, please select Solaris / kernel / driver-mpt. In addition to the error messages and description of when you see it, please provide output from

cfgadm -lav
prtconf -v

I'll see that it gets moved to the correct group asap.

Cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp    http://www.jmcp.homeunix.com/blog
I've filed the bug, but was unable to include the "prtconf -v" output, as the comments field only accepted 15000 chars total. Let me know if there is anything else I can provide or do to help figure this problem out, as it is essentially preventing us from doing any kind of heavy IO to these pools, including scrubbing.
On 10/22/09 4:07 PM, James C. McPherson wrote:
> Adam Cheal wrote:
>> It seems to be timing out accessing a disk, retrying, giving up, and then
>> doing a bus reset?
> ...
> ugh. New bug time - bugs.opensolaris.org, please select
> Solaris / kernel / driver-mpt. In addition to the error
> messages and description of when you see it, please provide
> output from
>
> cfgadm -lav
> prtconf -v
>
> I'll see that it gets moved to the correct group asap.

FYI this is very similar to the behaviour I was seeing with my directly attached SATA disks on snv_118 (see the list archives for my original messages). I have not yet seen the error since I replaced my Hitachi 500 GB disks with Seagate 1.5TB disks, so it could very well have been some unfortunate LSI firmware / Hitachi drive firmware interaction.

carson:gandalf 0 $ gzcat /var/adm/messages.2.gz | ggrep -4 mpt | tail -9
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] /pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Oct  8 00:44:17 gandalf.taltos.org      Log info 0x31130000 received for target 1.
Oct  8 00:44:17 gandalf.taltos.org      scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] /pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Oct  8 00:44:17 gandalf.taltos.org      Log info 0x31130000 received for target 1.
Oct  8 00:44:17 gandalf.taltos.org      scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] /pci@0,0/pci8086,27d0@1c/pci1000,3140@0 (mpt0):
Oct  8 00:44:17 gandalf.taltos.org      Log info 0x31130000 received for target 1.
Oct  8 00:44:17 gandalf.taltos.org      scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc

carson:gandalf 1 $ gzcat /var/adm/messages.2.gz | sed -ne 's,^.*\(Log info\),\1,p' | sort -u
Log info 0x31110b00 received for target 7.
Log info 0x31130000 received for target 0.
Log info 0x31130000 received for target 1.
Log info 0x31130000 received for target 2.
Log info 0x31130000 received for target 3.
Log info 0x31130000 received for target 4.
Log info 0x31130000 received for target 6.
Log info 0x31130000 received for target 7.
Log info 0x31140000 received for target 0.
Log info 0x31140000 received for target 1.
Log info 0x31140000 received for target 2.
Log info 0x31140000 received for target 3.
Log info 0x31140000 received for target 4.
Log info 0x31140000 received for target 6.
Log info 0x31140000 received for target 7.

carson:gandalf 0 $ gzcat /var/adm/messages.2.gz | sed -ne 's,^.*\(scsi_status\),\1,p' | sort -u
scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

-- Carson
Hi Cindy,

I have a couple of questions about this issue:

1. I have exactly the same LSI controller in another server running OpenSolaris snv_101b, and so far no errors like these have been seen on that system.
2. Up to snv_118 I hadn't seen any problems; they only appear now with snv_125.
3. Isn't the Sun StorageTek SAS HBA an LSI OEM? If so, is it possible to know what firmware version that HBA uses?

Thank you,
Bruno

Cindy Swearingen wrote:
> Hi Bruno,
>
> I see some bugs associated with these messages (6694909) that point to
> an LSI firmware upgrade that causes these harmless errors to display.
>
> According to the 6694909 comments, this issue is documented in the
> release notes.
>
> As they are harmless, I wouldn't worry about them.
>
> Maybe someone from the driver group can comment further.
>
> Cindy
> ...
Hi Adam,

How many disks and zpools/zfs's do you have behind that LSI? I have a system with 22 disks and 4 zpools with around 30 zfs's, and so far it works like a charm, even under heavy load. The OpenSolaris release is snv_101b.

Bruno

Adam Cheal wrote:
> Cindy: How can I view the bug report you referenced? Standard methods
> show me the bug number is valid (6694909) but no content or notes.
> ...
Our config is:

OpenSolaris snv_118 x64
1 x LSISAS3801E controller
2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)

Each of the two external ports on the LSI connects to a 23-disk JBOD. ZFS-wise we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD). Each zpool has one ZFS filesystem containing millions of files/directories. This data is served up via CIFS (kernel), which is why we went with snv_118 (the first release post-2009.06 that had a stable CIFS server). Like I mentioned to James, we know that the server won't be a star performance-wise, especially because of the wide vdevs, but it shouldn't hiccup under load either. A guaranteed way for us to cause these IO errors is to load up the zpool with about 30 TB of data (90% full) and then scrub it. Within 30 minutes we start to see the errors, which usually evolves into "failing" disks (because of excessive retry errors), which just makes things worse.
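One workaround worth trying on a configuration like this, sketched on the assumption that zfs_vdev_max_pending (default 35 on builds of this vintage) is the live kernel tunable involved, and not offered as a confirmed fix: lower the number of I/Os ZFS keeps queued per vdev, so a scrub presents the controller and expanders with less concurrency:

    echo zfs_vdev_max_pending/D | mdb -k        # read the current value
    echo zfs_vdev_max_pending/W0t10 | mdb -kw   # drop it to 10 until reboot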
What bug# is this under? I'm having what I believe is the same problem. Is it possible to just take the mpt driver from a prior build in the meantime?

The below is from the load the zpool scrub creates. This is on a Dell T7400 workstation with a 1068E OEMed from LSI. I updated the firmware to the newest available from Dell. The errors follow whichever of the 4 drives has the highest load. Streaming doesn't seem to trigger it, as I can push 60 MiB a second to a mirrored rpool all day; it's only when there are a lot of metadata operations.

Oct 23 06:25:44 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:25:44 systurbo5      Disconnected command timeout for Target 1
Oct 23 06:27:15 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:27:15 systurbo5      Disconnected command timeout for Target 1
Oct 23 06:28:26 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:28:26 systurbo5      Disconnected command timeout for Target 1
Oct 23 06:29:47 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:29:47 systurbo5      Disconnected command timeout for Target 1
Oct 23 06:30:58 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:30:58 systurbo5      Disconnected command timeout for Target 1
Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:31:28 systurbo5      mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:31:28 systurbo5      mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:31:29 systurbo5      Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:31:29 systurbo5      Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:31:29 systurbo5      Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4029@9/pci8086,3500@0/pci8086,3510@0/pci1028,21d@0 (mpt0):
Oct 23 06:31:29 systurbo5      Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5      scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

On Fri, Oct 23, 2009 at 7:13 AM, Adam Cheal <acheal at pnimedia.com> wrote:
> Our config is:
> OpenSolaris snv_118 x64
> 1 x LSISAS3801E controller
> 2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
> ...
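A quick way to see whether the timeouts really do follow the load around, as described above, is to tally them per target out of the log; a fixed target points at one disk or bay, while a moving one points at the path or controller:

    grep "Disconnected command timeout" /var/adm/messages |
        awk '{ print "target", $NF }' | sort | uniq -c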
Sorry, running snv_123, Indiana

On Fri, Oct 23, 2009 at 11:16 AM, Jeremy f <ryshask at gmail.com> wrote:
> What bug# is this under? I'm having what I believe is the same problem.
> Is it possible to just take the mpt driver from a prior build in the
> meantime?
> ...
> Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Just submitted the bug yesterday, on the advice of James, so I don't have a number you can refer to yet... the "change request" number is 6894775, if that helps or is directly related to the future bugid.

From what I've seen/read, this problem has been around for a while but only rears its ugly head under heavy IO with large filesets, probably related to large metadata sets as you spoke of. We are using snv_118 x64, but it seems to appear in snv_123 and snv_125 as well from what I read here.

We've tried installing SSDs to act as a read cache for the pool to reduce the metadata hits on the physical disks, and as a last-ditch effort we even tried switching to the "latest" LSI-supplied itmpt driver from 2007 (from reading http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/) and disabling the mpt driver, but we ended up with the same timeout issues. In our case, the drives in the JBODs are all WD (model WD1002FBYS-18A6B0) 1TB 7.2k SATA drives.

In revisiting our architecture, we compared it to Sun's x4540 Thumper offering, which uses the same controller with similar (though apparently customized) firmware and 48 disks. The difference is that they use 6 x LSI1068E controllers, each of which has to deal with only 8 disks... obviously better for performance, but this architecture could be "hiding" the real IO issue by distributing the IO across so many controllers.
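[As an aside, adding SSDs as an L2ARC read cache like this is a single zpool operation. A minimal sketch, assuming a pool named pool002 and two SSDs at c8t2d0 and c8t3d0 (the names are illustrative, not necessarily the devices used here):

  # attach two SSDs as cache (L2ARC) devices to an existing pool
  zpool add pool002 cache c8t2d0 c8t3d0
  # confirm they appear under the "cache" heading
  zpool status pool002
]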
Adam Cheal wrote:
> Just submitted the bug yesterday, on the advice of James, so I don't have
> a number you can refer to yet... the "change request" number is 6894775,
> if that helps or is directly related to the future bugid.
> [snip]

Hi Adam,
I was watching the incoming queues all day yesterday for the bug, but missed seeing it; I'm not sure why. I've now moved the bug to the appropriate category, so it will get attention from the right people.

Thanks,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp    http://www.jmcp.homeunix.com/blog
Could the reason Sun's x4540 Thumper has 6 LSI controllers be some sort of "hidden" problem found by Sun where the HBA resets, and, due to time-to-market pressure, the "quick and dirty" solution was to spread the load over multiple HBAs instead of a software fix?

Just my 2 cents..

Bruno

Adam Cheal wrote:
> [snip]
> In revisiting our architecture, we compared it to Sun's x4540 Thumper
> offering, which uses the same controller with similar (though apparently
> customized) firmware and 48 disks. The difference is that they use 6 x
> LSI1068E controllers, each of which has to deal with only 8 disks...
> obviously better for performance, but this architecture could be "hiding"
> the real IO issue by distributing the IO across so many controllers.
Hi Cindy,

Thank you for the update, but it seems I can't see any information specific to that bug. I can only see bugs 6702538 and 6615564, and according to their history they were fixed quite some time ago. Can you by any chance share the information about bug 6694909?

Thank you,
Bruno

Cindy Swearingen wrote:
> Hi Bruno,
>
> I see some bugs associated with these messages (6694909) that point to
> an LSI firmware upgrade that causes these harmless errors to be displayed.
>
> According to the 6694909 comments, this issue is documented in the
> release notes.
>
> As they are harmless, I wouldn't worry about them.
>
> Maybe someone from the driver group can comment further.
>
> Cindy
>
> On 10/22/09 05:40, Bruno Sousa wrote:
> [snip]
On Oct 23, 2009, at 1:48 PM, Bruno Sousa wrote:
> Could the reason Sun's x4540 Thumper has 6 LSI controllers be some sort of
> "hidden" problem found by Sun where the HBA resets, and, due to
> time-to-market pressure, the "quick and dirty" solution was to spread the
> load over multiple HBAs instead of a software fix?

I don't think so. The X4540 has 48 disks -- 6 controllers at 8 disks/controller. This is the same configuration as the X4500, which used a Marvell controller. This decision leverages parts from the previous design.
 -- richard
On Fri, Oct 23, 2009 at 3:48 PM, Bruno Sousa <bsousa at epinfante.com> wrote:
> Could the reason Sun's x4540 Thumper has 6 LSI controllers be some sort of
> "hidden" problem found by Sun where the HBA resets, and, due to
> time-to-market pressure, the "quick and dirty" solution was to spread the
> load over multiple HBAs instead of a software fix?
>
> Just my 2 cents..
>
> Bruno

What else were you expecting them to do? According to LSI's website, the 1068E in an x8 configuration is an 8-port card:
http://www.lsi.com/DistributionSystem/AssetDocument/files/docs/marketing_docs/storage_stand_prod/SCG_LSISAS1068E_PB_040407.pdf

While they could've used expanders, that just creates one more component that can fail/have issues. Looking at the diagram, they've taken the absolute shortest I/O path possible, which is what I would hope to see/expect:
http://www.sun.com/servers/x64/x4540/server_architecture.pdf

One drive per channel, 6 channels total. I also wouldn't be surprised to find out that they found this to be the optimal configuration from a performance/throughput/IOPS perspective as well. Can't seem to find those numbers published by LSI.

--Tim
I don't think there was any intention on Sun's part to ignore the problem... obviously their target market wants a performance-oriented box, and the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY channels = 1 channel per drive = no contention for channels. The x4540 is a monster and performs like a dream with snv_118 (we have a few ourselves).

My issue is that implementing an archival-type solution demands a dense, simple storage platform that performs at a reasonable level, nothing more. Our design has the same controller chip (8 SAS PHY channels) driving 46 disks, so there is bound to be contention there, especially in high-load situations. I just need it to work and handle load gracefully, not time out and cause disk "failures"; at this point I can't even scrub the zpools to verify that the data we have on there is valid. From a hardware perspective, the 3801E card is spec'ed to handle our architecture; the OS just seems to fall over somewhere and not be able to throttle itself in certain intensive IO situations.

That said, I don't know whether to point the finger at LSI's firmware or the mpt driver/ZFS. Sun obviously has a good relationship with LSI, as their 1068E is the recommended SAS controller chip and is used in their own products. At least we've got a bug filed now, and we can hopefully follow this through to find out where the system breaks down.
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal <acheal at pnimedia.com> wrote:
> [snip]
> My issue is that implementing an archival-type solution demands a dense,
> simple storage platform that performs at a reasonable level, nothing more.
> Our design has the same controller chip (8 SAS PHY channels) driving 46
> disks, so there is bound to be contention there, especially in high-load
> situations. I just need it to work and handle load gracefully, not time
> out and cause disk "failures"; at this point I can't even scrub the zpools
> to verify that the data we have on there is valid. From a hardware
> perspective, the 3801E card is spec'ed to handle our architecture; the OS
> just seems to fall over somewhere and not be able to throttle itself in
> certain intensive IO situations.
> [snip]

Have you checked in with LSI to verify the IOPS ability of the chip? Just because it supports having 46 drives attached to one ASIC doesn't mean it can actually service all 46 at once. You're talking (VERY conservatively) 2800 IOPS. Even ignoring that, I know for a fact that the chip can't handle the raw throughput numbers of 46 disks unless you've got some very severe raid overhead. That chip is good for roughly 2 GB/sec in each direction. 46 7200RPM drives can fairly easily push 4x that amount in streaming IO loads.

Long story short, it appears you've got a 5 lb bag and a 50 lb load...

--Tim
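[For context, a rough back-of-envelope check of those figures, assuming ~60 random IOPS and ~100 MB/s streaming per 7,200 RPM SATA drive -- typical estimates for drives of that era, not measurements from this system:

  46 drives x ~60 IOPS  = ~2,760 random IOPS  (the "2800 IOPS, very conservatively")
  46 drives x ~100 MB/s = ~4,600 MB/s streaming, i.e. well over the chip's
                          roughly 2 GB/s per direction
]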
LSI's sales literature on that card specs "128 devices", which I take with a few hearty grains of salt. I agree that with all 46 drives pumping out streamed data the controller would be overworked, BUT the drives will only deliver data as fast as the OS tells them to. Just because the speedometer says 200 mph max doesn't mean we should (or even can!) go that fast.

The IO-intensive operations that trigger our timeout issues are a small percentage of the actual normal IO we do to the box. Most of the time the solution happily serves up archived data, but when it comes time to scrub or do mass operations on the entire dataset, bad things happen. It seems a waste to architect a more expensive performance-oriented solution when you aren't going to use that performance the majority of the time. There is a balance between performance and functionality, but I still feel that we should be able to make this situation work.

Ideally, the OS could dynamically adapt to slower storage and throttle its IO requests accordingly. At the least, it could allow the user to specify some IO thresholds so we can "cage the beast" if need be. We've tried some manual tuning via kernel parameters to restrict the maximum queued operations per vdev, and also a "scrub"-related one (the specifics escape me), but it still manages to overload itself.
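[For the record, the per-vdev queue tunable being alluded to is presumably zfs_vdev_max_pending -- it comes up by name later in this thread -- and the scrub-related one may have been zfs_scrub_limit. A minimal /etc/system sketch with illustrative values (takes effect at the next reboot):

  * limit ZFS to 10 outstanding I/Os per vdev (the default at the time was 35)
  set zfs:zfs_vdev_max_pending = 10
  * cap concurrent scrub I/Os per vdev (assumption: this tunable exists in your build)
  set zfs:zfs_scrub_limit = 10
]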
On Oct 23, 2009, at 4:46 PM, Tim Cook wrote:
> On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal <acheal at pnimedia.com> wrote:
> [snip]
>
> Have you checked in with LSI to verify the IOPS ability of the chip? Just
> because it supports having 46 drives attached to one ASIC doesn't mean it
> can actually service all 46 at once. You're talking (VERY conservatively)
> 2800 IOPS.

Tim has a valid point. By default, ZFS will queue 35 commands per disk. For 46 disks that is 1,610 concurrent I/Os. Historically, it has proven to be relatively easy to crater performance or cause problems with very, very, very expensive arrays that are easily overrun by Solaris. As a result, it is not uncommon to see references to setting throttles, especially in older docs.

Fortunately, this is simple to test by reducing the number of I/Os ZFS will queue. See the Evil Tuning Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

The mpt source is not open, so the mpt driver's reaction to 1,610 concurrent I/Os can only be guessed at from afar -- public LSI docs mention a figure of 511 concurrent I/Os for the SAS1068, but it is not clear to me that that is an explicit limit. If you have success with zfs_vdev_max_pending set to 10, then the mystery might be solved. Use iostat to observe the wait and actv columns, which show the number of transactions in the queues. JMCP?

NB: sometimes a driver will let the limit be configured. For example, to get high performance out of a high-end array attached to a qlc card, I've set the execution-throttle in /kernel/drv/qlc.conf to be more than two orders of magnitude greater than its default of 32. /kernel/drv/mpt*.conf does not seem to have a similar throttle.
 -- richard

> Even ignoring that, I know for a fact that the chip can't handle the raw
> throughput numbers of 46 disks unless you've got some very severe raid
> overhead. That chip is good for roughly 2 GB/sec in each direction. 46
> 7200RPM drives can fairly easily push 4x that amount in streaming IO loads.
> Long story short, it appears you've got a 5 lb bag and a 50 lb load...
>
> --Tim
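[As a footnote, the Evil Tuning Guide setting Richard references can also be tested on a live system with mdb, avoiding a reboot; a minimal sketch (the value 10 is illustrative):

  # set zfs_vdev_max_pending on the running kernel
  echo zfs_vdev_max_pending/W0t10 | mdb -kw
  # make the change persistent across reboots
  echo 'set zfs:zfs_vdev_max_pending = 10' >> /etc/system

Then re-run the workload and watch the actv column in iostat to confirm the per-disk queue now tops out near the new value.]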
On Fri, Oct 23, 2009 at 7:17 PM, Adam Cheal <acheal at pnimedia.com> wrote:
> [snip]
> Ideally, the OS could dynamically adapt to slower storage and throttle its
> IO requests accordingly. At the least, it could allow the user to specify
> some IO thresholds so we can "cage the beast" if need be. We've tried some
> manual tuning via kernel parameters to restrict the maximum queued
> operations per vdev, and also a "scrub"-related one (the specifics escape
> me), but it still manages to overload itself.

Where are you planning on queueing up those requests? The scrub I can understand wanting to throttle, but what about your user workload? Unless you're talking about EXTREMELY short bursts of I/O, what do you suggest the OS do? If you're sending 3,000 IOPS at the box from a workstation, where is that workload going to sit if you're only dumping 500 IOPS to disk? The only thing that will change is that your client will time out instead of your disks. I don't recall seeing what generates the I/O, but I do recall that it's backup. My assumption would be that it's something coming in over the network, in which case I'd say you're far, far better off throttling at the network stack.

--Tim
On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling <richard.elling at gmail.com> wrote:
> Tim has a valid point. By default, ZFS will queue 35 commands per disk.
> For 46 disks that is 1,610 concurrent I/Os. Historically, it has proven
> to be relatively easy to crater performance or cause problems with very,
> very, very expensive arrays that are easily overrun by Solaris. As a
> result, it is not uncommon to see references to setting throttles,
> especially in older docs.
>
> Fortunately, this is simple to test by reducing the number of I/Os ZFS
> will queue. See the Evil Tuning Guide:
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
>
> The mpt source is not open, so the mpt driver's reaction to 1,610
> concurrent I/Os can only be guessed at from afar -- public LSI docs
> mention a figure of 511 concurrent I/Os for the SAS1068, but it is not
> clear to me that that is an explicit limit. If you have success with
> zfs_vdev_max_pending set to 10, then the mystery might be solved. Use
> iostat to observe the wait and actv columns, which show the number of
> transactions in the queues. JMCP?
> [snip]

I believe there's a caveat here, though. That really only helps if the total I/O load is actually something the controller can handle. If the sustained I/O workload is still 1,600 concurrent I/Os, lowering the batch won't actually make any difference in the timeouts, will it? It would obviously eliminate burstiness (yes, I made that word up), but if the total sustained I/O load is greater than the ASIC can handle, it's still going to fall over and die with a queue of 10, correct?

--Tim
And therein lies the issue. The excessive load that causes the IO issues is almost always generated locally from a scrub or a local recursive "ls" used to warm up the SSD-based zpool cache with metadata. The regular network IO to the box is minimal and very read-centric; once we load the box up with archived data (which generally happens in a short amount of time), we simply serve it out as needed. As far as queueing goes, I would expect the system to queue bursts of IO in memory with appropriate timeouts, as required. These timeouts could be either manually or auto-magically adjusted to deal with the slower storage hardware. Obviously, sustained intense IO requests would eventually blow up the queue, so the goal here is to avoid creating those situations in the first place. We can throttle the network IO if needed; I need the OS to know its own local IO boundaries, though, and not attempt to overwork itself during scrubs etc.
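[As an aside, the cache-warming step mentioned above needs nothing exotic; a minimal sketch, assuming the pool's filesystem is mounted at /pool002 (the path is illustrative):

  # stat every file and directory so the metadata is pulled into ARC/L2ARC
  ls -lR /pool002 > /dev/null 2>&1
]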
On Oct 23, 2009, at 5:32 PM, Tim Cook wrote:
> On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling <richard.elling at gmail.com> wrote:
> [snip]
>
> I believe there's a caveat here, though. That really only helps if the
> total I/O load is actually something the controller can handle. If the
> sustained I/O workload is still 1,600 concurrent I/Os, lowering the batch
> won't actually make any difference in the timeouts, will it? It would
> obviously eliminate burstiness (yes, I made that word up), but if the
> total sustained I/O load is greater than the ASIC can handle, it's still
> going to fall over and die with a queue of 10, correct?

Yes, but since they are disks, and I'm assuming HDDs here, there is no chance the disks will be faster than the host's ability to send I/Os ;-) iostat will show what the queues look like.
 -- richard
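[For anyone reproducing this, a minimal way to watch those queues (the 5-second interval is illustrative):

  # extended per-device statistics, logical device names, skip all-zero
  # lines; watch the wait (host-side queue) and actv (commands issued to
  # the device, awaiting completion) columns
  iostat -xnz 5
]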
Here is an example of the pool config we use:

# zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:

        NAME         STATE     READ WRITE CKSUM
        pool002      ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c9t18d0  ONLINE       0     0     0
            c9t17d0  ONLINE       0     0     0
            c9t55d0  ONLINE       0     0     0
            c9t13d0  ONLINE       0     0     0
            c9t15d0  ONLINE       0     0     0
            c9t16d0  ONLINE       0     0     0
            c9t11d0  ONLINE       0     0     0
            c9t12d0  ONLINE       0     0     0
            c9t14d0  ONLINE       0     0     0
            c9t9d0   ONLINE       0     0     0
            c9t8d0   ONLINE       0     0     0
            c9t10d0  ONLINE       0     0     0
            c9t29d0  ONLINE       0     0     0
            c9t28d0  ONLINE       0     0     0
            c9t27d0  ONLINE       0     0     0
            c9t23d0  ONLINE       0     0     0
            c9t25d0  ONLINE       0     0     0
            c9t26d0  ONLINE       0     0     0
            c9t21d0  ONLINE       0     0     0
            c9t22d0  ONLINE       0     0     0
            c9t24d0  ONLINE       0     0     0
            c9t19d0  ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c9t30d0  ONLINE       0     0     0
            c9t31d0  ONLINE       0     0     0
            c9t32d0  ONLINE       0     0     0
            c9t33d0  ONLINE       0     0     0
            c9t34d0  ONLINE       0     0     0
            c9t35d0  ONLINE       0     0     0
            c9t36d0  ONLINE       0     0     0
            c9t37d0  ONLINE       0     0     0
            c9t38d0  ONLINE       0     0     0
            c9t39d0  ONLINE       0     0     0
            c9t40d0  ONLINE       0     0     0
            c9t41d0  ONLINE       0     0     0
            c9t42d0  ONLINE       0     0     0
            c9t44d0  ONLINE       0     0     0
            c9t45d0  ONLINE       0     0     0
            c9t46d0  ONLINE       0     0     0
            c9t47d0  ONLINE       0     0     0
            c9t48d0  ONLINE       0     0     0
            c9t49d0  ONLINE       0     0     0
            c9t50d0  ONLINE       0     0     0
            c9t51d0  ONLINE       0     0     0
            c9t52d0  ONLINE       0     0     0
        cache
          c8t2d0     ONLINE       0     0     0
          c8t3d0     ONLINE       0     0     0
        spares
          c9t20d0    AVAIL
          c9t43d0    AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0

errors: No known data errors

...and here is a snapshot of the system using "iostat -indexC 5" during a scrub of "pool002" (c8 is the onboard AHCI controller, c9 is the LSI SAS 3801E):

                             extended device statistics       ---- errors ---
    r/s    w/s     kr/s  kw/s wait  actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
 8738.7    0.0 555346.1   0.0  0.1 345.0    0.0   39.5   0 3875   0   1   1   2 c9
  194.8    0.0  11936.9   0.0  0.0   7.9    0.0   40.3   0   87   0   0   0   0 c9t8d0
  194.6    0.0  12927.9   0.0  0.0   7.6    0.0   38.9   0   86   0   0   0   0 c9t9d0
  194.6    0.0  12622.6   0.0  0.0   8.1    0.0   41.7   0   90   0   0   0   0 c9t10d0
  201.6    0.0  13350.9   0.0  0.0   8.0    0.0   39.5   0   90   0   0   0   0 c9t11d0
  194.4    0.0  12902.3   0.0  0.0   7.8    0.0   40.1   0   88   0   0   0   0 c9t12d0
  194.6    0.0  12902.3   0.0  0.0   7.7    0.0   39.3   0   88   0   0   0   0 c9t13d0
  195.4    0.0  12479.0   0.0  0.0   8.5    0.0   43.4   0   92   0   0   0   0 c9t14d0
  197.6    0.0  13107.4   0.0  0.0   8.1    0.0   41.0   0   92   0   0   0   0 c9t15d0
  198.8    0.0  12918.1   0.0  0.0   8.2    0.0   41.4   0   92   0   0   0   0 c9t16d0
  201.0    0.0  13350.3   0.0  0.0   8.1    0.0   40.4   0   91   0   0   0   0 c9t17d0
  201.2    0.0  13325.0   0.0  0.0   7.8    0.0   38.5   0   88   0   0   0   0 c9t18d0
  200.6    0.0  13021.5   0.0  0.0   8.2    0.0   40.7   0   91   0   0   0   0 c9t19d0
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t20d0
  196.6    0.0  12991.9   0.0  0.0   7.6    0.0   38.8   0   85   0   0   0   0 c9t21d0
  196.4    0.0  11499.3   0.0  0.0   8.0    0.0   40.5   0   89   0   0   0   0 c9t22d0
  197.6    0.0  13030.3   0.0  0.0   8.0    0.0   40.3   0   90   0   0   0   0 c9t23d0
  198.4    0.0  11535.8   0.0  0.0   7.8    0.0   39.3   0   87   0   0   0   0 c9t24d0
  202.2    0.0  13096.3   0.0  0.0   7.9    0.0   39.3   0   89   0   0   0   0 c9t25d0
  193.6    0.0  12457.4   0.0  0.0   8.3    0.0   42.8   0   90   0   0   0   0 c9t26d0
  194.0    0.0  12799.9   0.0  0.0   8.2    0.0   42.1   0   91   0   0   0   0 c9t27d0
  193.0    0.0  12748.8   0.0  0.0   7.9    0.0   41.0   0   88   0   0   0   0 c9t28d0
  194.6    0.0  12863.9   0.0  0.0   7.9    0.0   40.6   0   89   0   0   0   0 c9t29d0
  199.8    0.0  12849.1   0.0  0.0   7.8    0.0   39.0   0   87   0   0   0   0 c9t30d0
  205.0    0.0  13631.9   0.0  0.0   7.8    0.0   38.2   0   88   0   0   0   0 c9t31d0
  204.0    0.0  11674.3   0.0  0.0   7.9    0.0   38.6   0   88   0   0   0   0 c9t32d0
  204.2    0.0  11339.9   0.0  0.0   8.1    0.0   39.7   0   89   0   0   0   0 c9t33d0
  204.8    0.0  11569.7   0.0  0.0   7.7    0.0   37.7   0   86   0   0   0   0 c9t34d0
  205.2    0.0  11268.7   0.0  0.0   7.9    0.0   38.6   0   88   0   0   0   0 c9t35d0
  198.4    0.0  12814.9   0.0  0.0   7.8    0.0   39.5   0   88   0   0   0   0 c9t36d0
  200.4    0.0  13222.3   0.0  0.0   7.9    0.0   39.2   0   88   0   0   0   0 c9t37d0
  200.2    0.0  12324.5   0.0  0.0   7.4    0.0   37.1   0   85   0   0   0   0 c9t38d0
  203.0    0.0  11928.8   0.0  0.0   7.7    0.0   37.7   0   88   0   0   0   0 c9t39d0
  196.2    0.0  12966.3   0.0  0.0   7.5    0.0   38.0   0   84   0   0   0   0 c9t40d0
  195.2    0.0  11544.8   0.0  0.0   7.9    0.0   40.5   0   89   0   0   0   0 c9t41d0
  199.2    0.0  12601.8   0.0  0.0   7.8    0.0   38.9   0   88   0   0   0   0 c9t42d0
    0.0    0.0      0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t43d0
  194.4    0.0  12940.7   0.0  0.0   7.6    0.0   39.2   0   86   0   0   0   0 c9t44d0
  198.2    0.0  13120.6   0.0  0.0   7.5    0.0   38.1   0   86   0   0   0   0 c9t45d0
  201.2    0.0  11713.6   0.0  0.0   7.8    0.0   39.0   0   89   0   0   0   0 c9t46d0
  197.8    0.0  13196.7   0.0  0.0   7.4    0.0   37.4   0   85   0   0   0   0 c9t47d0
  197.4    0.0  13094.3   0.0  0.0   7.6    0.0   38.6   0   87   0   0   0   0 c9t48d0
  195.8    0.0  13017.5   0.0  0.0   7.5    0.0   38.4   0   85   0   1   1   2 c9t49d0
  205.0    0.0  11384.4   0.0  0.0   8.0    0.0   39.0   0   89   0   0   0   0 c9t50d0
  200.6    0.0  13286.6   0.0  0.0   7.5    0.0   37.2   0   85   0   0   0   0 c9t51d0
  200.6    0.0  12931.6   0.0  0.0   7.9    0.0   39.5   0   89   0   0   0   0 c9t52d0
  196.6    0.0  13055.9   0.0  0.0   7.5    0.0   38.3   0   87   0   0   0   0 c9t55d0

I had to abort the scrub shortly after this or we would start seeing the timeouts.
ok, see below...

On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:
> Here is an example of the pool config we use:
> [snip]
> ...and here is a snapshot of the system using "iostat -indexC 5" during a
> scrub of "pool002" (c8 is the onboard AHCI controller, c9 is the LSI SAS
> 3801E):
>
>                              extended device statistics       ---- errors ---
>     r/s    w/s     kr/s  kw/s wait  actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
> [snip]
>  8738.7    0.0 555346.1   0.0  0.1 345.0    0.0   39.5   0 3875   0   1   1   2 c9

You see 345 entries in the active queue. If the controller rolls over at 511 active entries, then it would explain why it would soon begin to have difficulty. Meanwhile, it is providing 8,738 IOPS and 555 MB/sec, which is quite respectable.

>   194.8    0.0  11936.9   0.0  0.0   7.9    0.0   40.3   0   87   0   0   0   0 c9t8d0

These disks are doing almost 200 read IOPS but are not 100% busy. The average I/O size is 66 KB, which is not bad -- lots of little I/Os could be worse -- but at only 11.9 MB/s you are nowhere near the media bandwidth. The average service time is 40.3 milliseconds, which is not super, but may be reflective of contention in the channel. So there is more capacity to accept I/O commands, but...

> [snip]
> I had to abort the scrub shortly after this or we would start seeing the
> timeouts.

yep. If you set the queue depth to 7, does it complete without timeouts?
 -- richard
How do you estimate the needed queue depth if one has, say, 64 to 128 disks sitting behind the LSI? Is it a bad idea to have a queue depth of 1?

Yours
Markus Kovero

________________________________________
From: zfs-discuss-bounces at opensolaris.org [zfs-discuss-bounces at opensolaris.org] on behalf of Richard Elling [richard.elling at gmail.com]
Sent: 24 October 2009 7:36
To: Adam Cheal
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile
[snip]
The iostat I posted previously was from a system where we had already tuned the zfs:zfs_vdev_max_pending queue depth down to 10 (visible as the max of about 10 in actv per disk). I reset this value in /etc/system to 7, rebooted, and started a scrub. iostat output showed busier disks (%b is higher, which seemed odd) but a cap of about 7 queued items per disk, proving the tuning was effective. iostat at a high-water mark during the test looked like this:

                    extended device statistics
    r/s    w/s      kr/s   kw/s  wait  actv wsvc_t asvc_t  %w   %b device
    0.0    0.0       0.0    0.0   0.0   0.0    0.0    0.0   0    0 c8
    0.0    0.0       0.0    0.0   0.0   0.0    0.0    0.0   0    0 c8t0d0
    0.0    0.0       0.0    0.0   0.0   0.0    0.0    0.0   0    0 c8t1d0
    0.0    0.0       0.0    0.0   0.0   0.0    0.0    0.0   0    0 c8t2d0
    0.0    0.0       0.0    0.0   0.0   0.0    0.0    0.0   0    0 c8t3d0
 8344.5    0.0  359640.4    0.0   0.1 300.5    0.0   36.0   0 4362 c9
  190.0    0.0    6800.4    0.0   0.0   6.6    0.0   34.8   0   99 c9t8d0
  185.0    0.0    6917.1    0.0   0.0   6.1    0.0   32.9   0   94 c9t9d0
  187.0    0.0    6640.9    0.0   0.0   6.5    0.0   34.6   0   98 c9t10d0
  186.5    0.0    6543.4    0.0   0.0   7.0    0.0   37.5   0  100 c9t11d0
  180.5    0.0    7203.1    0.0   0.0   6.7    0.0   37.2   0  100 c9t12d0
  195.5    0.0    7352.4    0.0   0.0   7.0    0.0   35.8   0  100 c9t13d0
  188.0    0.0    6884.9    0.0   0.0   6.6    0.0   35.2   0   99 c9t14d0
  204.0    0.0    6990.1    0.0   0.0   7.0    0.0   34.3   0  100 c9t15d0
  199.0    0.0    7336.7    0.0   0.0   7.0    0.0   35.2   0  100 c9t16d0
  180.5    0.0    6837.9    0.0   0.0   7.0    0.0   38.8   0  100 c9t17d0
  198.0    0.0    7668.9    0.0   0.0   7.0    0.0   35.3   0  100 c9t18d0
  203.0    0.0    7983.2    0.0   0.0   7.0    0.0   34.5   0  100 c9t19d0
    0.0    0.0       0.0    0.0   0.0   0.0    0.0    0.0   0    0 c9t20d0
  195.5    0.0    7096.4    0.0   0.0   6.7    0.0   34.1   0   98 c9t21d0
  189.5    0.0    7757.2    0.0   0.0   6.4    0.0   33.9   0   97 c9t22d0
  195.5    0.0    7645.9    0.0   0.0   6.6    0.0   33.8   0   99 c9t23d0
  194.5    0.0    7925.9    0.0   0.0   7.0    0.0   36.0   0  100 c9t24d0
  188.5    0.0    6725.6    0.0   0.0   6.2    0.0   32.8   0   94 c9t25d0
  188.5    0.0    7199.6    0.0   0.0   6.5    0.0   34.6   0   98 c9t26d0
  196.0    0.0    6666.9    0.0   0.0   6.3    0.0   32.1   0   95 c9t27d0
  193.5    0.0    7455.4    0.0   0.0   6.2    0.0   32.0   0   95 c9t28d0
  189.0    0.0    7400.9    0.0   0.0   6.3    0.0   33.2   0   96 c9t29d0
  182.5    0.0    9397.0    0.0   0.0   7.0    0.0   38.3   0  100 c9t30d0
  192.5    0.0    9179.5    0.0   0.0   7.0    0.0   36.3   0  100 c9t31d0
  189.5    0.0    9431.8    0.0   0.0   7.0    0.0   36.9   0  100 c9t32d0
  187.5    0.0    9082.0    0.0   0.0   7.0    0.0   37.3   0  100 c9t33d0
  188.5    0.0    9368.8    0.0   0.0   7.0    0.0   37.1   0  100 c9t34d0
  180.5    0.0    9332.8    0.0   0.0   7.0    0.0   38.8   0  100 c9t35d0
  183.0    0.0    9690.3    0.0   0.0   7.0    0.0   38.2   0  100 c9t36d0
  186.0    0.0    9193.8    0.0   0.0   7.0    0.0   37.6   0  100 c9t37d0
  180.5    0.0    8233.4    0.0   0.0   7.0    0.0   38.8   0  100 c9t38d0
  175.5    0.0    9085.2    0.0   0.0   7.0    0.0   39.9   0  100 c9t39d0
  177.0    0.0    9340.0    0.0   0.0   7.0    0.0   39.5   0  100 c9t40d0
  175.5    0.0    8831.0    0.0   0.0   7.0    0.0   39.9   0  100 c9t41d0
  190.5    0.0    9177.8    0.0   0.0   7.0    0.0   36.7   0  100 c9t42d0
    0.0    0.0       0.0    0.0   0.0   0.0    0.0    0.0   0    0 c9t43d0
  196.0    0.0    9180.5    0.0   0.0   7.0    0.0   35.7   0  100 c9t44d0
  193.5    0.0    9496.8    0.0   0.0   7.0    0.0   36.2   0  100 c9t45d0
  187.0    0.0    8699.5    0.0   0.0   7.0    0.0   37.4   0  100 c9t46d0
  198.5    0.0    9277.0    0.0   0.0   7.0    0.0   35.2   0  100 c9t47d0
  185.5    0.0    9778.3    0.0   0.0   7.0    0.0   37.7   0  100 c9t48d0
  192.0    0.0    8384.2    0.0   0.0   7.0    0.0   36.4   0  100 c9t49d0
  198.5    0.0    8864.7    0.0   0.0   7.0    0.0   35.2   0  100 c9t50d0
  192.0    0.0    9369.8    0.0   0.0   7.0    0.0   36.4   0  100 c9t51d0
  182.5    0.0    8825.7    0.0   0.0   7.0    0.0   38.3   0  100 c9t52d0
  202.0    0.0    7387.9    0.0   0.0   7.0    0.0   34.6   0  100 c9t55d0

...and sure enough, about 20 minutes into it, I get this (bus reset?):

scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0/sd@34,0 (sd49):
        incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0/sd@21,0 (sd30):
        incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,65fa@4/pci1000,30a0@0/sd@1e,0 (sd27):
        incomplete read- retrying
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
        Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
        mpt0 supports power management.
scsi: [ID 365881 kern.info] /pci@0,0/pci8086,65fa@4/pci1000,30a0@0 (mpt0):
        mpt0: IOC Operational.

During the "bus reset", iostat output looked like this:

                    extended device statistics                 ---- errors ---
  r/s  w/s  kr/s  kw/s wait  actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
  0.0  0.0   0.0   0.0  0.0  88.0    0.0    0.0   0 2200   0   3   0   3 c9
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t8d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t9d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t10d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t11d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t12d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t13d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t14d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t15d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t16d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t17d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t18d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t19d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t20d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t21d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t22d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t23d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t24d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t25d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t26d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t27d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t28d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t29d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   1   0   1 c9t30d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t31d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t32d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   1   0   1 c9t33d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t34d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t35d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t36d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t37d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t38d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t39d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t40d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t41d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t42d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t43d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t44d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t45d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t46d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t47d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t48d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t49d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t50d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   0   0   0 c9t51d0
  0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   1   0   1 c9t52d0
  0.0  0.0   0.0   0.0  0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t55d0

During our previous testing we had even tried setting this max_pending value down to 1, but we still hit the problem (albeit it took a little longer to hit it), and I couldn't find anything else I could set to throttle IO to the disk, hence the frustration.

If you hadn't seen this output, would you say that 7 was a "reasonable" value for that max_pending queue for our architecture, one that should give the LSI controller enough breathing room to operate in this situation? If so, I *should* be able to scrub the disks successfully (ZFS isn't to blame) and would therefore have to point the finger at the mpt driver, LSI firmware, or disk firmware instead.
-- This message posted from opensolaris.org
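For reference, the tuning described above is the standard zfs_vdev_max_pending mechanism from the ZFS tuning guides: a line in /etc/system (which takes effect at the next boot), or a poke at the live kernel with mdb. A minimal sketch, using the value 7 from the test above:

* /etc/system entry (applied at next boot)
set zfs:zfs_vdev_max_pending = 7

Or, on the running kernel without a reboot (0t marks a decimal value):

echo zfs_vdev_max_pending/W0t7 | mdb -kw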
We actually hit similar issues with LSI, but under normal workload rather than scrub; the result is the same, but it seems to choke on writes rather than reads, with suboptimal performance.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6891413

Anyway, we haven't experienced this _at all_ with the RE3 version of Western Digital disks. Issues seem to pop up with 750GB Seagate and 1TB WD Black-series drives; so far the 2TB WD Green drives seem unaffected too, so might it be related to the disks' firmware and how they chat with the LSI?

Also, we noticed more severe timeouts (even with RE3 and 2TB WD Green) if the disks are not forced into SATA1 mode; I believe this is a known issue with newer 2TB disks and some other disk controllers, and may be caused by bad cabling or connectivity. We have never witnessed this behaviour with SAS disks (Fujitsu, IBM, ...) either. All this happens with snv 118, 122, 123 and 125.

Yours
Markus Kovero

________________________________________
From: zfs-discuss-bounces at opensolaris.org [zfs-discuss-bounces at opensolaris.org] on behalf of Adam Cheal [acheal at pnimedia.com]
Sent: 24 October 2009 12:49
To: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile

> [Adam's message quoted in full; trimmed]
On Sat, Oct 24, 2009 at 4:49 AM, Adam Cheal <acheal at pnimedia.com> wrote:
> [quoted text trimmed]

A little bit of searching Google says:
http://downloadmirror.intel.com/17968/eng/ESRT2_IR_readme.txt
On Sat, Oct 24, 2009 at 11:20 AM, Tim Cook <tim at cook.ms> wrote:
> [quoted text trimmed]

Huh, good old keyboard shortcuts firing off emails before I'm done with them. Anyway, in that link I found the following:

3. Updated - to provide NCQ queue depth of 32 (was 8) on 1064e and 1068e and 1078 internal-only controllers in IR and ESRT2 modes.

Then there's also this link from someone using a similar controller under FreeBSD:
http://www.nabble.com/mpt-errors-QUEUE-FULL-EVENT,-freebsd-7.0-on-dell-1950-td20019090.html

It would make total sense that you're having issues if the default queue depth for that controller is 8 per port. Even setting max_pending to 1 isn't going to fix your issue if you've got 46 drives on one channel/port. Honestly, I'm just taking shots in the dark though.

--Tim
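To put rough numbers on Tim's hypothesis, here is a small Python sketch (illustrative only; the 46-drive figure is Tim's, and the depth-8 reading of the readme is his interpretation):

# Aggregate commands the host may keep outstanding at the HBA,
# versus a small per-port firmware queue (per Tim's reading above).
drives = 46                     # disks behind the one controller
for max_pending in (10, 7, 1):  # the zfs_vdev_max_pending values tried
    outstanding = drives * max_pending
    print(max_pending, "->", outstanding, "commands in flight")
# Even at max_pending = 1 the controller can still see 46 concurrent
# commands, so a depth-8 queue would remain oversubscribed.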
more below...

On Oct 24, 2009, at 2:49 AM, Adam Cheal wrote:
> [quoted text trimmed; during the "bus reset" most c9 disks showed]
>
>   0.0  0.0   0.0   0.0  0.0   4.0    0.0    0.0   0  100   0   1   0   1 c9t30d0

OK, here we see 4 I/Os pending outside of the host. The host has sent them on and is waiting for them to return. This means they are getting dropped either at the disk or somewhere between the disk and the controller.

When this happens, the sd driver will time them out, try to clear the fault by reset, and retry. In other words, the resets you see are when the system tries to recover.

Since there are many disks with 4 stuck I/Os, I would lean towards a common cause. What do these disks have in common? Firmware? Do they share a SAS expander?
 -- richard

> [remainder of quoted text trimmed]
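For anyone wanting to see where that timeout/retry cycle is controlled: the per-command timeout the sd driver applies before starting recovery is a /etc/system tunable. A sketch only, assuming the standard Solaris sd_io_time tunable; the value shown is just the usual 60-second default made explicit, not a recommendation from this thread:

* /etc/system: seconds the sd driver waits on a command before timing
* it out and attempting the reset/retry recovery Richard describes
set sd:sd_io_time = 0x3c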
On 10/24/09 9:43 AM, Richard Elling wrote:
> OK, here we see 4 I/Os pending outside of the host. The host has
> sent them on and is waiting for them to return. This means they are
> getting dropped either at the disk or somewhere between the disk
> and the controller.
>
> When this happens, the sd driver will time them out, try to clear
> the fault by reset, and retry. In other words, the resets you see
> are when the system tries to recover.
>
> Since there are many disks with 4 stuck I/Os, I would lean towards
> a common cause. What do these disks have in common? Firmware?
> Do they share a SAS expander?

I saw this with my WD 500GB SATA disks (HDS725050KLA360) and LSI firmware 1.28.02.00 in IT mode, but I (almost?) always had exactly 1 "stuck" I/O. Note that my disks were one per channel, no expanders. I have _not_ seen it since replacing those disks. So my money is on a bug in the LSI firmware, the drive firmware, the drive controller hardware, or some combination thereof.

Note that LSI has released firmware 1.29.00.00. Sadly I cannot find any documentation on what has changed. Downloadable from LSI at
http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas3081e-r/index.html?remote=1&locale=EN

-- Carson
On Sat, Oct 24, 2009 at 12:30 PM, Carson Gaspar <carson at taltos.org> wrote:
> [quoted text trimmed]
> Note that LSI has released firmware 1.29.00.00. Sadly I cannot find any
> documentation on what has changed.

Here's the closest I could find, from some Intel release notes. It came from ESRT2_IR_readme.txt and does mention the 1068e chipset, as well as that firmware rev.

=============== Package Information ===============
FW and OpROM Package for Native SAS mode, IT/IR mode and Intel(R) Embedded Server RAID Technology II
Package version: 2009.10.06
FW Version = 01.29.00 (includes fixed firmware settings)
BIOS (non-RAID) Version = 06.28.00
BIOS (SW RAID) Version = 08.09041155
Supported RAID modes: 0, 1, 1E, 10, 10E and 5 (activation key AXXRAKSW5 required for RAID 5 support)
Supported Intel(R) Server Boards and Systems:
- S5000PSLSASR, S5000XVNSASR, S5000VSASASR, S5000VCLSASR, S5000VSFSASR
- SR1500ALSASR, SR1550ALSASR, SR2500ALLXR, S5000PALR (with SAS I/O Module)
- S5000PSLROMBR (SROMBSAS18E) without HW RAID activation key AXXRAK18E installed (native SAS or SW RAID modes only) - for HW RAID mode separate package is available
- NSC2U, TIGW1U
Supported Intel(R) RAID controller (adapters):
- SASMF8I, SASWT4I, SASUC8I
Intel(R) SAS Entry RAID Module AXX4SASMOD, when inserted in below Intel(R) Server Boards and Systems:
- S5520HC / S5520HCV, S5520SC, S5520UR, S5500WB

=============== Known Restrictions ===============
1. The sasflash versions within this package don't support ESRTII controllers.
2. The sasflash utility for Windows and Linux version within this package only support Intel(R) IT/IR RAID controllers. The sasflash utility for Windows and Linux version within this package don't support sasflash -o -e 6 command.
3. The sasflash utility for DOS version doesn't support the Intel(R) Server Boards and Systems due to BIOS limitation. The DOS version sasflash might still be supported on 3rd party server boards which don't have the BIOS limitation.
4. No PCI 3.0 support
5. No Foreign Configuration Resolution Support
6. No RAID migration Support
7. No mixed RAID mode support ever
8. No Stop On Error support

=============== Known Bugs ===============
(1) For Intel(R) chipset S5000P/S5000V/S5000X based server systems, please use the 32 bit, non-EBC version of sasflash which is SASFLASH_Ph17-1.22.00.00\sasflash_efi_bios32_rel\sasflash.efi, instead of the ebc version of sasflash which is in the top package directory and also in SASFLASH_Ph17-1.22.00.00\sasflash_efi_ebc_rel\sasflash.efi. The latter one may return a wrong sas address with a sasflash -list command in the listed systems.
(2) LED behavior does not match between SES and SGPIO for some conditions (documentation in process).
(3) When in EFI Optimized Boot mode, the task bar is not displayed in EFI_BSD after two volumes are created.
(4) If a system is rebooted while a volume rebuild is in progress, the rebuild will start over from the beginning.

=============== Fixes/Updates ===============
Version 2009.10.06
1. Fixed - MP2 HDD fault LED stays on after rebuild completes
2. Fixed - System hangs if drive hot-unplugged during stress
Version 2009.07.30
1. Fixed - SES over i2c for 106x products
2. Fixed - FW settings updated to support SES over i2c drive lights on FALSASMP2.
Version 2009.06.15
1. Fixed - SES over I2C issue for 1078IR.
2. Updated - 1068e fw to fix SES over I2C on MP2 bug.
3. Updated - to provide NCQ queue depth of 32 (was 8) on 1064e and 1068e and 1078 internal-only controllers in IR and ESRT2 modes.
4. Updated - Firmware to enable SES over I2C on AXX4SASMOD.
5. Updated - Settings to provide better LED indicators for SGPIO.
Version 2008.12.11
1. Fixed - Media can't boot from SATA DVD in some systems in Software RAID (ESRT2) mode.
2. Fixed - Incorrect RAID 5 ECC error handling in Ctrl+M
Version 2008.11.07
1. Added support for - Enable ICH10 support
2. Added support for - Software RAID5 to support ICH10R
3. Added support for - Single Drive RAID 0 (IS) Volume
4. Fixed - Resolved issue where user could not create a second volume immediately following the deletion of a second volume.
5. Fixed - Second hot spare status not shown when first hot spare is inactive/missing
Version 2008.09.22
1. Fixed - SWR: During hot PD removal and then quick reboot, not updating the DDF correctly.
Version 2008.06.16
1. Fixed - the issue with The LED functions are not working inside the OSes for SWR5
2. Fixed - the issue with (IR-Only) Volume rebuild fails after cold swap in IME volume with hotspare
3. Fixed - the issue with When a degraded RAID volume with a missing disk is rebooted, it may resync to a non-RAID disk.
4. Fixed - the issue with (IR-Only) Physical Disk firmware download commands are being rejected
5. Added support for (IR-Only) Allow the host to set the name of a RAID volume
Version 2007.12.05
1. Fixed incorrect system-dependent firmware settings. This includes fix for the issue with HDD not being detected in Slot1 with SR2500ALLX system.
2. Added support for more than one SAS expander
3. Added graceful handling of situation when more than 8 HDDs are installed (the limit of 8 drives still exists).
Version 2007.08.20
1. Updated readme files, added LSI adapters to package, added a new platform to supported list
Version 2007.07.22
1. Fixed an issue with 1.6GHz processors when recognizing the SW RAID 5 Activation Key
Version 2007.05.24
1. Fixed an issue preventing NCQ functionality on the new silicon spins
Version 2007.01.18
1. Added pass thru support for up to 2 SATA CD/DVD devices when the controller is in SW RAID mode - boot from CD/DVD-ROM with floppy emulation or hard-drive emulation not supported
The controller connects to two disk shelves (expanders), one per port on the card. If you look back in the thread, you'll see our zpool config has one vdev per shelf. All of the disks are Western Digital (model WD1002FBYS-18A6B0) 1TB 7.2K, firmware rev. 03.00C06. Without actually matching up the disks with "stuck" IOs, I am assuming they are all on the same vdev/shelf/controller port.

I communicated with LSI support directly regarding the v1.29 firmware update, and here's what they wrote back:

"I have checked with our development team on this one. There are no release notes available as the functionality of the coding itself has not changed. This was a minor cleanup and the firmware was assigned a new phase number for these. There were no defects or added functionality in going from the P16 firmware to the P17 firmware."

Also, regarding the NCQ depth on the drives: I used LSIUTIL in expert mode and used options 13/14 to dump the following settings (which are all default):

Multi-pathing:                 [0=Disabled, 1=Enabled, default is 0]
SATA Native Command Queuing:   [0=Disabled, 1=Enabled, default is 1]
SATA Write Caching:            [0=Disabled, 1=Enabled, default is 1]
SATA Maximum Queue Depth:      [0 to 255, default is 32]
Device Missing Report Delay:   [0 to 2047, default is 0]
Device Missing I/O Delay:      [0 to 255, default is 0]
Persistence:                   [0=Disabled, 1=Enabled, default is 1]
Physical mapping:              [0=None, 1=DirectAttach, 2=EnclosureSlot, default is 0]
-- This message posted from opensolaris.org
So, while we are working on resolving this issue with Sun, let me approach this from another perspective: what kind of controller/drive ratio would be the minimum recommended to support a functional OpenSolaris-based archival solution? Given the following:

- the vast majority of IO to the system is going to be "read" oriented, other than the initial "load" of the archive shares and possibly scrubs/re-silvering in the case of failed drives
- we currently have one LSISAS3801E with two external ports; each port connects to one 23-disk JBOD
- each JBOD can take two external SAS connections if we enable its "split-backplane" option, which would split the disk IO path between the two connectors (12 disks on one connector, 11 on the other); we do not currently have this enabled
- our current server platform has only 1 x PCIe-x8 slot available; we *could* look at changing this in the future, but I'd prefer to find a one-card solution if possible

Here is the math I did that shows the current IO situation (PLEASE correct this if I am mistaken, as I am somewhat "winging" it here and my head hurts), based on info from:

http://storageadvisors.adaptec.com/2006/07/26/sas-drive-performance/
http://en.wikipedia.org/wiki/PCI_Express
http://support.wdc.com/product/kb.asp?modelno=WD1002FBYS&x=9&y=8

WD1002FBYS 1TB SATA2 7200rpm drive specs:
Avg seek time = 8.9 ms
Avg latency = 4.2 ms
Max transfer speed = 112 MB/s
Avg transfer speed ~= 65 MB/s

"Random" IO scenario (theoretical numbers):
8.9 ms avg seek time + 4.2 ms avg latency = 13.1 ms avg access time
1/0.0131 = 76 IOPS/drive
22 (23 - 1 spare) drives x 76 IOPS/drive = 1672 IOPS/shelf
1672 IOPS/shelf x 2 = 3344 IOPS/controller
-or-
22 (23 - 1 spare) drives x 65 MB/s/drive = 1430 MB/s/shelf
1430 MB/s/shelf x 2 = 2860 MB/s/controller

Pure "streamed read" IO scenario (theoretical numbers):
0.0 avg seek time + 4.2 ms avg latency = 4.2 ms avg access time
1/0.0042 = 238 IOPS/drive
22 (23 - 1 spare) drives x 238 IOPS/drive = 5236 IOPS/shelf
5236 IOPS/shelf x 2 = 10472 IOPS/controller
-or-
22 (23 - 1 spare) drives x 112 MB/s/drive = 2464 MB/s/shelf
2464 MB/s/shelf x 2 = 4928 MB/s/controller

Max. bandwidth of a single SAS PHY = 270 MB/s per port (300 MB/s - overhead)
The LSISAS3801E has 2 x 4-port SAS connections, and each shelf gets a 4-port connection, so:
Max controller bandwidth/shelf = 4 x 270 MB/s = 1080 MB/s
Max controller bandwidth = 2 x 1080 MB/s = 2160 MB/s

Max. bandwidth of PCIe x8 interface = 2 GB/s
Typical sustained bandwidth of PCIe x8 interface (max - 5% overhead) = 1.9 GB/s

Summary: the current controller cannot handle the max IO load of even the random IO scenario (1430 MB/s per shelf needed; the controller can only handle 1080 MB/s per shelf). Also, the PCIe bus can't push more than 1.9 GB/s sustained over a single slot, so we are limited by the single card.

Solution: connecting 2 x 4-port SAS connectors to one shelf (i.e. enabling split mode) would get us 2160 MB/s/shelf. This would allow us to remove the controller as a bottleneck for all but the extreme cached-read scenario, but the PCIe bus would still throttle us to 1.9 GB/s per slot. So the controller could keep up with the shelves, but the PCIe bus would have to wait sometimes, which may (?) be a "healthier" situation than overwhelming the controller. To support two shelves per controller we could use an LSISAS31601E (4 x 4-port SAS connectors), but we would hit the PCIe bus limitation again. Moving to two (or more?) separate PCIe-x8 cards would be best, but that would require us to alter our server platform.

Whew. Thoughts? Comments? Suggestions?
-- This message posted from opensolaris.org
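For anyone who wants to rerun this math with different drive specs or shelf counts, here is the same arithmetic as a small Python sketch (the inputs are just the figures quoted in the post above; nothing new is assumed):

# Back-of-the-envelope model of the shelf/controller math above.
seek_ms, latency_ms = 8.9, 4.2     # WD1002FBYS datasheet averages
avg_mbs, max_mbs = 65, 112         # avg / max media transfer rate, MB/s
drives = 22                        # 23-disk shelf minus 1 hot spare
shelves = 2

random_iops = 1000 / (seek_ms + latency_ms)   # ~76 IOPS/drive
stream_iops = 1000 / latency_ms               # ~238 IOPS/drive

shelf_random_mbs = drives * avg_mbs           # 1430 MB/s demand per shelf
shelf_stream_mbs = drives * max_mbs           # 2464 MB/s demand per shelf

phy_mbs = 270                                 # 3 Gb/s SAS lane minus overhead
wide_port_mbs = 4 * phy_mbs                   # 1080 MB/s per x4 connector
pcie_x8_mbs = 2000 * 0.95                     # ~1900 MB/s sustained

print(f"random demand/shelf: {shelf_random_mbs} MB/s vs {wide_port_mbs} MB/s per x4 port")
print(f"total random demand: {shelves * shelf_random_mbs} MB/s vs {pcie_x8_mbs:.0f} MB/s PCIe")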
I'm having similar issues with two AOC-USAS-L8i Supermicro 1068e cards, mpt2 and mpt3, running 1.26.00.00 IT firmware. It seems to only affect a specific revision of disk (???):

sd67 Soft Errors: 0 Hard Errors: 127 Transport Errors: 3416
Vendor: ATA  Product: WDC WD10EACS-00D  Revision: 1A01  Serial No:
Size: 1000.20GB <1000204886016 bytes>

sd58 Soft Errors: 0 Hard Errors: 83 Transport Errors: 2087
Vendor: ATA  Product: WDC WD10EACS-00D  Revision: 1A01  Serial No:
Size: 1000.20GB <1000204886016 bytes>

There are 8 other disks on the two controllers:
6 x WDC WD10EACS-00Z Revision: 1B01 (no errors)
2 x SAMSUNG HD103UJ Revision: 1113 (no errors)

The two EACS-00D disks are in separate enclosures with new SAS->SATA fanout cables.

Example error messages:

Oct 27 14:26:05 fleet scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1002,5978@2/pci15d9,a580@0 (mpt2):
Oct 27 14:26:05 fleet   wwn for target has changed
Oct 27 14:25:56 fleet scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1002,5979@3/pci15d9,a580@0 (mpt3):
Oct 27 14:25:56 fleet   wwn for target has changed
Oct 27 14:25:57 fleet scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci1002,5978@2/pci15d9,a580@0 (mpt2):
Oct 27 14:25:57 fleet   mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 27 14:25:48 fleet scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci1002,5979@3/pci15d9,a580@0 (mpt3):
Oct 27 14:25:48 fleet   mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d00
Oct 27 14:26:01 fleet scsi: [ID 365881 kern.info] /pci@0,0/pci1002,5978@2/pci15d9,a580@0 (mpt2):
Oct 27 14:26:01 fleet   Log info 0x31110d00 received for target 1.
Oct 27 14:26:01 fleet   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Oct 27 14:25:51 fleet scsi: [ID 365881 kern.info] /pci@0,0/pci1002,5979@3/pci15d9,a580@0 (mpt3):
Oct 27 14:25:51 fleet   Log info 0x31120403 received for target 2.
Oct 27 14:25:51 fleet   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

On 22/10/2009, at 10:40 PM, Bruno Sousa wrote:
> [original message quoted in full; trimmed]
I am also running 2 of the Supermicro cards. I just upgraded to b126 and it seems improved. I am running a large file copy locally and get these warnings in the dmesg log; when they appear, I/O seems to stall for about 60 sec. It comes back up fine, but it's very annoying. Any hints?

I have 4 disks per controller right now: different brands, sizes, everything. New SATA fanout cables and no expanders. The drives on mpt0 and mpt1 are completely different, 4 x 400GB Seagate drives and 4 x 1.5TB Samsung drives, and I get the problem from both controllers. I didn't notice this until about b124. I can reproduce it with rsync copying files locally between ZFS filesystems, even with --bwlimit=10000 (10 MB/sec); keeping the limit low does seem to help.

---------------
Oct 31 23:05:32 nas scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,778@10/pci10de,5b1@0/pci10de,5b1@3/pci15d9,a580@0 (mpt1):
Oct 31 23:05:32 nas     Disconnected command timeout for Target 7
Oct 31 23:09:42 nas scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,778@10/pci10de,5b1@0/pci10de,5b1@2/pci15d9,a580@0 (mpt0):
Oct 31 23:09:42 nas     Disconnected command timeout for Target 1
Oct 31 23:16:23 nas scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,778@10/pci10de,5b1@0/pci10de,5b1@2/pci15d9,a580@0 (mpt0):
Oct 31 23:16:23 nas     Disconnected command timeout for Target 3
Oct 31 23:18:43 nas scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,778@10/pci10de,5b1@0/pci10de,5b1@3/pci15d9,a580@0 (mpt1):
Oct 31 23:18:43 nas     Disconnected command timeout for Target 6
Oct 31 23:27:24 nas scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci10de,778@10/pci10de,5b1@0/pci10de,5b1@3/pci15d9,a580@0 (mpt1):
Oct 31 23:27:24 nas     Disconnected command timeout for Target 7
-- This message posted from opensolaris.org
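For anyone else trying to reproduce this, the rsync invocation would look something like the sketch below (the dataset paths are hypothetical; note that rsync's --bwlimit is in KB/s, so 10000 is roughly 10 MB/s):

# local copy between two ZFS filesystems, throttled to ~10 MB/s
rsync -a --bwlimit=10000 /tank/source/ /tank/destination/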
We see the same issue on an x4540 Thor system with 500G disks; lots of:

Nov 3 16:41:46 ....uva.nl scsi: [ID 107833 kern.warning] WARNING: /pci@3c,0/pci10de,376@f/pci1000,1000@0 (mpt5):
Nov 3 16:41:46 encore.science.uva.nl     Disconnected command timeout for Target 7

This system is running nv125 XvM. It seems to occur more when we are using VMs, which of course causes very long interruptions on the VMs as well...
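A quick way to see which targets are being hit and how often, assuming the warnings land in the usual /var/adm/messages location:

    # Count "Disconnected command timeout" warnings per target
    grep "Disconnected command timeout" /var/adm/messages |
        sed 's/.*Target /Target /' | sort | uniq -c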
It's easy to reproduce for me under a VM. Create a zvol with "zfs create -V 500G tank/test", then connect it to a VM with "virsh attach-disk". Even just formatting it with ext4 from an Ubuntu 9.10 guest will cause the lockup for me. The errors seem to occur more frequently with large files. I can also reproduce it over NFS and from an OpenSolaris zone. I'm running nv126 XvM right now; I haven't tried it without XvM.
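Spelled out, that reproduction looks roughly like this; the domain name ubuntu910 and the guest device xvdb are hypothetical:

    # Create the zvol to hand to the guest
    zfs create -V 500G tank/test

    # Attach the zvol's block device to a running domain
    virsh attach-disk ubuntu910 /dev/zvol/dsk/tank/test xvdb

    # Inside the guest, formatting the new disk is enough to trigger the stall:
    #   mkfs.ext4 /dev/xvdb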
> I'm running nv126 XvM right now. I haven't tried it
> without XvM.

Without XvM we do not see these issues. We're running the VMs through NFS now (using ESXi)...
> > I'm running nv126 XvM right now. I haven't tried it
> > without XvM.
>
> Without XvM we do not see these issues. We're running
> the VMs through NFS now (using ESXi)...

Interesting. It sounds like it might be an XvM-specific bug. I'm glad I mentioned that in my bug report to Sun; hopefully they can duplicate it. I'd like to stick with XvM, as I've spent a fair amount of time getting things working well under it.

How did your migration to ESXi go? Are you using it on the same hardware, or did you just switch that server to an NFS server and run the VMs on another box?
Travis Tabbal wrote:
>>> I'm running nv126 XvM right now. I haven't tried it
>>> without XvM.
>>
>> Without XvM we do not see these issues. We're running
>> the VMs through NFS now (using ESXi)...
>
> Interesting. It sounds like it might be an XvM-specific bug. I'm glad I mentioned that in my bug report to Sun; hopefully they can duplicate it. I'd like to stick with XvM, as I've spent a fair amount of time getting things working well under it.
>
> How did your migration to ESXi go? Are you using it on the same hardware, or did you just switch that server to an NFS server and run the VMs on another box?

Hi Travis,
your bug showed up - it's 6900767. Since bugs.opensolaris.org isn't a "live" system, you won't be able to see it at
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6900767
until tomorrow.

cheers,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp    http://www.jmcp.homeunix.com/blog
> How did your migration to ESXi go? Are you using it on the same hardware, or did you just switch that server to an NFS server and run the VMs on another box?

The latter; we run these VMs over NFS anyway and had ESXi boxes under test already, and we were already separating "data" exports from "VM" exports. We use an in-house developed configuration management/bare-metal system which allows us to install new machines pretty easily, so in this case we just provisioned the ESXi VMs to new "VM" exports on the Thor whilst re-using the data exports as they were... It works pretty well, although the Sun x1027A 10G NICs aren't yet supported under ESXi 4...
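A minimal sketch of that kind of data/VM export split, with hypothetical dataset names and an assumed ESXi subnet of 10.0.0.0/24:

    # Separate exports for VM images and for bulk data
    zfs create -o sharenfs=on tank/vm
    zfs create -o sharenfs=on tank/data

    # ESXi needs root access to its NFS datastore; scope it to the ESXi subnet
    zfs set sharenfs='rw=@10.0.0.0/24,root=@10.0.0.0/24' tank/vm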
> The latter; we run these VMs over NFS anyway and had
> ESXi boxes under test already, and we were already
> separating "data" exports from "VM" exports. We use
> an in-house developed configuration management/bare-metal
> system which allows us to install new machines
> pretty easily, so in this case we just provisioned the
> ESXi VMs to new "VM" exports on the Thor whilst
> re-using the data exports as they were...

Thanks for the info. Unfortunately, I need this box to do double duty and run the VMs as well. The hardware is capable; this issue with XvM and/or the mpt driver just needs to get fixed. Other than that, things are running great with this server.