Michael Stalnaker
2008-Jan-12 19:47 UTC
[zfs-discuss] Disk array problems - any suggestions?
All,

I have a 24-disk SATA array attached to an HP DL160 with an LSI 3801E as the controller. We've been seeing errors that look like:

  WARNING: /pci@0,0/pci8086,25f7@2/pci8086,350c@0,3/pci1000,30e0@2 (mpt0); Disconnected command timeout for Target 23
  WARNING: /pci@0,0/pci8086,25f7@2/pci8086,350c@0,3/pci1000,30e0@2 (mpt0); Disconnected command timeout for Target 23
  SCSI transport failed: reason 'reset': giving up
  WARNING: /pci@0,0/pci8086,25f7@2/pci8086,350c@0,3/pci1000,30e0@2 (mpt0); Disconnected command timeout for Target 23
  WARNING: /pci@0,0/pci8086,25f7@2/pci8086,350c@0,3/pci1000,30e0@2 (mpt0); Disconnected command timeout for Target 23

When these occur, the system hangs on any access to the array and never recovers. After some discussions with folks at Sun, I rebuilt the system, moving from Solaris 10 x86 Update 4 to OpenSolaris. It's currently on Solaris Express (Nevada) build 78, and the errors are continuing.

The drives are 750 GB Hitachis, and after a power cycle and reboot the error does not persist on any one drive. Each drive sits in a carrier with some active electronics that adapt the SATA drive for SAS use.

My fear at the moment is that there's some sort of problem with the 24-drive enclosure itself, since the drives appear to be fine and I cannot believe we're seeing an intermittent failure across a number of drives.

Any suggestions would be appreciated.

--Mike Stalnaker