Hi.

I have a Nexsan ATAbeast with 2 RAID controllers, each with 21 disks @ 400 GB. Each RAID controller has five RAID-5 LUNs and one hot spare. The Solaris release is from 2006/11. I have created a single raidz2 tank from the ten LUNs.

The RAID controller is connected to a Dell PE 2650 with two QLogic 2310 HBAs; frame size is 1024, speed is 2 Gbit/s. The HBAs are connected directly to the storage, i.e. no fibre switch. The ATAbeast has three power supplies connected to a UPS, so I doubt it's a power-related issue.

The server does *not* reboot during these messages. How can I investigate this issue further?

zetta~%>uname -a
SunOS zetta 5.10 Generic_118855-33 i86pc i386 i86pc

regards
Claus

Mar 26 13:12:37 zetta Error for Command: write(10) Error Level: Retryable
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Requested Block: 470140414 Error Block: 0
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Vendor: NEXSAN Serial Number: 134D98CD:%^,
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
Mar 26 13:12:37 zetta Error for Command: write(10) Error Level: Retryable
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Requested Block: 470140414 Error Block: 0
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Vendor: NEXSAN Serial Number: 134D98B3:%^,
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,2 (sd141):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,1 (sd30):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,0 (sd4):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,2 (sd141):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,1 (sd30):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,0 (sd4):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,2 (sd141):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,1 (sd30):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,0 (sd4):
Mar 26 13:12:37 zetta incomplete write- retrying
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
Mar 26 13:12:37 zetta incomplete write- giving up
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,2 (sd141):
Mar 26 13:12:37 zetta incomplete write- giving up
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,1 (sd30):
Mar 26 13:12:37 zetta incomplete write- giving up
Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,0 (sd4):
Mar 26 13:12:37 zetta incomplete write- giving up
Claus Guttesen wrote:
> Hi.
>
> I have a Nexsan ATAbeast with 2 RAID controllers, each with 21 disks @
> 400 GB.
> Each RAID controller has five RAID-5 LUNs and one hot spare. The
> Solaris release is from 2006/11. I have created a single raidz2 tank from the
> ten LUNs.
>
> The RAID controller is connected to a Dell PE 2650 with two QLogic 2310
> HBAs; frame size is 1024, speed is 2 Gbit/s. The HBAs are connected
> directly to the storage, i.e. no fibre switch. The ATAbeast has three
> power supplies connected to a UPS, so I doubt it's a power-related
> issue.
>
> The server does *not* reboot during these messages. How can I
> investigate this issue further?

It appears that the problems are on the NEXSAN side. I'm not familiar with that product, but does it have a systems management interface? If so, that is the next place to look. Additional comments below...

> zetta~%>uname -a
> SunOS zetta 5.10 Generic_118855-33 i86pc i386 i86pc
>
> regards
> Claus
>
> Mar 26 13:12:37 zetta Error for Command: write(10) Error Level: Retryable
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Requested Block: 470140414 Error Block: 0
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Vendor: NEXSAN Serial Number: 134D98CD:%^,
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

The host initiated a write and the NEXSAN responded with ASC:0x29/ASCQ:0x0. This means that the NEXSAN saw something (a power-on, reset, or bus reset condition, going by the ASC text), though it is unclear whether the error was in the path between the host and the NEXSAN. It is retryable, so the host retries...

> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
> Mar 26 13:12:37 zetta Error for Command: write(10) Error Level: Retryable
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Requested Block: 470140414 Error Block: 0
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Vendor: NEXSAN Serial Number: 134D98B3:%^,
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
> Mar 26 13:12:37 zetta incomplete write- retrying

... here is the first retry ...

> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,2 (sd141):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,1 (sd30):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,0 (sd4):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
> Mar 26 13:12:37 zetta incomplete write- retrying

... second retry ...

> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,2 (sd141):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,1 (sd30):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,0 (sd4):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
> Mar 26 13:12:37 zetta incomplete write- retrying

... third retry ...

> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,2 (sd141):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,1 (sd30):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,0 (sd4):
> Mar 26 13:12:37 zetta incomplete write- retrying
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,3 (sd140):
> Mar 26 13:12:37 zetta incomplete write- giving up

... after 3 retries, we give up. This is the default number of retries for most FC connections.
 -- richard

> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,2 (sd141):
> Mar 26 13:12:37 zetta incomplete write- giving up
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,1 (sd30):
> Mar 26 13:12:37 zetta incomplete write- giving up
> Mar 26 13:12:37 zetta scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3599@6/pci8086,32a@0,2/pci1077,9@2/sd@0,0 (sd4):
> Mar 26 13:12:37 zetta incomplete write- giving up
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
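The retry sequence Richard walks through (three "retrying" messages per device, then "giving up") can be checked mechanically on a longer log. The following is only a rough sketch, assuming the messages keep the two-line shape shown in this thread (a WARNING line ending in the device tag, followed by the outcome line); the function name tally_write_outcomes is made up for illustration and is not part of any Solaris tool.

```python
import re
from collections import Counter

# Device tag at the end of a WARNING line, e.g. "(sd140):"
DEVICE_RE = re.compile(r"\((sd\d+)\):")
# Outcome reported on the line that follows the WARNING line
OUTCOME_RE = re.compile(r"incomplete write- (retrying|giving up)")


def tally_write_outcomes(lines):
    """Count 'retrying' and 'giving up' messages per sd device.

    Assumption: each outcome line belongs to the most recent WARNING
    line that named a device, as in the excerpt quoted in this thread.
    """
    counts = {}
    current = None
    for line in lines:
        dev = DEVICE_RE.search(line)
        if dev:
            current = dev.group(1)
            continue
        outcome = OUTCOME_RE.search(line)
        if outcome and current is not None:
            counts.setdefault(current, Counter())[outcome.group(1)] += 1
            current = None
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python tally.py < /var/adm/messages
    for device, tally in sorted(tally_write_outcomes(sys.stdin).items()):
        print(device, dict(tally))
```

A device that shows three "retrying" entries followed by one "giving up" matches the pattern described above; a different ratio would suggest the retry count has been changed or that the log is incomplete.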
Try throttling back the maximum number of outstanding I/Os. I saw a number of errors similar to this on Pillar and EMC.

In /etc/system, set:
set sd:sd_max_throttle=20
and reboot.

This message posted from opensolaris.org
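Since the setting only takes effect after a reboot, it can help to confirm what /etc/system actually contains before restarting. Below is a small sketch, assuming the usual "set module:variable=value" syntax and "*" comment lines; the helper name read_tunables is hypothetical, and the text is passed in as a string so the sketch can be tried without touching a real system file.

```python
import re

# "set sd:sd_max_throttle=20"; the "module:" prefix is optional
SET_RE = re.compile(r"^set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)")


def read_tunables(text):
    """Return {(module, variable): value} for each 'set' line.

    Values are kept as strings so no number base is assumed.
    Lines starting with '*' are treated as comments.
    """
    tunables = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue
        match = SET_RE.match(stripped)
        if match:
            module, variable, value = match.groups()
            tunables[(module, variable)] = value
    return tunables


if __name__ == "__main__":
    example = """\
* limit outstanding commands per LUN
set sd:sd_max_throttle=20
"""
    print(read_tunables(example).get(("sd", "sd_max_throttle")))
```

On a real host the string would come from reading /etc/system; if the key is missing, the driver default applies.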
On HDS arrays we set sd_max_throttle to 8.

gino
Richard Elling
2007-Mar-28 20:16 UTC
[zfs-discuss] Re: error-message from a nexsan-storage
Gino Ruopolo wrote:
> On HDS arrays we set sd_max_throttle to 8.

HDS provides an algorithm for estimating [s]sd_max_throttle in their planning docs. It will vary based on a number of different parameters. AFAIK, EMC just sets it to 20.
 -- richard
The thought is to start throttling and possibly tune up or down, depending on errors or lack of errors. I don't know of a specific NexSAN throttle preference (we use SATABoy, and go with 20).
Gino Ruopolo
2007-Mar-28 20:25 UTC
[zfs-discuss] Re: Re: error-message from a nexsan-storage
> Gino Ruopolo wrote:
> > On HDS arrays we set sd_max_throttle to 8.
>
> HDS provides an algorithm for estimating [s]sd_max_throttle in their
> planning docs. It will vary based on a number of different parameters.
> AFAIK, EMC just sets it to 20.
>  -- richard

You're right, but after a lot of tests we found 8 to be the best value.

gino
Richard Elling
2007-Mar-28 21:35 UTC
[zfs-discuss] Re: error-message from a nexsan-storage
JS wrote:
> The thought is to start throttling and possibly tune up or down, depending
> on errors or lack of errors. I don't know of a specific NexSAN throttle
> preference (we use SATABoy, and go with 20).

One guess is as good as another :-)
The default is 256, so even with 20 you are a long way from the ceiling. Note that ZFS will attempt to issue 35 concurrent iops to each vdev, so you might also see a difference based on how many vdevs you have. This could cause some confusion...
 -- richard
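Richard's two figures (35 concurrent I/Os per vdev from ZFS, and the per-LUN sd_max_throttle limit) can be combined into a back-of-the-envelope estimate. The sketch below rests on two assumptions that the thread does not confirm: each LUN is its own leaf vdev, and the sd driver passes at most sd_max_throttle commands per LUN to the array. The function names are invented for illustration.

```python
def outstanding_per_lun(zfs_per_vdev, sd_max_throttle):
    """Commands in flight to one LUN: whichever limit is hit first."""
    return min(zfs_per_vdev, sd_max_throttle)


def outstanding_total(luns, zfs_per_vdev, sd_max_throttle):
    """Rough upper bound on commands in flight across all LUNs."""
    return luns * outstanding_per_lun(zfs_per_vdev, sd_max_throttle)


if __name__ == "__main__":
    LUNS = 10          # ten LUNs in the raidz2 pool from the original post
    ZFS_PER_VDEV = 35  # figure quoted by Richard
    for throttle in (256, 20, 8):
        total = outstanding_total(LUNS, ZFS_PER_VDEV, throttle)
        print(f"sd_max_throttle={throttle}: up to {total} commands")
    # sd_max_throttle=256: up to 350 commands
    # sd_max_throttle=20: up to 200 commands
    # sd_max_throttle=8: up to 80 commands
```

Under these assumptions the default of 256 never limits anything, because ZFS stops at 35 per LUN; a throttle only starts to matter once it drops below that figure.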
For the particular HDS array you're working on, or also on NexSAN storage?
Claus Guttesen
2007-Mar-29 09:04 UTC
[zfs-discuss] Re: error-message from a nexsan-storage
> Try throttling back the max # of IOs. I saw a number of errors similar to this on Pillar and EMC.
>
> In /etc/system, set:
> set sd:sd_max_throttle=20
> and reboot.

I have added the setting and rebooted. I'm running the same tests now and will know in a day or so whether I can avoid the error (from the Nexsan storage).

Thank you for the tip. Coming from a FreeBSD background, getting recommendations like these is very valuable :-)

regards
Claus
Gino Ruopolo
2007-Mar-29 10:15 UTC
[zfs-discuss] Re: Re: error-message from a nexsan-storage
Unfortunately we don't have experience with NexSAN. HDS arrays are quite conservative, and with a value of 8 we run quite stably (with UFS). We also found that value appropriate for old HP EMA arrays (old units but very, very reliable! Digital products were rocks).

gino
Claus Guttesen
2007-Mar-30 07:48 UTC
[zfs-discuss] Re: error-message from a nexsan-storage
> > Try throttling back the max # of IOs. I saw a number of errors similar to this on Pillar and EMC.
> >
> > In /etc/system, set:
> > set sd:sd_max_throttle=20
> > and reboot.
>
> I have added the setting and rebooted. I'm doing the same tests now
> and will know in a day or so if I can avoid the error (from the
> nexsan-storage).
>
> Thank you for the tip. Coming from a FreeBSD background getting
> recommendations like these is very valuable :-)

Works like a charm :-)

regards
Claus