Jeff Royle
2007-Jan-19 14:43 UTC
6.2 Release - Adaptec 2130SLP driver?? issue - aac driver
I could use some advice on an issue I have had with my RAID controller. I am not really running much on the system yet: postfix, pf + pflogd, rlogind, ssh, bsnmp and ntpd. While I was just reading a file with less, the system stopped responding. I thought it was the network interfaces, but I was able to ping the interface. Once I plugged a monitor into the system I saw this (roughly):

AAC0: COMMAND <SOME HEX> TIMEOUT AFTER X number of seconds

Not good :)

A reset of the system resolved the issue and it booted fine. Since the controller stopped responding, nothing was recorded to my logs.

Now I have to figure out how to prevent that from happening again.

Basic rundown on the system and some history...

P4 3.2GHz
Asus P5MT-S MB
2 x 1GB DDR2 667 memory
Adaptec 2130SLP RAID Controller + battery backup module
2 Seagate Ultra320 73GB 15k RPM (mirrored)

I ran this same hardware through testing of 6.2-BETA3, RC-1 and RC-2 without this issue. I was using the driver released by Adaptec while testing the pre-release installs (http://www.adaptec.com/en-US/speed/raid/aac/unix/aacraid_freebsd6_drv_b11518_tgz.htm). You could say I am fairly confident in the hardware itself; I have put this system through a lot of testing since BETA3.

The 6.2-RELEASE kernel has not been customized all that much; I just pulled out all the drivers I would never use. To be safe, I kept just about all SCSI devices/card models in as I continued my testing of 6.2-RELEASE.

Right now I am going to try taking out aac and aacp and then try the driver I used in my previous tests. However, since I have run a week without this issue, it will be hard or impossible to tell whether this did anything to resolve it... I almost want a crash on the old driver :)

So I need some advice... How best do I debug this issue?

Thanks in advance for any direction you guys can offer me.

Cheers,
Jeff
Jeff Royle
2007-Jan-19 16:45 UTC
Jeff Royle wrote:
> So I need some advice... How best do I debug this issue?
It appears the driver I was using in my pre-release testing is newer than the release driver.

Stock driver in 6.2-RELEASE dmesg:

aac0: <Adaptec SCSI RAID 2130S> mem 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
aac0: New comm. interface enabled
aac0: Adaptec Raid Controller 2.0.0-1
aacp0: <SCSI Passthrough Bus> on aac0

Currently using:

aacu0: <Adaptec SCSI RAID 2130S> mem 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
aacu0: New comm. interface enabled
aacu0: Adaptec Raid Controller 2.0.7-1
aacpu0: <SCSI Passthrough Bus> on aacu0

Going to continue testing with the newer driver.

Cheers,
Jeff
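As a quick cross-check of which build is newer, the version strings from the two dmesg excerpts above can be compared field by field. This is a minimal Python sketch, assuming the `Adaptec Raid Controller <major>.<minor>.<patch>-<build>` format shown in those lines; `driver_version` is a hypothetical helper, not part of any FreeBSD or Adaptec tool.

```python
import re
from typing import Tuple

# Matches the probe line format seen in the dmesg excerpts, e.g.
# "aacu0: Adaptec Raid Controller 2.0.7-1" (hypothetical parser, not a FreeBSD tool).
VERSION_RE = re.compile(r"Adaptec Raid Controller (\d+)\.(\d+)\.(\d+)-(\d+)")


def driver_version(dmesg_line: str) -> Tuple[int, ...]:
    """Parse (major, minor, patch, build) from an aac/aacu dmesg line."""
    match = VERSION_RE.search(dmesg_line)
    if match is None:
        raise ValueError(f"no driver version found in {dmesg_line!r}")
    # Integer tuples compare element by element, so 2.0.7-1 > 2.0.0-1.
    return tuple(int(field) for field in match.groups())


stock = driver_version("aac0: Adaptec Raid Controller 2.0.0-1")
vendor = driver_version("aacu0: Adaptec Raid Controller 2.0.7-1")
print(stock, vendor, vendor > stock)  # (2, 0, 0, 1) (2, 0, 7, 1) True
```

Comparing integers rather than raw strings avoids the usual pitfall where "2.0.10" would sort before "2.0.7".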
Jeff Royle wrote:
> Going to continue testing with the newer driver.

I have some preliminary work on merging the Adaptec driver:

http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518

But one of the reviewers has advised me to request broader testing, especially against old cards and CLI tools, so I have held the commit for now.

Cheers,
-- 
Xin LI <delphij@delphij.net> http://www.delphij.net/
FreeBSD - The Power to Serve!
Jeff Royle
2007-Jan-19 17:01 UTC
LI Xin wrote:
> I have some preliminary work on merging the Adaptec driver:
>
> http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518
>
> But one of the reviewers has advised me to request broader testing,
> especially against old cards and CLI tools, so I have held the commit
> for now.

I will patch my system and put it through some tests this weekend for you. As far as CLI tools are concerned, is there any in particular I should be testing the patch with? The only CLI tool I know of is aacli1.0 from the ports tree, which definitely does not work with the 2130S :)

Cheers,
Jeff
Jeff Royle
2007-Jan-20 17:08 UTC
LI Xin wrote:
> I have some preliminary work on merging the Adaptec driver:
>
> http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518
>
> But one of the reviewers has advised me to request broader testing,
> especially against old cards and CLI tools, so I have held the commit
> for now.

Well, the driver patched fine; no issues to report there. Performance is where I expected it to be in bonnie and simple dd tests, based on my previous testing. So far the TIMEOUT error I reported earlier has not shown itself again; time will tell on this one.

However, I have encountered an intermittent bug on boot. Sometimes, roughly once every 5-10 boots, the system hangs while probing the SCSI bus for the drives. I saw this happen once before with the aacdu 2.0.7-1 binary driver I was using in my 6.2-RC1 / 6.2-RC2 testing, but it is happening a fair bit more now. Here is where it hangs...
Hung dmesg output:

--- snip ---
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcd7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
--- end snip ---

The system does not continue on to probe the drives, as it does in a normal boot:

--- snip ---
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)
pass0 at aacp0 bus 0 target 0 lun 0
pass0: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 target 3 lun 0
pass1: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
pass1: 3.300MB/s transfers
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/aacd0s1a
--- end snip ---

In an effort to resolve this, I increased the SCSI delay in the kernel config from 5 seconds to 10 seconds (SCSI_DELAY is given in milliseconds):

options SCSI_DELAY=10000

It *may* have helped on one of my reboot tests: I thought it was going to hang again, but it proceeded. However, it definitely did not solve the issue. Once I am back in the office I will see if I can get some debug output for you.

Cheers,
Jeff
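When collecting debug output for the intermittent hang, it may help to classify each captured boot log by how far it got. This is a minimal Python sketch, assuming the console output has been saved as text (for example over a serial console); the milestone markers are copied from the hung and normal dmesg excerpts above, and `first_missing_milestone` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# Ordered boot milestones; markers copied from the hung and normal dmesg excerpts.
MILESTONES: List[Tuple[str, str]] = [
    ("array attached", "aacd0: <RAID 1 (Mirror)>"),
    ("drives probed", "pass0 at aacp0"),
    ("root mount", "Trying to mount root from"),
]


def first_missing_milestone(boot_log: str) -> Optional[str]:
    """Return the first milestone not found in boot_log, or None if all were reached."""
    for name, marker in MILESTONES:
        if marker not in boot_log:
            return name
    return None


hung = """acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
aacd0: <RAID 1 (Mirror)> on aac0
aacd0: 69889MB (143132672 sectors)"""

normal = hung + """
pass0 at aacp0 bus 0 target 0 lun 0
pass1 at aacp0 bus 0 target 3 lun 0
Trying to mount root from ufs:/dev/aacd0s1a"""

print(first_missing_milestone(hung))    # drives probed
print(first_missing_milestone(normal))  # None
```

With the vendor binary driver the passthrough bus attaches as aacpu0 rather than aacp0, so the "drives probed" marker may need adjusting for those logs.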