Andy Farkas
2010-Apr-27 09:12 UTC
MFC of "Large set of CAM improvements" breaks I/O to Adaptec 29160 SCSI controller
Hi,

firstly:

RELENG_8 csup'd with date=2010.02.14.00.00 works perfectly for days.

RELENG_8 csup'd with date=2010.02.15.00.00 dead-locks the disk I/O
subsystem. Network still operational but anything needing disk hangs.
Power-cycle required.

kernel config is GENERIC with KDB, DDB and BREAK_TO_DEBUGGER options added.

hardware:
ahc0: <Adaptec 29160 Ultra160 SCSI adapter> port 0x4000-0x40ff mem 0xefa00000-0xefa00fff irq 16 at device 0.0 on pci10
ahc0: [ITHREAD]
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

da0: <SEAGATE ST3146707LW 0005> Fixed Direct Access SCSI-3 device
da1: <SEAGATE ST3146707LW 0005> Fixed Direct Access SCSI-3 device

The dead-lock can happen at any time, but I can provoke it by running
a bonnie++ disk test. It happens doing rm -rf /usr/obj/usr and it has
happened doing a make installworld. It can survive a make buildworld
(the system runs normally until it decides to dead-lock).

The box (HP ProLiant ML 110) has 2 SCSI disks and 4 SATA disks. The
2010.02.15 kernel will run perfectly for days on the SATA disks. *Only*
when the SCSI disks are accessed will the system dead-lock. Note that
the SATA disks do not work either if the system has dead-locked.

I can provide more details and a vmcore.0 if anyone is interested.

-andyf
Pete French
2010-Apr-27 11:16 UTC
MFC of "Large set of CAM improvements" breaks I/O to Adaptec 29160 SCSI controller
> RELENG_8 csup'd with date=2010.02.14.00.00 works perfectly for days.
>
> RELENG_8 csup'd with date=2010.02.15.00.00 dead-locks the disk I/O
> subsystem. Network still operational but anything needing disk hangs.
> Power-cycle required.

An additional point (and thanks to Andy for doing all the work to
identify the changes which cause the problems): this appears to be the
cause of the problem I have had with gmirror since upgrading. I only see
this on SCSI systems too, and both my systems are using Adaptec
controllers, as is Andy's system.

-pete.
Alexander Motin
2010-Apr-28 15:02 UTC
MFC of "Large set of CAM improvements" breaks I/O to Adaptec 29160 SCSI controller
Andy Farkas wrote:
> RELENG_8 csup'd with date=2010.02.14.00.00 works perfectly for days.
>
> RELENG_8 csup'd with date=2010.02.15.00.00 dead-locks the disk I/O
> subsystem. Network still operational but anything needing disk hangs.
> Power-cycle required.
>
> kernel config is GENERIC with KDB, DDB and BREAK_TO_DEBUGGER options added.
>
> hardware:
> ahc0: <Adaptec 29160 Ultra160 SCSI adapter> port 0x4000-0x40ff mem
> 0xefa00000-0xefa00fff irq 16 at device 0.0 on pci10
> ahc0: [ITHREAD]
> aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
>
> da0: <SEAGATE ST3146707LW 0005> Fixed Direct Access SCSI-3 device
> da1: <SEAGATE ST3146707LW 0005> Fixed Direct Access SCSI-3 device
>
>
> The dead-lock can happen at any time, but I can provoke it by running
> a bonnie++ disk test. It happens doing rm -rf /usr/obj/usr and it has
> happened doing a make installworld. It can survive a make buildworld
> (the system runs normally until it decides to dead-lock).
>
> The box (HP ProLiant ML 110) has 2 scsi disks and 4 sata disks. The
> 2010.02.15 kernel will run perfectly for days on the SATA disks. *Only*
> when the scsi disks are accessed will the system dead-lock. Note that
> the SATA disks do not work either if the system has dead-locked.
>
> I can provide more details and a vmcore.0 if anyone is interested.

I have some 29160N locally and I'll try to reproduce this.

-- 
Alexander Motin
Andy Farkas
2010-May-02 02:14 UTC
MFC of "Large set of CAM improvements" breaks I/O to Adaptec 29160 SCSI controller
On Fri, Apr 30, 2010 at 4:42 AM, Pete French <petefrench@ticketswitch.com> wrote:
>
> I've copied in the original poster of the problem to see how he is
> doing, but as far as I am concerned the problem has gone away. Certainly
> the things I was doing before to trigger it no longer do so. Of course
> in the normal state of things it was rarely locking up (every few days)
> so I can't be sure just on a few hours testing. But my initial impression
> is that this fixes it. Good work!
>

Confirming patch fixes problem.

Thanks Alexander, good pick-up on finding the missing bit of code!

bootverbose <-- /me smacks forehead with palm of hand

-andyf