Garrett Wollman
2014-Aug-06 19:46 UTC
9.3-RELEASE still instapanics on multi-mps(4) servers
Remember about six months ago when I tested 9-stable on one of my big NFS servers, and had it panic in the middle of the USB probe, but ultimately bisected the problem to an update to the mps(4) driver? I had to stop investigating and get the server working (which I did, by installing 9.2 instead of something newer and presumably faster). I'm at the point now where I'd like to upgrade my file servers to 9.3, and I can't, because of this issue, so it's time to start tracking down the bug again.

I have two test servers now. 9.3 works just fine on one of them, and panics on the other. The one it works on is slightly older, and has an mpt(4) controller for the boot drives, as opposed to the system where the panic happens, which is mps(4)-only. Both systems have two SAS2116 controllers for external drives; the ones on the working system have drives attached, and the ones on the non-working system are not connected to anything (and in fact disabled in the BIOS for now).

Any experts want to suggest where to start (besides, obviously, attaching a serial console, which I haven't done yet)? I saw one change in the svn logs for 9.3 prior to the release which looked like it might be a relevant fix, but it clearly hasn't improved anything for my servers.

-GAWollman
Steven Hartland
2014-Aug-06 20:26 UTC
9.3-RELEASE still instapanics on multi-mps(4) servers
The stack from the panic would be a good start.
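One way to get that stack onto the console is to boot a kernel that prints a backtrace when it panics. The fragment below is only a sketch of a custom kernel configuration, assuming it is built on top of GENERIC; the ident name MPSTRACE is made up for illustration, and some of these options may already be present in GENERIC on a given branch, in which case the extra lines are redundant.

```
# Hypothetical custom kernel config, layered on GENERIC.
include GENERIC
ident   MPSTRACE        # illustrative name only

options KDB             # kernel debugger framework
options KDB_TRACE       # print a stack trace when the kernel panics
options DDB             # interactive in-kernel debugger backend
```

With DDB compiled in, a panic may stop at the debugger prompt rather than rebooting, which gives time to copy or photograph the trace if no serial console is attached yet.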
I have a dual-mps system (one Dell H200 and one LSI 9207), now running 9.1 with a patched sys/kern/kern_intr.c, that I intend to upgrade to 9.3 shortly. Is it enough to boot a 9.3 memstick to provoke the issue?

Bengt
John-Mark Gurney
2014-Aug-11 20:59 UTC
9.3-RELEASE still instapanics on multi-mps(4) servers
One thing you could do is turn on DEBUG_MEMGUARD and DEBUG_REDZONE and see if they trigger...

-- 
John-Mark Gurney
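For reference, both of these are kernel build options, so trying them means building a custom kernel. A minimal sketch, assuming the test kernel is based on GENERIC; the ident name MPSDEBUG is hypothetical:

```
# Hypothetical debugging kernel config, layered on GENERIC.
include GENERIC
ident   MPSDEBUG        # illustrative name only

options DEBUG_MEMGUARD  # memguard(9): catch use-after-free and similar tampering
options DEBUG_REDZONE   # redzone(9): catch writes past the ends of allocations
```

Per memguard(9), the allocations to guard are selected at boot time, for example with the vm.memguard.desc tunable; which malloc type to name there for this panic is left open here. Both options add overhead, so this kind of kernel is better suited to the test servers than to production.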