Hi,
I have been trying out a nice new tws controller and decided to enable
debugging in the kernel and run some stress tests. With a regular
GENERIC kernel, it boots up fine. But with debugging, it panics on
boot. Anyone know what's up? Is this something that should be sent
directly to LSI?
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 1.1 on pci0
pci2: <ACPI PCI bus> on pcib2
LSI 3ware device driver for SAS/SATA storage controllers, version:
10.80.00.003
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0x4000-0x40ff mem
0xc2460000-0xc2463fff,0xc2400000-0xc243ffff irq 17 at device 0.0 on pci2
tws0: Using legacy INTx
panic: _mtx_lock_sleep: recursed on non-recursive mutex tws_io_lock @
/usr/HEAD/src/sys/dev/tws/tws_hdm.c:287
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1d8
_mtx_lock_sleep() at _mtx_lock_sleep+0x27f
_mtx_lock_flags() at _mtx_lock_flags+0xf1
tws_submit_command() at tws_submit_command+0x3f
tws_dmamap_data_load_cbfn() at tws_dmamap_data_load_cbfn+0xb7
bus_dmamap_load() at bus_dmamap_load+0x16c
tws_map_request() at tws_map_request+0x78
tws_get_param() at tws_get_param+0xe1
tws_display_ctlr_info() at tws_display_ctlr_info+0x4c
tws_init_ctlr() at tws_init_ctlr+0x6d
tws_attach() at tws_attach+0x68c
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_pci_attach() at acpi_pci_attach+0x164
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_pcib_attach() at acpi_pcib_attach+0x1a7
acpi_pcib_pci_attach() at acpi_pcib_pci_attach+0x9b
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_pci_attach() at acpi_pci_attach+0x164
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_pcib_attach() at acpi_pcib_attach+0x1a7
acpi_pcib_acpi_attach() at acpi_pcib_acpi_attach+0x1f6
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_attach() at acpi_attach+0xbc1
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
nexus_acpi_attach() at nexus_acpi_attach+0x69
device_attach() at device_attach+0x72
bus_generic_new_pass() at bus_generic_new_pass+0xd6
bus_set_pass() at bus_set_pass+0x7a
configure() at configure+0xa
mi_startup() at mi_startup+0x77
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at kdb_enter+0x3b: movq $0,0x993262(%rip)
db>
int
tws_submit_command(struct tws_softc *sc, struct tws_request *req)
{
    u_int32_t regl, regh;
    u_int64_t mfa=0;

    /*
     * mfa register read and write must be in order.
     * Get the io_lock to protect against simultinous
     * passthru calls
     */
    mtx_lock(&sc->io_lock);

    if ( sc->obfl_q_overrun ) {
        tws_init_obfl_q(sc);
    }
With no debugging in the kernel, it boots up fine
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 1.1 on pci0
pci2: <ACPI PCI bus> on pcib2
LSI 3ware device driver for SAS/SATA storage controllers, version:
10.80.00.003
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0x4000-0x40ff mem
0xc2460000-0xc2463fff,0xc2400000-0xc243ffff irq 17 at device 0.0 on pci2
tws0: Using legacy INTx
tws0: Controller details: Model 9750-4i, 8 Phys, Firmware FH9X
5.12.00.007, BIOS BE9X 5.11.00.006
em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x5040-0x505f mem
0xc2500000-0xc251ffff,0xc2570000-0xc2570fff irq 19 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:1e:67:45:b6:29
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xc2560000-0xc25603ff irq
22 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
tws0@pci0:2:0:0: class=0x010400 card=0x000113c1 chip=0x101013c1 rev=0x05 hdr=0x00
    vendor     = '3ware Inc'
    device     = '9750 SAS2/SATA-II RAID PCIe'
    class      = mass storage
    subclass   = RAID
    bar   [10] = type I/O Port, range 32, base 0x4000, size 256, enabled
    bar   [14] = type Memory, range 64, base 0xc2460000, size 16384, enabled
    bar   [1c] = type Memory, range 64, base 0xc2400000, size 262144, enabled
    cap 01[50] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 10[68] = PCI-Express 2 endpoint max data 128(4096) link x4(x8)
    cap 03[d0] = VPD
    cap 05[a8] = MSI supports 1 message, 64 bit
    ecap 0001[100] = AER 1 1 fatal 0 non-fatal 0 corrected
    ecap 0004[138] = unknown 1
  PCI-e errors = Fatal Error Detected
                 Unsupported Request Detected
         Fatal = Unsupported Request
Also, any reason NOT to set hw.tws.enable_msi=1 in /boot/loader.conf?
---Mike
--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
On Fri, Sep 21, 2012 at 1:07 PM, Mike Tancsa <mike@sentex.net> wrote:
> Hi,
> I have been trying out a nice new tws controller and decided to enable
> debugging in the kernel and run some stress tests. With a regular
> GENERIC kernel, it boots up fine. But with debugging, it panics on
> boot. Anyone know whats up ? Is this something that should be sent
> directly to LSI ?

Through a code inspection, this mutex is being recursed whether or not
debugging is enabled. There is no code path here specific to
INVARIANTS. And the main IO path in this driver is always recursing
on this lock - it is not specific to the initialization callstack you
listed below.

The best course of action seems to be initializing the lock with
MTX_RECURSE, since the driver seems to expect to be able to recurse on
the io_lock. Can you try the following patch?

diff --git a/sys/dev/tws/tws.c b/sys/dev/tws/tws.c
index b1615db..d156d40 100644
--- a/sys/dev/tws/tws.c
+++ b/sys/dev/tws/tws.c
@@ -197,7 +197,7 @@ tws_attach(device_t dev)
     mtx_init( &sc->q_lock, "tws_q_lock", NULL, MTX_DEF);
     mtx_init( &sc->sim_lock, "tws_sim_lock", NULL, MTX_DEF);
     mtx_init( &sc->gen_lock, "tws_gen_lock", NULL, MTX_DEF);
-    mtx_init( &sc->io_lock, "tws_io_lock", NULL, MTX_DEF);
+    mtx_init( &sc->io_lock, "tws_io_lock", NULL, MTX_DEF | MTX_RECURSE);
 
     if ( tws_init_trace_q(sc) == FAILURE )
         printf("trace init failure\n");

> [...]
> tws0: Using legacy INTx
> panic: _mtx_lock_sleep: recursed on non-recursive mutex tws_io_lock @
> /usr/HEAD/src/sys/dev/tws/tws_hdm.c:287
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x1d8
> _mtx_lock_sleep() at _mtx_lock_sleep+0x27f
> _mtx_lock_flags() at _mtx_lock_flags+0xf1
> tws_submit_command() at tws_submit_command+0x3f
> tws_dmamap_data_load_cbfn() at tws_dmamap_data_load_cbfn+0xb7
> bus_dmamap_load() at bus_dmamap_load+0x16c
> tws_map_request() at tws_map_request+0x78
> tws_get_param() at tws_get_param+0xe1
> tws_display_ctlr_info() at tws_display_ctlr_info+0x4c
> tws_init_ctlr() at tws_init_ctlr+0x6d
> tws_attach() at tws_attach+0x68c
> [...]
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
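For readers who want to see the failure mode in isolation, here is a minimal userspace sketch. It is an analogy only, not the driver's code: POSIX mutexes stand in for the kernel's mtx(9), PTHREAD_MUTEX_ERRORCHECK plays the role of a non-recursive lock that reports recursion, and PTHREAD_MUTEX_RECURSIVE plays the role of a lock initialized with MTX_RECURSE. The helper name relock_result is hypothetical. The backtrace suggests that io_lock is already held somewhere above tws_submit_command() when it is locked again, and that second acquisition is what the sketch imitates.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* request the POSIX mutex type constants */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Userspace analogy only (assumption: pthread mutexes stand in for mtx(9)).
 * Lock a mutex of the given type twice from the same thread and return
 * the result of the second pthread_mutex_lock() call.
 */
int
relock_result(int mutex_type)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int rc, second;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, mutex_type);
	assert(rc == 0);
	rc = pthread_mutex_init(&m, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);

	rc = pthread_mutex_lock(&m);		/* first acquisition */
	assert(rc == 0);
	second = pthread_mutex_lock(&m);	/* same thread, already the owner */

	if (second == 0)
		pthread_mutex_unlock(&m);	/* undo the recursive acquisition */
	pthread_mutex_unlock(&m);		/* undo the first acquisition */
	pthread_mutex_destroy(&m);
	return (second);
}
```

Calling relock_result(PTHREAD_MUTEX_ERRORCHECK) returns EDEADLK, while relock_result(PTHREAD_MUTEX_RECURSIVE) returns 0. The kernel differs in how it reports the problem (a panic rather than an error code), but the distinction between the two lock types is the same idea as MTX_DEF versus MTX_DEF | MTX_RECURSE in the patch.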
On 9/21/2012 4:59 PM, Jim Harris wrote:
>> boot. Anyone know whats up ? Is this something that should be sent
>> directly to LSI ?
>
> Through a code inspection, this mutex is being recursed whether or not
> debugging is enabled. There is no code path here specific to
> INVARIANTS. And the main IO path in this driver is always recursing
> on this lock - it is not specific to the initialization callstack you
> listed below.
>
> The best course of action seems to be initializing the lock with
> MTX_RECURSE, since the driver seems to expect to be able to recurse on
> the io_lock. Can you try the following patch?
>
> diff --git a/sys/dev/tws/tws.c b/sys/dev/tws/tws.c
> index b1615db..d156d40 100644
> --- a/sys/dev/tws/tws.c
> +++ b/sys/dev/tws/tws.c
> @@ -197,7 +197,7 @@ tws_attach(device_t dev)
>      mtx_init( &sc->q_lock, "tws_q_lock", NULL, MTX_DEF);
>      mtx_init( &sc->sim_lock, "tws_sim_lock", NULL, MTX_DEF);
>      mtx_init( &sc->gen_lock, "tws_gen_lock", NULL, MTX_DEF);
> -    mtx_init( &sc->io_lock, "tws_io_lock", NULL, MTX_DEF);
> +    mtx_init( &sc->io_lock, "tws_io_lock", NULL, MTX_DEF | MTX_RECURSE);
>
>      if ( tws_init_trace_q(sc) == FAILURE )
>          printf("trace init failure\n");

Thanks, that allows it to boot up now!

pci2: <ACPI PCI bus> on pcib2
LSI 3ware device driver for SAS/SATA storage controllers, version:
10.80.00.003
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0x4000-0x40ff mem
0xc2460000-0xc2463fff,0xc2400000-0xc243ffff irq 17 at device 0.0 on pci2
tws0: Using MSI
tws0: Controller details: Model 9750-4i, 8 Phys, Firmware FH9X
5.12.00.007, BIOS BE9X 5.11.00.006
em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x5040-0x505f mem
0xc2500000-0xc251ffff,0xc2570000-0xc2570fff irq 19 at device 25.0 on pci0
.
then a lot of
.
(probe65:tws0:0:65:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe65:tws0:0:65:0): CAM status: Invalid Target ID
(probe65:tws0:0:65:0): Error 22, Unretryable error
(probe1:tws0:0:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe1:tws0:0:1:0): CAM status: Invalid Target ID
(probe1:tws0:0:1:0): Error 22, Unretryable error
(probe2:tws0:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe2:tws0:0:2:0): CAM status: Invalid Target ID
.
.
.
(probe63:tws0:0:63:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe63:tws0:0:63:0): CAM status: Invalid Target ID
(probe63:tws0:0:63:0): Error 22, Unretryable error
(probe64:tws0:0:64:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:tws0:0:64:0): CAM status: Invalid Target ID
(probe64:tws0:0:64:0): Error 22, Unretryable error
da0 at tws0 bus 0 scbus0 target 0 lun 0
da0: <LSI 9750-4i DISK 5.12> Fixed Direct Access SCSI-5 device
da0: 6000.000MB/s transfers
da0: 953654MB (1953083392 512 byte sectors: 255H 63S/T 121573C)
SMP: AP CPU #1 Launched!
SMP: AP CPU #4 Launched!

>> Also, any reason NOT to set hw.tws.enable_msi=1 in /boot/loader.conf ?

Any thoughts on MSI vs. no MSI? Time to run some stress tests. It's
certainly a fast little controller for the money!

---Mike
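For reference, the loader tunable asked about in this thread would be set roughly as follows. This is a minimal sketch that assumes only the tunable name quoted above; the quoted value follows the usual loader.conf convention.

```
# /boot/loader.conf
hw.tws.enable_msi="1"
```

After a reboot, the tws0 attach messages show which mode the driver picked: "tws0: Using legacy INTx" in the earlier boot logs, "tws0: Using MSI" in the log above.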
On Fri, Sep 21, 2012 at 5:37 PM, Mike Tancsa <mike@sentex.net> wrote:
> On 9/21/2012 8:03 PM, Jim Harris wrote:
>>> .
>>> then a lot of
>>> .
>>> (probe65:tws0:0:65:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe65:tws0:0:65:0): CAM status: Invalid Target ID
>>> (probe65:tws0:0:65:0): Error 22, Unretryable error
>>> (probe1:tws0:0:1:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe1:tws0:0:1:0): CAM status: Invalid Target ID
>>> (probe1:tws0:0:1:0): Error 22, Unretryable error
>>> (probe2:tws0:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe2:tws0:0:2:0): CAM status: Invalid Target ID
>>> .
>>> .
>>> .
>>> (probe63:tws0:0:63:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe63:tws0:0:63:0): CAM status: Invalid Target ID
>>> (probe63:tws0:0:63:0): Error 22, Unretryable error
>>> (probe64:tws0:0:64:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe64:tws0:0:64:0): CAM status: Invalid Target ID
>>> (probe64:tws0:0:64:0): Error 22, Unretryable error
>>
>> These can be ignored. CAM is just telling you that there are no
>> devices attached at these target IDs.
>
> What about a change similar to what Alexander Motin did in
> http://lists.freebsd.org/pipermail/svn-src-head/2012-June/038196.html

Jim Harris <jimharris@freebsd.org> responded:
> Ah, yes. I was thinking you had CAM_DEBUG enabled which is why you
> were seeing this spew - but that's not the case. This indeed should
> be fixed and not just ignored.
>
> Seeing the attributions on Alexander's commit, you certainly seem to
> have a monopoly on controllers that exhibit this problem on FreeBSD.
> :)
>
> I believe the CAM_LUN_INVALID here should be fixed as well, similar to
> the twa commit. If you send me a revised patch I will commit it.

The specific subject of this thread is not my issue, but I did notice
problems apparently related to CAM on a SATA hard drive.

I use one UFS partition, with FreeBSD 9.0-BETA1 installed (subsequently
updated on another partition, using GPT as opposed to MBR), for ports
tree and also NetBSD pkgsrc and NetBSD source code.

I built NetBSD 5.1_STABLE i386 from FreeBSD and also built xorg-modular
on the new NetBSD installation from pkgsrc.

Going into and out of the newly installed Xorg resulted in some crashes
with the FreeBSD 9.0-BETA1 partition mounted and not cleanly unmounted.

File system was damaged, and FreeBSD fsck_ffs wouldn't fix it, went into
a loop:

Script started on Wed Sep 19 04:15:02 2012
fsck_ffs /dev/ada0p9
** /dev/ada0p9
** Last Mounted on /BETA1
** Phase 1 - Check Blocks and Sizes

CANNOT READ BLK: 7584192
CONTINUE? [yn] y

THE FOLLOWING DISK SECTORS COULD NOT BE READ: 7584318, 7584319,
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1475900 files, 4638292 used, 21162419 free (61643 frags, 2637597 blocks,
0.2% fragmentation)

***** FILE SYSTEM STILL DIRTY *****

***** PLEASE RERUN FSCK *****

Script done on Wed Sep 19 04:17:27 2012

This happened repeatedly, meaning an impasse.

I didn't get to record preceding error messages relating to ATA and CAM
but, seeing this last message, wonder if there are some bugs in the CAM.

I booted that new NetBSD 5.1_STABLE i386 installation, on a USB stick,
was able to mount that partition and see it wasn't trashed though there
was a message about the dirty flag.

I then umounted and ran NetBSD fsck_ffs successfully, just a few files
were lost, and FreeBSD can access that partition again.

I still intend to be more cautious when in NetBSD, not mounting a
FreeBSD partition unnecessarily when doing something crash-prone on my
system in NetBSD, such as going into and out of X.

Tom