Hi,

I have been trying out a nice new tws controller and decided to enable debugging in the kernel and run some stress tests. With a regular GENERIC kernel, it boots up fine. But with debugging, it panics on boot. Anyone know what's up? Is this something that should be sent directly to LSI?

pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 1.1 on pci0
pci2: <ACPI PCI bus> on pcib2
LSI 3ware device driver for SAS/SATA storage controllers, version: 10.80.00.003
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0x4000-0x40ff mem 0xc2460000-0xc2463fff,0xc2400000-0xc243ffff irq 17 at device 0.0 on pci2
tws0: Using legacy INTx
panic: _mtx_lock_sleep: recursed on non-recursive mutex tws_io_lock @ /usr/HEAD/src/sys/dev/tws/tws_hdm.c:287

cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1d8
_mtx_lock_sleep() at _mtx_lock_sleep+0x27f
_mtx_lock_flags() at _mtx_lock_flags+0xf1
tws_submit_command() at tws_submit_command+0x3f
tws_dmamap_data_load_cbfn() at tws_dmamap_data_load_cbfn+0xb7
bus_dmamap_load() at bus_dmamap_load+0x16c
tws_map_request() at tws_map_request+0x78
tws_get_param() at tws_get_param+0xe1
tws_display_ctlr_info() at tws_display_ctlr_info+0x4c
tws_init_ctlr() at tws_init_ctlr+0x6d
tws_attach() at tws_attach+0x68c
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_pci_attach() at acpi_pci_attach+0x164
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_pcib_attach() at acpi_pcib_attach+0x1a7
acpi_pcib_pci_attach() at acpi_pcib_pci_attach+0x9b
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_pci_attach() at acpi_pci_attach+0x164
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_pcib_attach() at acpi_pcib_attach+0x1a7
acpi_pcib_acpi_attach() at acpi_pcib_acpi_attach+0x1f6
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
acpi_attach() at acpi_attach+0xbc1
device_attach() at device_attach+0x72
bus_generic_attach() at bus_generic_attach+0x1a
nexus_acpi_attach() at nexus_acpi_attach+0x69
device_attach() at device_attach+0x72
bus_generic_new_pass() at bus_generic_new_pass+0xd6
bus_set_pass() at bus_set_pass+0x7a
configure() at configure+0xa
mi_startup() at mi_startup+0x77
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at kdb_enter+0x3b: movq $0,0x993262(%rip)
db>


int
tws_submit_command(struct tws_softc *sc, struct tws_request *req)
{
    u_int32_t regl, regh;
    u_int64_t mfa=0;

    /*
     * mfa register read and write must be in order.
     * Get the io_lock to protect against simultaneous
     * passthru calls
     */
    mtx_lock(&sc->io_lock);

    if ( sc->obfl_q_overrun ) {
        tws_init_obfl_q(sc);
    }


With no debugging in the kernel, it boots up fine:

pcib2: <ACPI PCI-PCI bridge> irq 17 at device 1.1 on pci0
pci2: <ACPI PCI bus> on pcib2
LSI 3ware device driver for SAS/SATA storage controllers, version: 10.80.00.003
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0x4000-0x40ff mem 0xc2460000-0xc2463fff,0xc2400000-0xc243ffff irq 17 at device 0.0 on pci2
tws0: Using legacy INTx
tws0: Controller details: Model 9750-4i, 8 Phys, Firmware FH9X 5.12.00.007, BIOS BE9X 5.11.00.006
em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x5040-0x505f mem 0xc2500000-0xc251ffff,0xc2570000-0xc2570fff irq 19 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:1e:67:45:b6:29
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xc2560000-0xc25603ff irq 22 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0


tws0@pci0:2:0:0: class=0x010400 card=0x000113c1 chip=0x101013c1 rev=0x05 hdr=0x00
    vendor     = '3ware Inc'
    device     = '9750 SAS2/SATA-II RAID PCIe'
    class      = mass storage
    subclass   = RAID
    bar   [10] = type I/O Port, range 32, base 0x4000, size 256, enabled
    bar   [14] = type Memory, range 64, base 0xc2460000, size 16384, enabled
    bar   [1c] = type Memory, range 64, base 0xc2400000, size 262144, enabled
    cap 01[50] = powerspec 3 supports D0 D1 D2 D3 current D0
    cap 10[68] = PCI-Express 2 endpoint max data 128(4096) link x4(x8)
    cap 03[d0] = VPD
    cap 05[a8] = MSI supports 1 message, 64 bit
    ecap 0001[100] = AER 1 1 fatal 0 non-fatal 0 corrected
    ecap 0004[138] = unknown 1
    PCI-e errors = Fatal Error Detected
                   Unsupported Request Detected
      Fatal = Unsupported Request


Also, any reason NOT to set hw.tws.enable_msi=1 in /boot/loader.conf?

    ---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
On Fri, Sep 21, 2012 at 1:07 PM, Mike Tancsa <mike@sentex.net> wrote:
> Hi,
> I have been trying out a nice new tws controller and decided to enable
> debugging in the kernel and run some stress tests. With a regular
> GENERIC kernel, it boots up fine. But with debugging, it panics on
> boot. Anyone know what's up? Is this something that should be sent
> directly to LSI?

Through a code inspection, this mutex is being recursed whether or not debugging is enabled. There is no code path here specific to INVARIANTS. And the main IO path in this driver is always recursing on this lock - it is not specific to the initialization callstack you listed.

The best course of action seems to be initializing the lock with MTX_RECURSE, since the driver seems to expect to be able to recurse on the io_lock. Can you try the following patch?

diff --git a/sys/dev/tws/tws.c b/sys/dev/tws/tws.c
index b1615db..d156d40 100644
--- a/sys/dev/tws/tws.c
+++ b/sys/dev/tws/tws.c
@@ -197,7 +197,7 @@ tws_attach(device_t dev)
     mtx_init( &sc->q_lock, "tws_q_lock", NULL, MTX_DEF);
     mtx_init( &sc->sim_lock,  "tws_sim_lock", NULL, MTX_DEF);
     mtx_init( &sc->gen_lock,  "tws_gen_lock", NULL, MTX_DEF);
-    mtx_init( &sc->io_lock,  "tws_io_lock", NULL, MTX_DEF);
+    mtx_init( &sc->io_lock,  "tws_io_lock", NULL, MTX_DEF | MTX_RECURSE);

     if ( tws_init_trace_q(sc) == FAILURE )
         printf("trace init failure\n");
On 9/21/2012 4:59 PM, Jim Harris wrote:
> The best course of action seems to be initializing the lock with
> MTX_RECURSE, since the driver seems to expect to be able to recurse on
> the io_lock. Can you try the following patch?

Thanks, that allows it to boot up now!

pci2: <ACPI PCI bus> on pcib2
LSI 3ware device driver for SAS/SATA storage controllers, version: 10.80.00.003
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0x4000-0x40ff mem 0xc2460000-0xc2463fff,0xc2400000-0xc243ffff irq 17 at device 0.0 on pci2
tws0: Using MSI
tws0: Controller details: Model 9750-4i, 8 Phys, Firmware FH9X 5.12.00.007, BIOS BE9X 5.11.00.006
em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0x5040-0x505f mem 0xc2500000-0xc251ffff,0xc2570000-0xc2570fff irq 19 at device 25.0 on pci0
.
then a lot of
.
(probe65:tws0:0:65:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe65:tws0:0:65:0): CAM status: Invalid Target ID
(probe65:tws0:0:65:0): Error 22, Unretryable error
(probe1:tws0:0:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe1:tws0:0:1:0): CAM status: Invalid Target ID
(probe1:tws0:0:1:0): Error 22, Unretryable error
(probe2:tws0:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe2:tws0:0:2:0): CAM status: Invalid Target ID
.
.
.
(probe63:tws0:0:63:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe63:tws0:0:63:0): CAM status: Invalid Target ID
(probe63:tws0:0:63:0): Error 22, Unretryable error
(probe64:tws0:0:64:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:tws0:0:64:0): CAM status: Invalid Target ID
(probe64:tws0:0:64:0): Error 22, Unretryable error
da0 at tws0 bus 0 scbus0 target 0 lun 0
da0: <LSI 9750-4i DISK 5.12> Fixed Direct Access SCSI-5 device
da0: 6000.000MB/s transfers
da0: 953654MB (1953083392 512 byte sectors: 255H 63S/T 121573C)
SMP: AP CPU #1 Launched!
SMP: AP CPU #4 Launched!

>>
>> Also, any reason NOT to set hw.tws.enable_msi=1 in /boot/loader.conf?

Any thoughts on MSI vs. no MSI? Time to run some stress tests. It's certainly a fast little controller for the money!

    ---Mike
On Fri, Sep 21, 2012 at 5:37 PM, Mike Tancsa <mike@sentex.net> wrote:
> On 9/21/2012 8:03 PM, Jim Harris wrote:
>>> .
>>> then a lot of
>>> .
>>> (probe65:tws0:0:65:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe65:tws0:0:65:0): CAM status: Invalid Target ID
>>> (probe65:tws0:0:65:0): Error 22, Unretryable error
>>> (probe1:tws0:0:1:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe1:tws0:0:1:0): CAM status: Invalid Target ID
>>> (probe1:tws0:0:1:0): Error 22, Unretryable error
>>> (probe2:tws0:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe2:tws0:0:2:0): CAM status: Invalid Target ID
>>> .
>>> .
>>> .
>>> (probe63:tws0:0:63:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe63:tws0:0:63:0): CAM status: Invalid Target ID
>>> (probe63:tws0:0:63:0): Error 22, Unretryable error
>>> (probe64:tws0:0:64:0): INQUIRY. CDB: 12 0 0 0 24 0
>>> (probe64:tws0:0:64:0): CAM status: Invalid Target ID
>>> (probe64:tws0:0:64:0): Error 22, Unretryable error
>>
>> These can be ignored. CAM is just telling you that there are no
>> devices attached at these target IDs.
>
> What about a change similar to what Alexander Motin did in
>
> http://lists.freebsd.org/pipermail/svn-src-head/2012-June/038196.html

Jim Harris <jimharris@freebsd.org> responded:

> Ah, yes. I was thinking you had CAM_DEBUG enabled which is why you
> were seeing this spew - but that's not the case. This indeed should
> be fixed and not just ignored.
>
> Seeing the attributions on Alexander's commit, you certainly seem to
> have a monopoly on controllers that exhibit this problem on FreeBSD.
> :)
>
> I believe the CAM_LUN_INVALID here should be fixed as well, similar to
> the twa commit. If you send me a revised patch I will commit it.

The specific subject of this thread is not my issue, but I did notice problems apparently related to CAM on a SATA hard drive. I use one UFS partition, with FreeBSD 9.0-BETA1 installed (subsequently updated on another partition, using GPT as opposed to MBR), for the ports tree and also for NetBSD pkgsrc and NetBSD source code.
I built NetBSD 5.1_STABLE i386 from FreeBSD and also built xorg-modular on the new NetBSD installation from pkgsrc. Going into and out of the newly installed Xorg resulted in some crashes while the FreeBSD 9.0-BETA1 partition was mounted, so it was not cleanly unmounted. The file system was damaged, and FreeBSD fsck_ffs wouldn't fix it; it went into a loop:

Script started on Wed Sep 19 04:15:02 2012
fsck_ffs /dev/ada0p9
** /dev/ada0p9
** Last Mounted on /BETA1
** Phase 1 - Check Blocks and Sizes
CANNOT READ BLK: 7584192
CONTINUE? [yn] y

THE FOLLOWING DISK SECTORS COULD NOT BE READ: 7584318, 7584319,
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1475900 files, 4638292 used, 21162419 free (61643 frags, 2637597 blocks, 0.2% fragmentation)

***** FILE SYSTEM STILL DIRTY *****

***** PLEASE RERUN FSCK *****

Script done on Wed Sep 19 04:17:27 2012

This happened repeatedly, leaving me at an impasse. I didn't get to record the preceding error messages relating to ATA and CAM, but, seeing this last message, I wonder if there are some bugs in CAM.

I booted the new NetBSD 5.1_STABLE i386 installation on a USB stick and was able to mount that partition and see that it wasn't trashed, though there was a message about the dirty flag. I then unmounted it and ran NetBSD fsck_ffs successfully; just a few files were lost, and FreeBSD can access that partition again.

I still intend to be more cautious when in NetBSD, not mounting a FreeBSD partition unnecessarily when doing something crash-prone on my system in NetBSD, such as going into and out of X.

Tom