Darn, I thought that had already been fixed. I'll go dig up my patches
and take care of this.
Scott
Pavel Merdin wrote:
> Hello.
>
> There's a problem with a very busy server (ad server, CPU is close to
> 0% idle most of the time).
> Configuration: Dual AMD Opteron 252 2.6GHz
> Chipset: AMD 8131
> Integrated LAN Controller: Broadcom BCM5704 dual-channel GbE Gigabit
> Adaptec AIC-7902W Ultra 320 SCSI controller
> amr0: <LSILogic MegaRAID 1.53>
>
> We tried both 6.1-RELEASE and 6-STABLE amd64 kernels. (bge driver is
> always from recent stable with full Broadcom support).
>
> The server hangs one or more times a day. It even hangs for some time
> right after the boot sequence finishes (when the "login:" prompt appears).
> During a hang everything stops, even keyboard (interrupts).
>
> We already removed PREEMPTION and Linux support.
> Sometimes the server can panic with:
> Sleeping thread (tid 100006, pid 4) owns a non-sleepable lock
> panic: sleeping thread
> cpuid=0
> KDB: enter: panic
> and hangs there without even starting a debugger.
> pid 4 seems to be [g_down]
>
> Today I compiled a kernel with INVARIANTS and WITNESS.
> Right after the boot sequence I got the following:
>
> Aug 10 04:37:09 ad1 kernel: lock order reversal: (Giant after non-sleepable)
> Aug 10 04:37:09 ad1 kernel: 1st 0xffffff026c4ebe70 AMR List Lock (AMR List Lock) @ dev/amr/amr.c:403
> Aug 10 04:37:09 ad1 kernel: 2nd 0xffffffff8073adc0 Giant (Giant) @ vm/vm_contig.c:579
> Aug 10 04:37:09 ad1 kernel: KDB: stack backtrace:
> Aug 10 04:37:09 ad1 kernel: kdb_backtrace() at kdb_backtrace+0x37
> Aug 10 04:37:09 ad1 kernel: witness_checkorder() at witness_checkorder+0x6fb
> Aug 10 04:37:09 ad1 kernel: _mtx_lock_flags() at _mtx_lock_flags+0x9a
> Aug 10 04:37:09 ad1 kernel: contigmalloc() at contigmalloc+0x57
> Aug 10 04:37:09 ad1 kernel: alloc_bounce_pages() at alloc_bounce_pages+0x75
> Aug 10 04:37:09 ad1 kernel: bus_dmamap_create() at bus_dmamap_create+0x149
> Aug 10 04:37:09 ad1 kernel: amr_alloccmd_cluster() at amr_alloccmd_cluster+0x102
> Aug 10 04:37:09 ad1 kernel: amr_alloccmd() at amr_alloccmd+0x55
> Aug 10 04:37:09 ad1 kernel: amr_bio_command() at amr_bio_command+0x27
> Aug 10 04:37:09 ad1 kernel: amr_startio() at amr_startio+0x6a
> Aug 10 04:37:09 ad1 kernel: amr_submit_bio() at amr_submit_bio+0x51
> Aug 10 04:37:09 ad1 kernel: amrd_strategy() at amrd_strategy+0x23
> Aug 10 04:37:09 ad1 kernel: g_disk_start() at g_disk_start+0x17d
> Aug 10 04:37:09 ad1 kernel: g_io_schedule_down() at g_io_schedule_down+0x189
> Aug 10 04:37:09 ad1 kernel: g_down_procbody() at g_down_procbody+0x80
> Aug 10 04:37:09 ad1 kernel: fork_exit() at fork_exit+0xdf
> Aug 10 04:37:09 ad1 kernel: fork_trampoline() at fork_trampoline+0xe
> Aug 10 04:37:09 ad1 kernel: --- trap 0, rip = 0, rsp = 0xffffffffb8e8bd00, rbp = 0 ---
>
> Any advice (other than switching to Linux)?
>