We have a Celestica dual-Opteron system with 4GB RAM running
5.3-RELEASE/i386 (32-bit) with an SMP-aware kernel, which is experiencing
hard lockups. Debugging results are below.
-=-
[BREAK]
KDB: enter: Line break on console
[thread 100104]
Stopped at kdb_enter+0x2b: nop
db> where
kdb_enter(c084e4c6) at kdb_enter+0x2b
siointr1(c507d800,c0946700,0,c084e28e,6ad) at siointr1+0xce
siointr(c507d800) at siointr+0x21
intr_execute_handlers(c4f5d490,e9826b80,4,e9826bd0,c07b2ae3) at intr_execute_handlers+0x89
lapic_handle_intr(34) at lapic_handle_intr+0x2e
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc0604456, esp = 0xe9826bc4, ebp = 0xe9826bd0 ---
_mtx_lock_sleep(c08f67c0,c5698640,0,c084a0b3,126) at _mtx_lock_sleep+0xc6
_mtx_lock_flags(c08f67c0,0,c084a0b3,126,c6a82738) at _mtx_lock_flags+0x48
vm_fault(c5bbd5dc,81ae000,2,8,c5698640) at vm_fault+0x1fe
trap_pfault(e9826d48,1,81ae000,81ae000,0) at trap_pfault+0xf2
trap(2f,2f,2f,2000,81ae000) at trap+0x1df
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x2809bd8d, esp = 0xbfbfb7b0, ebp = 0xbfbfb7e8 ---
db> panic
panic: from debugger
cpuid = 3
boot() called on cpu#3
Uptime: 2h50m29s
-=-
(Resetting the system at this point causes a panic, after which the
system locks up for good and a power reset is required.)
We were able to get a coredump, and the resulting kgdb output is below:
-=-
(kgdb) up
#45 0xc05f9bda in fork_exit (callout=0xc05fa5dc <ithread_loop>,
arg=0xc4fe7a00, frame=0xe8daed48) at ../../../kern/kern_fork.c:811
811 callout(arg, frame);
(kgdb) l
806 * cpu_set_fork_handler intercepts this function call to
807 * have this call a non-return function to stay in kernel mode.
808 * initproc has its own fork handler, but it does return.
809 */
810 KASSERT(callout != NULL, ("NULL callout in fork_exit"));
811 callout(arg, frame);
812
813 /*
814 * Check if a kernel thread misbehaved and returned from its main
815 * function.
(kgdb) down
#44 0xc05fa6e8 in ithread_loop (arg=0xc4fe7a00)
at ../../../kern/kern_intr.c:547
547 ih->ih_handler(ih->ih_argument);
(kgdb) l
542 mtx_unlock(&ithd->it_lock);
543 goto restart;
544 }
545 if ((ih->ih_flags & IH_MPSAFE) == 0)
546 mtx_lock(&Giant);
547 ih->ih_handler(ih->ih_argument);
548 if ((ih->ih_flags & IH_MPSAFE) == 0)
549 mtx_unlock(&Giant);
550 }
551 if (ithd->it_enable != NULL) {
(kgdb) down
#43 0xc0615dfa in softclock (dummy=0x0) at ../../../kern/kern_timeout.c:247
247 mtx_lock(&Giant);
(kgdb) l
242 (c->c_flags & ~CALLOUT_PENDING);
243 }
244 curr_callout = c;
245 mtx_unlock_spin(&callout_lock);
246 if (!(c_flags & CALLOUT_MPSAFE)) {
247 mtx_lock(&Giant);
248 gcalls++;
249 CTR1(KTR_CALLOUT, "callout %p", c_func);
250 } else {
251 mpcalls++;
-=-
It looks like the thread is trying to lock Giant in softclock() while it
already holds Giant. In any case, we have rebuilt a uniprocessor kernel
for now. If this is already fixed in 5-STABLE, please let me know. ;)
Best Wishes - Peter
--
Peter_Losher@isc.org | ISC | OpenPGP 0xE8048D08 | "The bits must flow"