Hello,

I have a 5.4-p5 system running on i386. It panicked with:

panic: sbflush_locked: cc 0 || mb 0xc33bf000 || mbcnt 4294967040

It is a web server running Apache, plus Postfix as a backup MX. I'm using gmirror on all partitions, including swap, so I cannot get a crash dump. Some DDB output is below.

Google told me that
http://lists.freebsd.org/pipermail/freebsd-current/2004-December/044535.html
looks related, but the code path is different. Note that the patch in that mail is already in 5.4.

If needed, I can provide the kernel config. I have also tuned the following sysctls:

vfs.hirunningspace=2097152
kern.ipc.somaxconn=4096
kern.maxfiles=30000
kern.maxfilesperproc=30000
net.inet.ip.random_id=1
machdep.hyperthreading_allowed=1

The DDB messages follow:

cpuid = 3
KDB: enter: panic
[thread pid 61 tid 100061 ]
Stopped at kdb_enter+0x2b: nop
db> wh
Tracing pid 61 tid 100061 td 0xc311e180
kdb_enter(c05f3bc6) at kdb_enter+0x2b
panic(c05f6f09,0,c33bf000,ffffff00,c3a1970c) at panic+0x127
sbflush_locked(c3a1970c,c3a19654,e74aeba4,c04e4cb4,c3a1970c) at sbflush_locked+0x6f
sbrelease_locked(c3a1970c,c3a19654) at sbrelease_locked+0xd
sofree(c3a19654) at sofree+0x26c
in_pcbdetach(c371d870,c3e996f0,c3e996f0,e74aec9c,c05355df) at in_pcbdetach+0xb6
tcp_close(c3e996f0,1,1,1042e,1) at tcp_close+0x16
tcp_input(c4513400,14,1c1e708c,0,0) at tcp_input+0x2297
ip_input(c4513400) at ip_input+0x4f1
netisr_processqueue(c0643298) at netisr_processqueue+0xa3
swi_net(0) at swi_net+0xf2
ithread_loop(c3094c80,e74aed48) at ithread_loop+0x159
fork_exit(c049c138,c3094c80,e74aed48) at fork_exit+0x75
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe74aed7c, ebp = 0 ---
db> ps
61 c311ce20 0 0 0 0000204 [CPU 3] swi1: net

Regards,
Rong-En Fan
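
A side note on the panic message above: the mbcnt value looks like a small negative number printed as an unsigned 32-bit integer. 4294967040 is 0xffffff00 (the same value shows up as the fourth argument to panic() in the DDB trace), that is, 2^32 - 256. The C sketch below works through that arithmetic. The helper names and ASSUMED_MSIZE are made up for illustration, and reading 256 as the size of one mbuf is an assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one mbuf in bytes; an illustration, not taken from the thread. */
#define ASSUMED_MSIZE 256u

/* Reinterpret the unsigned counter printed by the panic as a signed value. */
static int64_t mbcnt_as_signed(uint32_t mbcnt)
{
    /* Anything above INT32_MAX is how a negative 32-bit count prints with %u. */
    if (mbcnt > INT32_MAX)
        return (int64_t)mbcnt - (INT64_C(1) << 32);
    return (int64_t)mbcnt;
}

/* Model subtracting `bytes` from an unsigned counter holding `count`. */
static uint32_t mbcnt_release(uint32_t count, uint32_t bytes)
{
    return count - bytes; /* unsigned arithmetic wraps modulo 2^32 */
}
```

Under these assumptions, mbcnt_as_signed(4294967040u) gives -256, and mbcnt_release(0, ASSUMED_MSIZE) gives 4294967040, which would be consistent with the counter being decremented once more than it was incremented.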
Rong-En Fan wrote:
> hello,
>
> I have a 5.4-p5 running on i386. Got a panic:
> panic: sbflush_locked: cc 0 || mb 0xc33bf000 || mbcnt 4294967040
> It is an web server running Apache and Postfix as a backup MX.
> I'm using gmirror on all partitions and thus cannot get a dump (swap
> is on gmirror). Some ddb outputs are below.

I got a few similar panics. It looks like I managed to get rid of them by setting mpsafenet=0, but I am not sure; I have to monitor it for a bit longer. I have managed to get a few dumps, so here are the traces:

========================== N 1 ========================
#0 doadump () at pcpu.h:159
#1 0xc0513885 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2 0xc0513eca in panic (fmt=0xc06ac866 "sbflush_locked: cc %u || mb %p || mbcnt %u") at ../../../kern/kern_shutdown.c:566
#3 0xc05559a6 in sbflush_locked (sb=0xc28400b8) at ../../../kern/uipc_socket2.c:1119
#4 0xc05559ce in sbrelease_locked (sb=0xc28400b8, so=0x0) at ../../../kern/uipc_socket2.c:564
#5 0xc05525eb in sofree (so=0xc2840000) at ../../../kern/uipc_socket.c:405
#6 0xc05a56e1 in in_pcbdetach (inp=0xc2312654) at ../../../netinet/in_pcb.c:719
#7 0xc05b6284 in tcp_close (tp=0x0) at ../../../netinet/tcp_subr.c:783
#8 0xc05b2c13 in tcp_input (m=0xc1cff600, off0=-1625741474) at ../../../netinet/tcp_input.c:2286
#9 0xc05a9aff in ip_input (m=0xc1cff600) at ../../../netinet/ip_input.c:776
#10 0xc059214a in netisr_processqueue (ni=0xc070b0d8) at ../../../net/netisr.c:233
#11 0xc0592409 in swi_net (dummy=0x0) at ../../../net/netisr.c:346
#12 0xc04fb98d in ithread_loop (arg=0xc1979500) at ../../../kern/kern_intr.c:547
#13 0xc04fa9c8 in fork_exit (callout=0xc04fb8d6 <ithread_loop>, arg=0x0, frame=0x0) at ../../../kern/kern_fork.c:791
#14 0xc0656a7c in fork_trampoline () at ../../../i386/i386/exception.s:209
============================================================

and

======================== N 2 ===============================
#0 doadump () at pcpu.h:159
#1 0xc0513885 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2 0xc0513eca in panic (fmt=0xc06989e7 "%s") at ../../../kern/kern_shutdown.c:566
#3 0xc0667756 in trap_fatal (frame=0xe686fa60, eva=12) at ../../../i386/i386/trap.c:817
#4 0xc06679e4 in trap_pfault (frame=0xe686fa60, usermode=0, eva=12) at ../../../i386/i386/trap.c:735
#5 0xc0667db3 in trap (frame={tf_fs = -427425768, tf_es = -1067253744, tf_ds = -1044447216, tf_edi = 16, tf_esi = 0, tf_ebp = -427361608, tf_isp = -427361652, tf_ebx = 40, tf_edx = -1044393868, tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1068176275, tf_cs = 8, tf_eflags = 66050, tf_esp = -1044409808, tf_ss = -1044393868}) at ../../../i386/i386/trap.c:425
#6 0xc0656a1a in calltrap () at ../../../i386/i386/exception.s:140
#7 0xe6860018 in ?? ()
#8 0xc0630010 in zone_timeout (zone=0xc1bf9200) at ../../../vm/uma_core.c:418
#9 0xc05b44ad in tcp_output (tp=0xc23b6534) at ../../../netinet/tcp_output.c:811
#10 0xc05bc5ab in tcp_usr_send (so=0x0, flags=0, m=0xc1bf9200, nam=0x0, control=0x0, td=0xc1e33a80) at ../../../netinet/tcp_usrreq.c:699
#11 0xc0550fb4 in sosend (so=0xc228d8dc, addr=0x0, uio=0xe686fc80, top=0xc1bf9200, control=0x0, flags=0, td=0xc1e33a80) at ../../../kern/uipc_socket.c:835
#12 0xc053ed99 in soo_write (fp=0x0, uio=0xe686fc80, active_cred=0xc1fd9980, flags=0, td=0xc1e33a80) at ../../../kern/sys_socket.c:118
#13 0xc0537c15 in dofilewrite (td=0xc1e33a80, fp=0xc1fd7110, fd=0, buf=0x0, nbyte=56, offset=Unhandled dwarf expression opcode 0x93 ) at file.h:245
#14 0xc0537ea8 in write (td=0xc1e33a80, uap=0xe686fd14) at ../../../kern/sys_generic.c:282
#15 0xc06681fa in syscall (frame={tf_fs = -1078001617, tf_es = 47, tf_ds = -1078001617, tf_edi = 138645504, tf_esi = 56, tf_ebp = -1077957448, tf_isp = -427360908, tf_ebx = 675435700, tf_edx = 0, tf_ecx = 0, tf_eax = 4, tf_trapno = 22, tf_err = 2, tf_eip = 675424571, tf_cs = 31, tf_eflags = 646, tf_esp = -1077957476, tf_ss = 47}) at ../../../i386/i386/trap.c:1009
#16 0xc0656a6f in Xint0x80_syscall () at ../../../i386/i386/exception.s:201
#17 0xbfbf002f in ?? ()
#18 0x0000002f in ?? ()
#19 0xbfbf002f in ?? ()
#20 0x08439000 in ?? ()
#21 0x00000038 in ?? ()
........... a bunch more of these ................
============================================================

--
Best regards,
Alexander.
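
The format string in trace N 1 ("cc %u || mb %p || mbcnt %u") suggests that the panic fires when any of the three fields is still non-zero after the socket buffer should have been drained. Below is a minimal userspace model of that check in C, assuming this reading is right. struct toy_sockbuf and toy_sockbuf_is_empty() are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down stand-in for a socket buffer. */
struct toy_sockbuf {
    uint32_t cc;     /* bytes of data queued */
    const void *mb;  /* head of the mbuf chain, NULL when empty */
    uint32_t mbcnt;  /* bytes of mbuf storage charged to the buffer */
};

/* True when all three fields are clear, i.e. the panic condition is false. */
static bool toy_sockbuf_is_empty(const struct toy_sockbuf *sb)
{
    return sb->cc == 0 && sb->mb == NULL && sb->mbcnt == 0;
}
```

Filling the model with the values from the first report (cc 0, mb 0xc33bf000, mbcnt 4294967040) makes toy_sockbuf_is_empty() return false on two counts: the chain pointer is not NULL and the storage counter is not zero, even though no data bytes are queued.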
On Mon, 25 Jul 2005, Rong-En Fan wrote:

> I have a 5.4-p5 running on i386. Got a panic: panic: sbflush_locked: cc
> 0 || mb 0xc33bf000 || mbcnt 4294967040 It is an web server running
> Apache and Postfix as a backup MX. I'm using gmirror on all partitions
> and thus cannot get a dump (swap is on gmirror). Some ddb outputs are
> below.

Is this system an SMP and/or HTT system? If this problem is reproducible, could I ask you to capture the following serial console output from DDB:

show pcpu
show pcpu 0
show pcpu 1
show pcpu 2
show pcpu 3
# continue until out of CPUs
ps

And then traces of interesting threads: in particular, threads mentioned in the pcpu output, the current thread, and threads of network-related processes (most importantly the netisr thread, but also other active threads, i.e., those without a wchan listed).

This sounds like a race between two threads in the TCP code, but to diagnose it further, I'll need to know what else is running. If you have access to serial gdb, I'd be quite interested in seeing the output of "l *so" in the sofree() frame, *tp in a tcp-related frame, and *inp if it's available in one of those frames, likely the in_pcbdetach() frame or the tcp_close() frame if it's there.

Would it be possible to add an extra ATA disk to use for swap and for capturing a core dump?

Robert N M Watson
Hi,

After upgrading to 5-STABLE (as of about Aug 6), it works very well. With mpsafenet=1, it can run for more than a day without a panic. Under 5.4-p5, it would panic within half a day or so. This bug seems to have been fixed after 5.4 was released.

I'll keep watching this machine and will let you know if it still has similar panics ;-)

Regards,
Rong-En Fan

On 7/29/05, Robert Watson <rwatson@freebsd.org> wrote:
> On Mon, 25 Jul 2005, Rong-En Fan wrote:
> > I have a 5.4-p5 running on i386. Got a panic: panic: sbflush_locked: cc
> > 0 || mb 0xc33bf000 || mbcnt 4294967040 It is an web server running
> > Apache and Postfix as a backup MX. I'm using gmirror on all partitions
> > and thus cannot get a dump (swap is on gmirror). Some ddb outputs are
> > below.
>