Hello,
I am trying to deploy a FreeBSD 6 machine using if_bridge and pf to
provide some protection for our Windows servers ;)
During testing it seemed to work fine, but in production it panics frequently.
When I came in to work this morning it was complaining about mbufs, so
I tried increasing the number of mbufs as explained in various
places around the web. Although it didn't seem to be getting anywhere
near the limit I set (8192), it has panicked another couple of times
this morning already. I have now raised it to 16k, but I don't hold
out much hope that this will fix it.
The machine is a Dell PowerEdge server that, until recently, was our
Exchange server and ran without problems, so I do not suspect the
hardware, except possibly the network cards I have added. (I added an
fxp card last night before deploying it; the other cards are an
onboard em and two PCI xl cards. The fxp and xl cards are bridged; the em
card is used for management.)
I have a selection of pf rules declared against the xl0 and fxp0
interfaces (which represent two feeds from our border router). xl1 is
connected to a switch with our external-facing servers on it. (Is this
the problem? Should I separate the two subnets on the 'internal' side
of the bridge too?)
Anyway, on to the debug output!
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xc
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0512043
stack pointer = 0x28:0xd6a12aa8
frame pointer = 0x28:0xd6a12acc
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 599 (sshd)
trap number = 12
panic: page fault
Uptime: 25m10s
(kgdb) bt
#0 doadump () at pcpu.h:165
#1 0xc04dc9be in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2 0xc04dcc54 in panic (fmt=0xc062af0b "%s") at /usr/src/sys/kern/kern_shutdown.c:555
#3 0xc060cde8 in trap_fatal (frame=0xd6a12a68, eva=12) at /usr/src/sys/i386/i386/trap.c:831
#4 0xc060cb53 in trap_pfault (frame=0xd6a12a68, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:742
#5 0xc060c7b1 in trap (frame={tf_fs = -1046544376, tf_es = -1045037016, tf_ds = -694091736,
    tf_edi = -1044162816, tf_esi = 60, tf_ebp = -694080820, tf_isp = -694080876,
    tf_ebx = 5840, tf_edx = 0, tf_ecx = -1044162816, tf_eax = 0, tf_trapno = 12,
    tf_err = 0, tf_eip = -1068425149, tf_cs = 32, tf_eflags = 66054, tf_esp = 0,
    tf_ss = -694080816}) at /usr/src/sys/i386/i386/trap.c:432
#6 0xc05fc89a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc0512043 in m_copym (m=0x0, off0=100, len=420, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:386
#8 0xc056524e in tcp_output (tp=0xc1a52398) at /usr/src/sys/netinet/tcp_output.c:774
#9 0xc056c8e1 in tcp_usr_send (so=0xc18592c8, flags=0, m=0xc1900800, nam=0x0,
    control=0x0, td=0xc19f7900) at /usr/src/sys/netinet/tcp_usrreq.c:697
#10 0xc0514fa7 in sosend (so=0xc18592c8, addr=0x0, uio=0xd6a12cbc, top=0xc1900800,
    control=0x0, flags=0, td=0xc19f7900) at /usr/src/sys/kern/uipc_socket.c:829
#11 0xc0503dc2 in soo_write (fp=0x0, uio=0xd6a12cbc, active_cred=0xc1bc4b00, flags=0,
    td=0xc19f7900) at /usr/src/sys/kern/sys_socket.c:118
#12 0xc04fdfc3 in dofilewrite (td=0xc19f7900, fd=3, fp=0xc17ee5a0, auio=0xd6a12cbc,
    offset=Unhandled dwarf expression opcode 0x93
) at file.h:246
#13 0xc04fde67 in kern_writev (td=0xc19f7900, fd=3, auio=0xd6a12cbc) at /usr/src/sys/kern/sys_generic.c:402
#14 0xc04fdd8d in write (td=0xc19f7900, uap=0xc1c35700) at /usr/src/sys/kern/sys_generic.c:326
#15 0xc060d0ff in syscall (frame={tf_fs = 134676539, tf_es = 134676539, tf_ds = -1078001605,
    tf_edi = -1077944696, tf_esi = 134679144, tf_ebp = -1077944824, tf_isp = -694080156,
    tf_ebx = 671914192, tf_edx = 134679144, tf_ecx = 420, tf_eax = 4, tf_trapno = 0,
    tf_err = 2, tf_eip = 673921043, tf_cs = 51, tf_eflags = 518, tf_esp = -1077944852,
    tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:976
#16 0xc05fc8ef in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#17 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
Another panic happened in process "irq17: em0" with backtrace:
#0 doadump () at pcpu.h:165
#1 0xc04dc9be in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2 0xc04dcc54 in panic (fmt=0xc062af0b "%s") at /usr/src/sys/kern/kern_shutdown.c:555
#3 0xc060cde8 in trap_fatal (frame=0xd54c4bc0, eva=3217328832) at /usr/src/sys/i386/i386/trap.c:831
#4 0xc060cb53 in trap_pfault (frame=0xd54c4bc0, usermode=0, eva=3217328832) at /usr/src/sys/i386/i386/trap.c:742
#5 0xc060c7b1 in trap (frame={tf_fs = -2147483640, tf_es = -1900609496, tf_ds = -1050804184,
    tf_edi = -1050345296, tf_esi = -1048680360, tf_ebp = -716420016, tf_isp = -716420116,
    tf_ebx = -1047532288, tf_edx = 0, tf_ecx = 304808455, tf_eax = 74416, tf_trapno = 12,
    tf_err = 0, tf_eip = -1067474263, tf_cs = 32, tf_eflags = 590338, tf_esp = 4,
    tf_ss = -1050896896}) at /usr/src/sys/i386/i386/trap.c:432
#6 0xc05fc89a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc05fa2a9 in bus_dmamap_load (dmat=0xc1845980, map=0x122b0, buf=0x122b0207, buflen=2046,
    callback=0xc046c4d8 <em_dmamap_cb>, callback_arg=0xd54c4c74, flags=0) at pmap.h:200
#8 0xc046cdc2 in em_get_buf (i=11, adapter=0xc164c800, nmp=0x0) at /usr/src/sys/dev/em/if_em.c:2474
#9 0xc046d593 in em_process_receive_interrupts (adapter=0xc164c800, count=-2) at /usr/src/sys/dev/em/if_em.c:2797
#10 0xc046a7ed in em_intr (arg=0xc164c800) at /usr/src/sys/dev/em/if_em.c:992
#11 0xc04c85b5 in ithread_loop (arg=0xc1586580) at /usr/src/sys/kern/kern_intr.c:547
#12 0xc04c783c in fork_exit (callout=0xc04c845c <ithread_loop>, arg=0xc1586580,
    frame=0xd54c4d38) at /usr/src/sys/kern/kern_fork.c:789
#13 0xc05fc8fc in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
Now that I look at it, the two panics are quite dissimilar. Should I
suspect the network cards?
An earlier dump, taken before that one, occurred in irq18: fxp0. Now I'm
suspicious. (Excuse my thinking out loud.)
Other than pulling cards and seeing which ones stop it panicking, can
anyone suggest what might be going wrong? Is the combination of
if_bridge and pf used in this fashion valid?
Thanks,
James