Hello all,

About three weeks ago, I upgraded my 5.3-RELEASE boxes to 5.4-RELEASE. I also
turned on procmail globally on our mail server. Here is our current FreeBSD
server setup:

URANUS  - primary ldap
CALIBAN - secondary ldap
ORION   - primary mail

Orion was the first one to crash, about three weeks ago. Orion is constantly
talking to uranus, because uranus is our primary ldap server (we have a planet
naming scheme), and caliban is our secondary ldap server. I ran an email flood
test on orion to see if I could crash it again. This time, the high request
load on uranus caused uranus to crash. With two different servers on two
different hardware setups crashing, I had to start thinking about what could be
causing the problem.

Memory tests on both servers came back OK. Orion had some ECC errors which it
was able to correct. I wasn't able to catch orion's first crash, but I was able
to catch uranus's first crash:

http://paste.atopia.net/126

I have the other crashes written down in pencil at work; they all say mostly
the same thing. I assume caliban would also experience this behavior, but
because it receives very little load (it only does anything when uranus dies),
I am not able to confirm this.

The only thing the boxes have in common is that all three have two processors
and are running SMP. Orion had hyperthreading turned on, but I disabled it in
the BIOS, to no avail. Someone with similar experiences running SMP advised me
last week to upgrade to -STABLE. For almost a week, orion ran fine. This
evening, however, orion crashed once again, its fourth time in three weeks.
Uranus has been stable for a few days, but I am expecting it to crash again any
day now (the crashes usually come 4-6 days apart). So now I am stuck: I have
two -STABLE machines which continue to hit kernel traps.
Tomorrow, I am going to compile a debugging kernel on orion and let it crash
again to see what kind of errors it reports, but I was wondering if anyone
else is experiencing these problems.

Thanks in advance,

Matt Juszczak
On Mon, Jun 27, 2005 at 01:01:09AM -0400, Matt Juszczak wrote:
M> About three weeks ago, I upgraded my 5.3-RELEASE boxes to 5.4-RELEASE.
M> I also turned on procmail globally on our mail server. Here is our
M> current FreeBSD server setup:
M>
M> URANUS - primary ldap
M> CALIBAN - secondary ldap
M> ORION - primary mail
M>
M> Orion was the first one to crash, about three weeks ago. Orion is
M> constantly talking to uranus, because uranus is our primary ldap server
M> (we have a planet scheme), and caliban is our secondary ldap server. I
M> ran an email flood test on orion to see if I could crash it again. This
M> time, the high requests on Uranus caused Uranus to crash. With two
M> different servers on two different hardware setups crashing, I had to
M> start thinking of what could be causing the problem.
M>
M> Memory tests on both servers came back OK. Orion had some ECC errors
M> which it was able to fix. I wasn't able to catch orion's first crash,
M> but I was able to catch uranus's first crash:
M>
M> http://paste.atopia.net/126

Can you please build a kernel with debugging and obtain a crash dump?

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
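For readers following the thread, Glebius's request can be satisfied roughly as
follows on FreeBSD 5.x. This is a hedged sketch, not the poster's exact
procedure: the kernel config name MYKERNEL and the dump device are placeholders
for your own setup.

```shell
# Sketch: building a debugging kernel and enabling crash dumps on FreeBSD 5.x.
# "MYKERNEL" and the swap device below are assumptions; adjust for your box.

# 1. In the kernel config (/usr/src/sys/i386/conf/MYKERNEL), add:
#      makeoptions   DEBUG=-g      # compile the kernel with debug symbols
#      options       KDB           # kernel debugger framework
#      options       DDB           # interactive in-kernel debugger
cd /usr/src
make buildkernel KERNCONF=MYKERNEL
make installkernel KERNCONF=MYKERNEL

# 2. Point crash dumps at swap so savecore can collect them on the next boot
#    (lines for /etc/rc.conf):
#      dumpdev="/dev/ad0s1b"       # your swap partition
#      dumpdir="/var/crash"
```

After a panic, the dump lands in swap and `savecore` writes it to
`/var/crash/vmcore.N` during boot.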
On Wed, 6 Jul 2005, Kris Kennaway wrote:

> Please obtain the backtrace with kgdb.

Here you go:

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:159
159     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:159
#1  0xc044b006 in db_fncall (dummy1=0, dummy2=0, dummy3=-1067606609,
    dummy4=0xe4b6c9d0 "????(\205]?????????\222\a")
    at /usr/src5/sys/ddb/db_command.c:531
#2  0xc044ae14 in db_command (last_cmdp=0xc0674644, cmd_table=0x0,
    aux_cmd_tablep=0xc064226c, aux_cmd_tablep_end=0xc0642270)
    at /usr/src5/sys/ddb/db_command.c:349
#3  0xc044aedc in db_command_loop () at /usr/src5/sys/ddb/db_command.c:455
#4  0xc044ca75 in db_trap (type=12, code=0) at /usr/src5/sys/ddb/db_main.c:221
#5  0xc04e6599 in kdb_trap (type=12, code=0, tf=0xe4b6cb3c)
    at /usr/src5/sys/kern/subr_kdb.c:468
#6  0xc05f4c79 in trap_fatal (frame=0xe4b6cb3c, eva=36)
    at /usr/src5/sys/i386/i386/trap.c:812
#7  0xc05f43e9 in trap (frame=
      {tf_fs = -1040580584, tf_es = -1029439472, tf_ds = 16,
       tf_edi = -1038000128, tf_esi = -1066898900, tf_ebp = -457782384,
       tf_isp = -457782424, tf_ebx = -1040530304, tf_edx = -1040524364,
       tf_ecx = -1040524544, tf_eax = 0, tf_trapno = 12, tf_err = 0,
       tf_eip = -1068574101, tf_cs = 8, tf_eflags = 65683, tf_esp = 180,
       tf_ss = 0}) at /usr/src5/sys/i386/i386/trap.c:255
#8  0xc05e283a in calltrap () at /usr/src5/sys/i386/i386/exception.s:140
#9  0xc1fa0018 in ?? ()
#10 0xc2a40010 in ?? ()
#11 0x00000010 in ?? ()
#12 0xc2216000 in ?? ()
#13 0xc0686a2c in tcbinfo ()
#14 0xe4b6cb90 in ?? ()
#15 0xe4b6cb68 in ?? ()
#16 0xc1fac480 in ?? ()
#17 0xc1fadbb4 in ?? ()
#18 0xc1fadb00 in ?? ()
#19 0x00000000 in ?? ()
#20 0x0000000c in ?? ()
#21 0x00000000 in ?? ()
#22 0xc04eda6b in propagate_priority (td=0xc2216000)
    at /usr/src5/sys/kern/subr_turnstile.c:243
#23 0xc04ee225 in turnstile_wait (ts=0xc1fadb00, lock=0xc0686a2c,
    owner=0xc2216000) at /usr/src5/sys/kern/subr_turnstile.c:556
#24 0xc04c5ced in _mtx_lock_sleep (m=0xc0686a2c, td=0xc1fac480, opts=0,
    file=0x0, line=0) at /usr/src5/sys/kern/kern_mutex.c:552
#25 0xc0559ad8 in tcp_usr_rcvd (so=0x0, flags=0)
    at /usr/src5/sys/netinet/tcp_usrreq.c:602
#26 0xc0506103 in soreceive (so=0xc27bf798, psa=0x0, uio=0xe4b6cc88,
    mp0=0x0, controlp=0x0, flagsp=0x0)
    at /usr/src5/sys/kern/uipc_socket.c:1395
#27 0xc04f4bd9 in soo_read (fp=0x0, uio=0xe4b6cc88, active_cred=0xc2884a80,
    flags=0, td=0xc1fac480) at /usr/src5/sys/kern/sys_socket.c:91
#28 0xc04ee865 in dofileread (td=0xc1fac480, fp=0xc2e17bb0, fd=10, buf=0x0,
    nbyte=4096, offset=Unhandled dwarf expression opcode 0x93
    ) at file.h:233
#29 0xc04ee72f in read (td=0xc1fac480, uap=0xe4b6cd14)
    at /usr/src5/sys/kern/sys_generic.c:107
#30 0xc05f4fe7 in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 10,
       tf_esi = 300, tf_ebp = -1077942168, tf_isp = -457781900,
       tf_ebx = 134822152, tf_edx = 0, tf_ecx = 10, tf_eax = 3,
       tf_trapno = 0, tf_err = 2, tf_eip = 672556795, tf_cs = 31,
       tf_eflags = 658, tf_esp = -1077942212, tf_ss = 47})
    at /usr/src5/sys/i386/i386/trap.c:1009
#31 0xc05e288f in Xint0x80_syscall () at /usr/src5/sys/i386/i386/exception.s:201
#32 0x0000002f in ?? ()
#33 0x0000002f in ?? ()
#34 0xbfbf002f in ?? ()
#35 0x0000000a in ?? ()
#36 0x0000012c in ?? ()
#37 0xbfbfe868 in ?? ()
#38 0xe4b6cd74 in ?? ()
#39 0x08093908 in ?? ()
#40 0x00000000 in ?? ()
#41 0x0000000a in ?? ()
#42 0x00000003 in ?? ()
#43 0x00000000 in ?? ()
#44 0x00000002 in ?? ()
#45 0x281666fb in ?? ()
#46 0x0000001f in ?? ()
#47 0x00000292 in ?? ()
#48 0xbfbfe83c in ?? ()
#49 0x0000002f in ?? ()
#50 0x00000000 in ?? ()
#51 0x00000000 in ?? ()
#52 0x00000000 in ?? ()
#53 0x00000000 in ?? ()
#54 0x2c75b000 in ?? ()
#55 0xc22de000 in ?? ()
#56 0xc1fac480 in ?? ()
#57 0xe4b6ccac in ?? ()
#58 0xe4b6cc94 in ?? ()
#59 0xc1f26000 in ?? ()
#60 0xc04ded13 in sched_switch (td=0x12c, newtd=0x8093908,
    flags=Cannot access memory at address 0xbfbfe878
    ) at /usr/src5/sys/kern/sched_4bsd.c:881
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit
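For reference, a backtrace like the one above is produced by pointing kgdb at
the debug kernel and the saved core. This is a generic sketch, not the
poster's exact command line; the kernel and vmcore paths are assumptions.

```shell
# Sketch: extracting a backtrace from a FreeBSD 5.x crash dump with kgdb.
# Paths are placeholders; savecore numbers dumps sequentially in /var/crash.
kgdb /usr/obj/usr/src/sys/MYKERNEL/kernel.debug /var/crash/vmcore.0

# Then, at the (kgdb) prompt:
#   bt          # full backtrace, as posted above
#   frame 22    # jump to an interesting frame, e.g. propagate_priority
#   list        # show the source lines around that frame
#   quit
```

The interesting frames here are #22-#25: a thread blocked in
`propagate_priority`/`turnstile_wait` while taking the `tcbinfo` mutex from
`tcp_usr_rcvd`, which is what the list regulars focus on next.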
On Tue, 12 Jul 2005, Matt Juszczak wrote:

> So far a 13 day up time after switching from IPF to PF. If thats not the
> problem, I hope I find it soon considering this is a production server ...
> but it seems to be more stable.

For me, 5 days of uptime after switching from IPF to PF. Before the switch, a
couple of hours of uptime was the maximum. It seems like the crashes are
caused by ipfilter.
> For me, 5 days up time after switching from IPF to PF. Before the switch a
> couple of hours of uptime was the maximum. Seems like the crashes are caused
> by ipfilter.

Still the same for me :) Uptime almost 20 days now after switching to PF.
On Mon, 18 Jul 2005 14:32:09 -0400 (EDT) Matt Juszczak <matt@atopia.net> wrote:

> > For me, 5 days up time after switching from IPF to PF. Before the switch a
> > couple of hours of uptime was the maximum. Seems like the crashes are
> > caused by ipfilter.
>
> Still same for me :) Uptime almost 20 days now after switching to PF.

I find these messages kind of weird. Are you saying your servers only stay up
for long periods with pf and *not* with ipf? I run a server and almost never
take it down. IPF performs very well, including a lot of NATting for my home
network.

-- 
dick -- http://nagual.st/ -- PGP/GnuPG key: F86289CE
++ Running FreeBSD 4.11-stable ++ FreeBSD 5.4
+ Nai tiruvantel ar vayuvantel i Valar tielyanna nu vilja
> I find this messages kind of weird. Are you saying your servers only run
> long periods of uptime with pf and *not* with ipf? I run a server and almost
> never put it down. IPF performs very well, including a lot of natting for my
> home network.

Correct. IPF is unstable, most of the time, with our SMP-based 5.x boxes.
VERY unstable. VERY VERY unstable.

-Matt
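For anyone wanting to try the same workaround the thread converged on, here is
a minimal sketch of switching from ipfilter to pf on FreeBSD 5.4. The rc.conf
knobs are the standard ones; the pf.conf rules shown are placeholders, not the
posters' actual policies.

```shell
# Sketch: replacing ipfilter with pf on FreeBSD 5.4.

# 1. In /etc/rc.conf, disable ipfilter/ipnat and enable pf:
#      ipfilter_enable="NO"
#      ipnat_enable="NO"
#      pf_enable="YES"
#      pf_rules="/etc/pf.conf"
#      pflog_enable="YES"

# 2. A minimal placeholder /etc/pf.conf (your real ruleset will differ):
#      set skip on lo0
#      pass in  all keep state
#      pass out all keep state

# 3. Validate and load the rules without rebooting:
pfctl -nf /etc/pf.conf   # -n: parse and validate only, load nothing
pfctl -f /etc/pf.conf    # load the ruleset
pfctl -e                 # enable pf
```

Translating an existing ipf.rules/ipnat.rules policy to pf syntax has to be
done by hand; the stateful `keep state` behavior above is pf's rough
equivalent of ipf's `keep state` rules.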