Joe Murphy
2007-Jun-22 13:41 UTC
[Samba] Re: Intermittent "INTERNAL ERROR: Signal 11" with 3.0.24
Hi all,

Following up on this post: we've been able to capture a gdb backtrace. Can anyone offer guidance as to what it means? See below:

(gdb) bt
#0  0xffffe410 in ?? ()
#1  0x00000001 in ?? ()
#2  0x00000000 in ?? ()
#3  0xbfffc9d8 in ?? ()
#4  0x402b36e3 in __waitpid_nocancel () from /lib/tls/libc.so.6
#5  0x4025ef58 in do_system () from /lib/tls/libc.so.6
#6  0x402268dd in system () from /lib/tls/libpthread.so.0
#7  0x0822b612 in smb_panic (why=0x0) at lib/util.c:1608
#8  0x08219b3f in fault_report (sig=-512) at lib/fault.c:47
#9  0x08219b50 in sig_fault (sig=-512) at lib/fault.c:70
#10 <signal handler called>
#11 0x40292d1b in strlen () from /lib/tls/libc.so.6
#12 0x40268242 in vfprintf () from /lib/tls/libc.so.6
#13 0x40285e76 in vsnprintf () from /lib/tls/libc.so.6
#14 0x08219956 in dbgtext (format_str=0x6d2e5c73 "") at lib/debug.c:1011
#15 0x0825b360 in oplock_timeout_handler (te=0x844ce10, now=0xbfffd9c0, private_data=0x84492f0) at smbd/oplock.c:351
#16 0x08242d7d in run_events () at lib/events.c:102
#17 0x080f2801 in receive_message_or_smb (buffer=0x40433008 "", buffer_len=131137, timeout=60000) at smbd/process.c:457
#18 0x080f4122 in smbd_process () at smbd/process.c:1649
#19 0x082beea9 in main (argc=1831754867, argv=0xbfffdd34) at smbd/server.c:1024

This is similar to the following panic message recorded in syslog:

Jun 13 12:57:29 uhti02 smbd[16322]: [2007/06/13 12:57:29, 0] smbd/oplock.c:oplock_timeout_handler(351)
Jun 13 12:57:29 uhti02 smbd[16322]: [2007/06/13 12:57:29, 0] lib/fault.c:fault_report(41)
Jun 13 12:57:29 uhti02 smbd[16322]: ==============================================================
Jun 13 12:57:29 uhti02 smbd[16322]: [2007/06/13 12:57:29, 0] lib/fault.c:fault_report(42)
Jun 13 12:57:29 uhti02 smbd[16322]: INTERNAL ERROR: Signal 11 in pid 16322 (3.0.24-SerNet-SuSE)
Jun 13 12:57:29 uhti02 smbd[16322]: Please read the Trouble-Shooting section of the Samba3-HOWTO
Jun 13 12:57:29 uhti02 smbd[16322]: [2007/06/13 12:57:29, 0] lib/fault.c:fault_report(44)
Jun 13 12:57:29 uhti02 smbd[16322]:
Jun 13 12:57:29 uhti02 smbd[16322]: From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
Jun 13 12:57:29 uhti02 smbd[16322]: [2007/06/13 12:57:29, 0] lib/fault.c:fault_report(45)
Jun 13 12:57:29 uhti02 smbd[16322]: ==============================================================
Jun 13 12:57:29 uhti02 smbd[16322]: [2007/06/13 12:57:29, 0] lib/util.c:smb_panic(1599)
Jun 13 12:57:29 uhti02 smbd[16322]: PANIC (pid 16322): internal error
Jun 13 12:57:29 uhti02 smbd[16322]: [2007/06/13 12:57:29, 0] lib/util.c:log_stack_trace(1706)
Jun 13 12:57:29 uhti02 smbd[16322]: BACKTRACE: 14 stack frames:
Jun 13 12:57:29 uhti02 smbd[16322]: #0 /usr/sbin/smbd(log_stack_trace+0x22) [0x822b6fb]
Jun 13 12:57:29 uhti02 smbd[16322]: #1 /usr/sbin/smbd(smb_panic+0x6f) [0x822b59a]
Jun 13 12:57:29 uhti02 smbd[16322]: #2 /usr/sbin/smbd [0x8219b3f]
Jun 13 12:57:29 uhti02 smbd[16322]: #3 /usr/sbin/smbd [0x8219b50]
Jun 13 12:57:29 uhti02 smbd[16322]: #4 [0xffffe420]
Jun 13 12:57:29 uhti02 smbd[16322]: #5 /lib/tls/libc.so.6(vsnprintf+0xb6) [0x40285e76]
Jun 13 12:57:29 uhti02 smbd[16322]: #6 /usr/sbin/smbd(dbgtext+0x2e) [0x8219956]
Jun 13 12:57:29 uhti02 smbd[16322]: #7 /usr/sbin/smbd [0x825b360]
Jun 13 12:57:29 uhti02 smbd[16322]: #8 /usr/sbin/smbd(run_events+0x15f) [0x8242d7d]
Jun 13 12:57:29 uhti02 smbd[16322]: #9 /usr/sbin/smbd [0x80f2801]
Jun 13 12:57:29 uhti02 smbd[16322]: #10 /usr/sbin/smbd(smbd_process+0x10e) [0x80f4122]
Jun 13 12:57:29 uhti02 smbd[16322]: #11 /usr/sbin/smbd(main+0x946) [0x82beea9]
Jun 13 12:57:29 uhti02 smbd[16322]: #12 /lib/tls/libc.so.6(__libc_start_main+0xd0) [0x40240210]
Jun 13 12:57:29 uhti02 smbd[16322]: #13 /usr/sbin/smbd [0x808ceb1]
Jun 13 12:57:29 uhti02 smbd[16322]: [2007/06/13 12:57:29, 0] lib/util.c:smb_panic(1607)
Jun 13 12:57:29 uhti02 smbd[16322]: smb_panic(): calling panic action [/bin/sleep 90000]

Versions:

Kernel: 2.6.5-7.97-bigsmp
smbd, nmbd, winbindd: Version 3.0.24-SerNet-SuSE

As I said
earlier, this problem occurs intermittently, every 2-3 days, in 2 separate Samba installations, and when it occurs Samba requires a restart to clear.

Much appreciated.

Joe

----- Original Message Follows -----
> Hi Samba list,
>
> We're experiencing some issues with our Samba 3.0.24
> environments. Hopefully somebody can offer suggestions or
> guidance.
>
> A bit of background. We have 3 application environments,
> each of which consists of a Samba host providing file sharing
> services to 7 Windows application servers.
>
> These Samba hosts intermittently experience problems
> providing file sharing. So far we haven't established a
> pattern with the failures, so for now the best we can
> establish is that every couple of days a Samba host will
> experience an Internal Error (signal 11) in an smbd
> process. From that point onwards the smbd process will
> operate unreliably, such that Windows clients will
> generally not be able to connect to the share, file copies
> that were underway will abort with errors, etc. All this
> will require a restart of the Linux host to clear, and
> once restarted things are fine.
>
> All three environments are the same for hardware/OS and
> software. They operate independently of each other. All
> experience the same issue. Other than this issue we do not
> experience any other Samba problems; the file shares run
> without problems until a signal 11 occurs.
>
> - SuSE Enterprise Linux 9 (2.6.5-7.97-bigsmp)
> - Samba 3.0.24
> - /data (total 1TB, .5TB in use) - /dev/sdc1 type ext3
>   (rw,acl,user_xattr)
>
> The signal 11 crashes appear to have started following our
> upgrade to Samba 3.0.24 in March 2007.
>
> Example message attached in signal_11.txt. I've attached
> these instead of placing them inline as my webmail has fixed
> width formatting which messes up the syslog lines - hope
> this is okay.
>
> Things we've tested:
>
> - fsck
> - testparm
> - Samba config changes:
>     kernel oplocks = no
>     oplocks = False
>     level2 oplocks = False
>
> I thought I'd preemptively post this to the mailing list to
> see if anyone has experienced similar issues. I will post
> some 'gdb smb PID' output once I'm able to catch it.
>
> Our suspicion is that this occurs under load, though we've
> not yet been able to reproduce the problem under testing.
> Upgrading to 3.0.25 is an option, although we'd like to do
> this once we've more clearly identified the cause and fix.
>
> Finally, an example of the volume of errors we're
> experiencing (from a single host) is attached in
> volume.txt.
>
> Happy to post other info.
>
> Kind regards
> Joe Murphy
> Info Systems Technical Team
> joe.murphy@clear.net.nz
>
>
> [Attachment: signal_11.txt]
> [Attachment: volume.txt]
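
One detail in the gdb backtrace may be worth checking: the format_str value passed to dbgtext in frame #14 (0x6d2e5c73) and the argc reported for main in frame #19 (1831754867) are the same 32-bit number, and its four bytes are all printable ASCII. That might mean the pointer was overwritten with string data, or simply that gdb is not reporting argument values reliably here; the trace alone doesn't settle which. A minimal Python sketch of the check follows (the helper name pointer_as_ascii is hypothetical, and little-endian byte order is assumed because the host is 32-bit x86):

```python
import struct


def pointer_as_ascii(value: int) -> str:
    """Render a 32-bit value as its four little-endian bytes, shown as text.

    Non-printable bytes are replaced with '.' so the result is always
    four characters long.
    """
    raw = struct.pack("<I", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Values copied from the gdb backtrace.
format_str = 0x6D2E5C73  # dbgtext(format_str=...) in frame #14
argc = 1831754867        # main(argc=...) in frame #19

# Compare the two values and show the bytes as they would sit in memory.
print(hex(argc))
print(argc == format_str)
print(pointer_as_ascii(format_str))
```

If the value really is string data, the bytes in memory order read as the tail of a backslash-separated name, which could be compared against the file names in use when the oplock timeout fired.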