Hi everyone

I have a four node samba/CTDB cluster exporting CIFS shares from a GPFS 3.5 cluster. Samba/CTDB is at 4.2.3:

[root@hostname ~]# rpm -qa | grep sernet
sernet-samba-ad-4.2.3-18.el6.x86_64
sernet-samba-libs-4.2.3-18.el6.x86_64
sernet-samba-common-4.2.3-18.el6.x86_64
sernet-samba-4.2.3-18.el6.x86_64
sernet-samba-ctdb-4.2.3-18.el6.x86_64
sernet-samba-client-4.2.3-18.el6.x86_64
sernet-samba-libwbclient-devel-4.2.3-18.el6.x86_64
sernet-build-key-1.1-4.noarch
sernet-samba-libsmbclient0-4.2.3-18.el6.x86_64
sernet-samba-winbind-4.2.3-18.el6.x86_64

..whilst GPFS is at version 3.5.0.22.

On every server in the CTDB cluster we are seeing the /var/log/samba/cores/smbd folder filling up with core.xxxxx files. As of writing this email, I can see 20-30 dump files being written every second (a few minutes later it's now calmed down).

We have samba logging on level 1 at the moment. An output from one of the core dumps is as follows:

[2016/03/10 10:05:58.334288,  0] ../source3/lib/dumpcore.c:318(dump_core)
  dumping core in /var/log/samba/cores/smbd
[2016/03/10 10:05:58.620827,  0] ../source3/smbd/oplock.c:192(update_num_read_oplocks)
  PANIC: assert failed at ../source3/smbd/oplock.c(192): d->num_share_modes == 1
[2016/03/10 10:05:58.620905,  0] ../source3/lib/util.c:788(smb_panic_s3)
  PANIC (pid 25524): assert failed: d->num_share_modes == 1
[2016/03/10 10:05:58.621578,  0] ../source3/lib/util.c:899(log_stack_trace)
  BACKTRACE: 27 stack frames:
   #0 /usr/lib64/samba/libsmbconf.so.0(log_stack_trace+0x1c) [0x7fb908724c41]
   #1 /usr/lib64/samba/libsmbconf.so.0(smb_panic_s3+0x55) [0x7fb908724d43]
   #2 /usr/lib64/samba/libsamba-util.so.0(smb_panic+0x35) [0x7fb90a98239e]
   #3 /usr/lib64/samba/libsmbd-base-samba4.so(update_num_read_oplocks+0x9a) [0x7fb90a5af09e]
   #4 /usr/lib64/samba/libsmbd-base-samba4.so(+0x1058bd) [0x7fb90a5598bd]
   #5 /usr/lib64/samba/libsmbd-base-samba4.so(+0x1072e8) [0x7fb90a55b2e8]
   #6 /usr/lib64/samba/libsmbd-base-samba4.so(create_file_default+0x28b) [0x7fb90a55bf67]
   #7 /usr/lib64/samba/libsmbd-base-samba4.so(+0x1d9d5b) [0x7fb90a62dd5b]
   #8 /usr/lib64/samba/libsmbd-base-samba4.so(smb_vfs_call_create_file+0xd4) [0x7fb90a561c07]
   #9 /usr/lib64/samba/libsmbd-base-samba4.so(smbd_smb2_request_process_create+0x2063) [0x7fb90a590d10]
   #10 /usr/lib64/samba/libsmbd-base-samba4.so(smbd_smb2_request_dispatch+0xb9b) [0x7fb90a5884cd]
   #11 /usr/lib64/samba/libsmbd-base-samba4.so(+0x135bd2) [0x7fb90a589bd2]
   #12 /usr/lib64/samba/libsmbconf.so.0(run_events_poll+0x2c2) [0x7fb90873a24f]
   #13 /usr/lib64/samba/libsmbconf.so.0(+0x37697) [0x7fb90873a697]
   #14 /usr/lib64/samba/libtevent.so.0(_tevent_loop_once+0x92) [0x7fb909c388e7]
   #15 /usr/lib64/samba/libtevent.so.0(tevent_common_loop_wait+0x17) [0x7fb909c38952]
   #16 /usr/lib64/samba/libtevent.so.0(_tevent_loop_wait+0xa) [0x7fb909c386eb]
   #17 /usr/lib64/samba/libsmbd-base-samba4.so(smbd_process+0x91a) [0x7fb90a575882]
   #18 /usr/sbin/smbd(+0x93d9) [0x7fb90afd93d9]
   #19 /usr/lib64/samba/libsmbconf.so.0(run_events_poll+0x2c2) [0x7fb90873a24f]
   #20 /usr/lib64/samba/libsmbconf.so.0(+0x37697) [0x7fb90873a697]
   #21 /usr/lib64/samba/libtevent.so.0(_tevent_loop_once+0x92) [0x7fb909c388e7]
   #22 /usr/lib64/samba/libtevent.so.0(tevent_common_loop_wait+0x17) [0x7fb909c38952]
   #23 /usr/lib64/samba/libtevent.so.0(_tevent_loop_wait+0xa) [0x7fb909c386eb]
   #24 /usr/sbin/smbd(main+0x1922) [0x7fb90afdb1d1]
   #25 /lib64/libc.so.6(__libc_start_main+0xfd) [0x7fb90722bd5d]
   #26 /usr/sbin/smbd(+0x5e09) [0x7fb90afd5e09]
[2016/03/10 10:05:58.622009,  0] ../source3/lib/dumpcore.c:318(dump_core)
  dumping core in /var/log/samba/cores/smbd

None of that output looks obvious to me as to where to start troubleshooting.

A bit of information about the user base: mix of Windows (biggest), Mac (smaller) and Linux (smallest) users all connecting via CIFS. Number of open files across the cluster can go up to 6000 or so. The number of shares exported is 6, with file/folder access in the filesystem controlled by ACLs.

I need to know how I can debug what's causing the dumps. Please bear in mind I am not a developer or Linux minded (my background is Windows) but I'm getting by. If you ask me to recompile Samba with symbols for example, I'd say no :)

Any advice will be gratefully received.

Many thanks

Richard
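
A first overview of the crashes is possible without rebuilding anything, by counting the PANIC entries that smbd already writes at log level 0. The following is only a minimal sketch: the function name summarise_panics is made up for illustration, the log file path is whatever you pass on the command line (the post above does not say where the smbd log lives), and the two patterns are taken from the log excerpt above rather than from any Samba specification.

```python
import re
import sys
from collections import Counter

# Header as in the excerpt above, e.g.
# "[2016/03/10 10:05:58.620905,  0] ../source3/lib/util.c:788(smb_panic_s3)"
HEADER = r"\[(?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}\.\d+,\s*\d+\]"
# Message as in the excerpt above, e.g.
# "PANIC (pid 25524): assert failed: d->num_share_modes == 1"
PANIC = r"PANIC \(pid (?P<pid>\d+)\): (?P<reason>[^\[\n]+)"
TOKEN = re.compile(HEADER + "|" + PANIC)


def summarise_panics(log_text):
    """Count PANIC entries per minute and per reason (hypothetical helper)."""
    per_minute = Counter()
    per_reason = Counter()
    minute = None  # timestamp of the most recent header seen
    for match in TOKEN.finditer(log_text):
        if match.group("minute"):
            minute = match.group("minute")
        else:
            per_minute[minute] += 1
            per_reason[match.group("reason").strip()] += 1
    return per_minute, per_reason


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the first argument is the path to an smbd log file.
    with open(sys.argv[1], errors="replace") as handle:
        minutes, reasons = summarise_panics(handle.read())
    for reason, count in reasons.most_common():
        print(count, reason)
    for stamp, count in sorted(minutes.items()):
        print(stamp, count)
```

If every crash reports the same assertion, that points at a single bug rather than general instability, and the per-minute counts show whether the bursts line up with particular times of day.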

On Thu, Mar 10, 2016 at 10:12:03AM +0000, Sobey, Richard A wrote:
> Hi everyone
>
> I have a four node samba/CTDB cluster exporting CIFS
> shares from a GPFS 3.5 cluster. Samba/CTDB is at 4.2.3:

This looks like

https://bugzilla.samba.org/show_bug.cgi?id=11844

Can you try

https://attachments.samba.org/attachment.cgi?id=12010 ?

Thanks!

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de

Thank you Volker. Unfortunately we have major problems with 4.2.11 now since we patched for badlock so this is having to take a backseat.

Tbh I wouldn't know how to apply that patch anyway :)

Cheers

Richard

-----Original Message-----
From: Volker Lendecke [mailto:Volker.Lendecke at SerNet.DE]
Sent: 24 April 2016 18:36
To: Sobey, Richard A <r.sobey at imperial.ac.uk>
Cc: samba at lists.samba.org
Subject: Re: [Samba] Debugging oplock.c core dumps