Panu Outinen
1998-Jan-20 17:28 UTC
Major problems with Samba 1.9.18p1 (HPUX 9.05) with oplocks
Hi there!

I'm having major problems with the new oplocks feature of Samba 1.9.18p1. After running for about a day or more, the Samba processes seem to stop responding, or programs report that they can't open files that do exist and have been opened successfully before. Once these symptoms appear I have to restart Samba manually.

Some database files were corrupted. I don't know for sure whether this was caused by the oplock feature or by me killing the smbd processes at the wrong time, but I suspect oplocks by their nature. There were about 10-15 hosts connected to this Samba server.

On this problem host we use Samba mainly for our own database system, since we are running a heterogeneous network with many different operating systems (AIX, Solaris, SunOS, but nowadays mainly HPUX & Windows NT) that need to co-operate. The speedup from the oplock feature is great, since this host holds the database files for our own database system. Without going into details: in our database system three files together form a database, and these files are constantly opened and closed by programs running on Windows NT machines or on local or remote HPUX machines.

-------------------------------------------------------

I'm a software developer (mainly C code) and I compiled this version of Samba myself (like all the previous ones for the last 3-4 years!) with HP's ANSI C compiler. The Makefile had the following edited lines:

FLAGSM = -DHPUX -Aa -D_HPUX_SOURCE -D_POSIX_SOURCE -DFAST_SHARE_MODES
LIBSM

NOTE: -DFAST_SHARE_MODES was added by me, since I've used this mode successfully before (the previous Samba version used successfully on this host was 1.9.17p1).

The UNIX version I'm using is 9.05:

$ uname -a
HP-UX kimmov A.09.05 A 9000/712 2008139064 two-user license

All the hosts using Samba were Windows NT machines (3.51 or 4.0 with SPs).

-------------------------------------------------------

I'm sorry to say that I've only used debug level 1 for the smbd processes. I looked at the oplock code (server.c) and I probably should have used level 3 or even 5.
But anyway, when the problems started the log.smbd file kept getting the following lines:

---------- cut from the beginning near the first error ---------------
...
01/20/1998 14:18:08 jukkav (192.4.0.44) connect to service all as user vertex (uid=201,gid=20) (pid 19178)
01/20/1998 14:18:30 iso_nt (192.4.0.194) connect to service all as user vertex (uid=201,gid=20) (pid 19181)
01/20/1998 14:18:30 oplock_break: receive_smb error (Connection reset by peer)
01/20/1998 14:18:30 oplock_break failed for file PROJEKTIT/setup/ICONS (fnum = 59, dev = 7201600, inode = 2b597).
01/20/1998 14:18:30 oplock_break: client failure in break - shutting down this smbd.
01/20/1998 14:18:30 iso_nt (192.4.0.194) closed connection to service all
01/20/1998 14:18:54 panu_nt (192.4.0.178) connect to service all as user vertex (uid=201,gid=20) (pid 19186)
01/20/1998 14:19:02 request_oplock_break: no response received to oplock break request to pid 19053 on port 1262 for dev = 7201600, inode = 2b597
01/20/1998 14:24:13 nicklaus (192.4.0.177) closed connection to service all
01/20/1998 14:24:33 juhana (192.4.0.179) closed connection to service all
01/20/1998 14:30:09 miikkal (192.4.0.18) closed connection to service all
01/20/1998 14:31:23 sirpa (192.4.0.60) closed connection to service all
01/20/1998 14:33:57 panu_nt (192.4.0.178) closed connection to service all
01/20/1998 14:34:23 panu_nt (192.4.0.178) connect to service all as user vertex (uid=201,gid=20) (pid 19537)
01/20/1998 14:36:31 markkuj (192.4.0.55) closed connection to service all
01/20/1998 14:36:44 miikkal (192.4.0.18) connect to service all as user vertex (uid=201,gid=20) (pid 19627)
01/20/1998 14:36:58 iso_nt (192.4.0.194) closed connection to service all
01/20/1998 14:40:07 reijo_nt (192.4.0.193) connect to service PROJEKTIT as user vertex (uid=201,gid=20) (pid 10743)
01/20/1998 14:44:37 nicklaus (192.4.0.177) connect to service all as user vertex (uid=201,gid=20) (pid 19758)
01/20/1998 14:46:45 juhana (192.4.0.179) connect to service all as user vertex (uid=201,gid=20) (pid 19852)
01/20/1998 14:50:59 reijo_nt (192.4.0.193) closed connection to service PROJEKTIT
01/20/1998 14:52:21 oplock_break: receive_smb error (Connection reset by peer)
01/20/1998 14:52:22 oplock_break failed for file PROJEKTIT/dbases/d_CALENDARc (fnum = 2, dev = 7201600, inode = b9ae).
01/20/1998 14:52:22 oplock_break: client failure in break - shutting down this smbd.
01/20/1998 14:52:22 iso_nt (192.4.0.194) closed connection to service all
01/20/1998 14:52:33 sirpa (192.4.0.60) connect to service all as user vertex (uid=201,gid=20) (pid 20009)
01/20/1998 14:53:15 jarmo (192.4.0.15) connect to service all as user vertex (uid=201,gid=20) (pid 20014)
01/20/1998 14:53:18 request_oplock_break: no response received to oplock break request to pid 18665 on port 1087 for dev = 7201600, inode = b9ae
01/20/1998 14:53:18 oplock_break: end of file from client
01/20/1998 14:53:18 oplock_break failed for file PROJEKTIT/dbases/d_CALENDARl (fnum = 57, dev = 7201600, inode = b9c2).
01/20/1998 14:53:18 oplock_break: client failure in break - shutting down this smbd.
01/20/1998 14:53:18 jarmo (192.4.0.15) closed connection to service all
01/20/1998 14:53:21 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:53:37 nicklaus (192.4.0.177) connect to service all as user vertex (uid=201,gid=20) (pid 20019)
01/20/1998 14:53:53 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:54:01 sirpa (192.4.0.60) connect to service all as user vertex (uid=201,gid=20) (pid 20025)
01/20/1998 14:54:25 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:54:33 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:54:57 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:55:05 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:55:29 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:55:37 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:55:51 sirpa (192.4.0.60) connect to service all as user vertex (uid=201,gid=20) (pid 20124)
01/20/1998 14:56:01 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:56:09 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:56:23 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b590
01/20/1998 14:56:33 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:56:36 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 5bef8
01/20/1998 14:56:41 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:56:55 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b590
01/20/1998 14:57:03 miikkal (192.4.0.18) connect to service all as user vertex (uid=201,gid=20) (pid 20139)
01/20/1998 14:57:05 panu_nt (192.4.0.178) closed connection to service all
01/20/1998 14:57:05 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:57:13 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:57:27 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b590
01/20/1998 14:57:35 request_oplock_break: no response received to oplock break request to pid 19627 on port 1456 for dev = 7201200, inode = 31002
01/20/1998 14:57:37 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c
01/20/1998 14:57:45 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:57:59 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b590
01/20/1998 14:58:07 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 5bef8
01/20/1998 14:58:07 request_oplock_break: no response received to oplock break request to pid 19627 on port 1456 for dev = 7201200, inode = 31002
01/20/1998 14:58:17 miikkal (192.4.0.18) connect to service all as user vertex (uid=201,gid=20) (pid 20148)
01/20/1998 14:58:17 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:58:31 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b590
01/20/1998 14:58:38 request_oplock_break: no response received to oplock break request to pid 20009 on port 1592 for dev = 7201600, inode = 59094
01/20/1998 14:58:39 request_oplock_break: no response received to oplock break request to pid 19627 on port 1456 for dev = 7201200, inode = 31002
01/20/1998 14:58:49 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:58:57 jarmo (192.4.0.15) connect to service all as user vertex (uid=201,gid=20) (pid 20154)
01/20/1998 14:59:03 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b590
01/20/1998 14:59:11 request_oplock_break: no response received to oplock break request to pid 20009 on port 1592 for dev = 7201600, inode = 59094
01/20/1998 14:59:11 request_oplock_break: no response received to oplock break request to pid 19627 on port 1456 for dev = 7201200, inode = 31002
01/20/1998 14:59:11 sirpa (192.4.0.60) connect to service all as user vertex (uid=201,gid=20) (pid 20155)
01/20/1998 14:59:21 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 14:59:29 request_oplock_break: no response received to oplock break request to pid 20009 on port 1592 for dev = 7201600, inode = 59094
01/20/1998 14:59:35 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b590
01/20/1998 14:59:43 request_oplock_break: no response received to oplock break request to pid 19627 on port 1456 for dev = 7201200, inode = 31002
01/20/1998 14:59:48 messu10 (192.4.0.243) closed connection to service all
01/20/1998 14:59:53 request_oplock_break: no response received to oplock break request to pid 20014 on port 1598 for dev = 7201600, inode = 26c9b
01/20/1998 14:59:53 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
01/20/1998 15:00:02 request_oplock_break: no response received to oplock break request to pid 20009 on port 1592 for dev = 7201600, inode = 59094
01/20/1998 15:00:07 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b590
01/20/1998 15:00:15 request_oplock_break: no response received to oplock break request to pid 19627 on port 1456 for dev = 7201200, inode = 31002
01/20/1998 15:00:19 jarmo (192.4.0.15) connect to service all as user vertex (uid=201,gid=20) (pid 20260)
01/20/1998 15:00:25 request_oplock_break: no response received to oplock break request to pid 20014 on port 1598 for dev = 7201600, inode = 26c9b
01/20/1998 15:00:26 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595
...
---------- cut since similar data keeps coming ---------------

I ran some smbstatus listings before restarting Samba, and these showed that multiple smbd processes were created for each new host trying to connect or open files from this Samba file server. This can also be seen from the log listing (see e.g. host sirpa!).

I had similar symptoms on another host (also HPUX 9.03) where I ran this same version 1.9.18p1. There the process table filled up with <defunct> (zombie) processes (with Samba as the parent!) and the host had to be rebooted. These symptoms disappeared when I added the 'dead time = 15' parameter!!
I added this parameter on this major problem host as well, but it didn't help.

---------------------------------------------

smb.conf: (I removed the comment lines!)

[global]
   oplocks = true
   dead time = 15
   workgroup = WORKGROUP
   comment = Samba Server
   guest account = vertex
   log level = 1
   max log size = 200
   case sensitive = no
   short preserve case = yes
   preserve case = yes
   lock directory = /usr/local/samba/var/locks
   locking = yes
   share modes = yes
   socket options = TCP_NODELAY

[all]
   comment = Root Directory
   path = /
   writable = true
   create mode = 0777
   directory mode = 0777
   guest ok = yes

[800]
   comment = 800 Directory
   path = /800
   writable = true
   create mode = 0777
   directory mode = 0777
   guest ok = yes

[PROJEKTIT]
   comment = PROJEKTIT Directory
   path = /PROJEKTIT
   writable = true
   create mode = 0777
   directory mode = 0777
   guest ok = yes

---------------------------------------------

I've currently disabled the oplock feature under pressure from the database users, so I'm not currently running the smbd processes with oplocks at a higher debug level. So is anyone else out there having similar problems?

------
Panu Outinen              Tel. +358 3 318 2500
Vertex Systems Oy         Fax  +358 3 318 2450
Vaajakatu 9               http://www.vertex.fi
33720 Tampere, FINLAND    email: Panu.Outinen@vertex.fi
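
For anyone trying to see which smbd processes stop answering break requests in a log like the one in this post, here is a minimal Python sketch. It is not part of Samba; the function name `tally_unanswered_breaks` is made up for illustration, and the pattern assumes the `request_oplock_break: no response received ...` wording exactly as it appears in the listing above. The sample lines are copied from that listing.

```python
import re
from collections import Counter

# Matches the "no response" entries as they are worded in the log above.
NO_RESPONSE = re.compile(
    r"request_oplock_break: no response received to oplock break request "
    r"to pid (\d+) on port (\d+) for dev = ([0-9a-f]+), inode = ([0-9a-f]+)"
)


def tally_unanswered_breaks(lines):
    """Count unanswered oplock break requests per target smbd pid."""
    per_pid = Counter()
    for line in lines:
        match = NO_RESPONSE.search(line)
        if match:
            per_pid[int(match.group(1))] += 1
    return per_pid


# A few entries copied from the log listing above.
sample = [
    "01/20/1998 14:53:21 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b58c",
    "01/20/1998 14:53:37 nicklaus (192.4.0.177) connect to service all as user vertex (uid=201,gid=20) (pid 20019)",
    "01/20/1998 14:54:33 request_oplock_break: no response received to oplock break request to pid 19758 on port 1523 for dev = 7201600, inode = 2b595",
    "01/20/1998 14:57:35 request_oplock_break: no response received to oplock break request to pid 19627 on port 1456 for dev = 7201200, inode = 31002",
]

counts = tally_unanswered_breaks(sample)
for pid, missed in counts.most_common():
    print(pid, missed)
```

Run against the whole log.smbd, the pids with the highest counts are the ones worth inspecting first with smbstatus before restarting anything.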
Jeremy Allison
1998-Jan-20 19:38 UTC
Major problems with Samba 1.9.18p1 (HPUX 9.05) with oplocks
Panu Outinen wrote:
>
> Hi there!
>
> I'm having major problems with the new oplocks feature of Samba 1.9.18p1.
> After running about a day or more Samba processes seem to stop responding
> or programs say that they can't open files that do exist and have been
> opened successfully before. And after these symptoms come along I've had to
> restart Samba manually.
>
> ... lines deleted....
>
> I've currently disabled the oplock feature by the pressure of the database
> users. So I'm not currently running the smbd processes with oplock with
> higher debug level. So is anyone else out there having similar problems?
>

We've recently discovered that there is a deadlock condition in the p1 code with respect to oplocks that can cause this behavior under load.

The problem occurs when smbd(1) tries to break an oplock that smbd(2) holds at the same time as smbd(2) tries to break an oplock that smbd(1) holds. This can also occur due to a 'ring' of oplock break requests, but the basic problem is the same.

It's fairly rare and timing dependent, which is why it didn't show up in all the large-scale testing we did before the 1.9.18 release. 1.9.18p1 fixed one artifact of the bug, but the generic fix is to kill the deadlock.

I'm currently coding up a fix for this (and have some test sites who are willing to test it, although more would be welcome), and once we have determined that we have a definite fix we'll release 1.9.18p2.

In the meantime, if you run into the problem, just setting

oplocks = no

in the [global] section of smb.conf will fix the problem (although Samba will be slower).

Sorry for the problem, we're working on getting it fixed asap.

Jeremy Allison,
Samba Team.

--
--------------------------------------------------------
Buying an operating system without source is like buying
a self-assembly Space Shuttle with no instructions.
--------------------------------------------------------
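
The shape of the deadlock described above (two smbd processes each waiting on the other, or a longer 'ring') can be illustrated with a small wait-for graph. The Python sketch below only illustrates the condition; it is not how Samba detects or fixes the problem, and the function name `find_wait_cycle` and the process labels are hypothetical. It assumes each blocked process waits on exactly one other process.

```python
def find_wait_cycle(waits_for):
    """Return the processes forming a wait cycle, or None if there is none.

    waits_for maps each blocked process to the single process it is
    waiting on (the holder of the oplock it is trying to break).
    """
    for start in waits_for:
        path = []
        node = start
        # Follow the chain of waits until it ends or revisits a process.
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            # The cycle is the part of the path from the revisited process on.
            return path[path.index(node):]
    return None


# Two processes breaking each other's oplocks at the same time.
pair = {"smbd(1)": "smbd(2)", "smbd(2)": "smbd(1)"}

# A 'ring' of three break requests.
ring = {"smbd(1)": "smbd(2)", "smbd(2)": "smbd(3)", "smbd(3)": "smbd(1)"}

# A plain chain: smbd(3) is not blocked, so the others can make progress.
chain = {"smbd(1)": "smbd(2)", "smbd(2)": "smbd(3)"}

print(find_wait_cycle(pair))
print(find_wait_cycle(ring))
print(find_wait_cycle(chain))
```

In the pair and ring cases every process in the returned list is blocked on another member of the same list, so none of them can answer the break request it has been sent. In the chain case no cycle exists and the requests can eventually drain.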