Hello,

I am running Samba 2.2.2 with acl-0.7.16 on RedHat 6.2 (2.2.19). The PDC is a Windows 2000 Server and the Samba server is a domain member using Winbind. All the workstations are Windoze 2000 Pro with SP2.

Everything seemingly works fine, but every day or two I get a runaway SMBD process which hogs the CPU and becomes unkillable. The only resolution is to reboot the server completely. This has occurred at least once when a workstation crashed, but we have not proven that this is always the case. I was rather hoping that it was a Windoze problem to do with not having SP2 installed, but this has now been disproved.

This is a serious problem. I have seen postings here before about disabling OPLOCKS but am reluctant to do this because of the drop in performance, which could put cracks in our arguments for using Samba in the first place. Also I thought 2.2.2 had fixes for OPLOCK bugs!

Does anyone have any suggestions other than disabling OPLOCKS? Even a way of killing the runaway process would be useful at this time ("kill -9" has no effect at all on the rogue SMBD or its children).

We could regress to a 'more stable' version but we would lose the functionality of WINBIND, which is important to this installation.

Any comments or advice much appreciated,

Noel Kelly

Output of ps ax | grep smb after all killable daemons have gone.
================================================================
 2456 ?        D      0:01 /usr/local/samba/bin/smbd -D
 2493 ?        R   1018:33 /usr/local/samba/bin/smbd -D
 2499 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2501 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2523 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2530 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2531 ?        D      0:01 /usr/local/samba/bin/smbd -D
 2541 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2562 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2563 ?        D      0:01 /usr/local/samba/bin/smbd -D
 2992 pts/0    S      0:00 grep smb

SMB.conf
=======
[global]
   workgroup = UK
   netbios name = BELLY
   server string = 2.2.2 Samba Server
   load printers = yes
   print command = /usr/bin/lpr -P%p -r %s
   invalid users = root bin uucp sys
   encrypt passwords = Yes
   update encrypted = Yes
   os level = 0
   preferred master = False
   local master = No
   domain master = False
   security = domain
   password server = BRAIN
   smb passwd file = /usr/local/samba/private/smbpasswd
   debug level = 1
   wins server = 192.168.5.4
   name resolve order = wins host bcast
   winbind uid = 10000-20000
   winbind gid = 10000-20000
   winbind enum users = yes
   winbind enum groups = yes
   winbind separator = +
   #template homedir = /raid/homedrives/%U
   nt acl support = yes
   # These oplock settings increase file access dramatically but
   # we might have to negate them if we experience run away smbd
   # processes
   oplocks = yes
   level2 oplocks = yes

[printers]
   printable = yes
   public = yes
   printer = lp
   printing = BSD
   read only = yes
   guest ok = yes

[homedrives]
   browseable =yes
   path=/raid/homes/
   writeable = yes
   create mask = 700

[profiles]
   browseable = yes
   path=/raid/profiles/
   writeable = yes
   create mask = 700
   inherit permissions = yes

[Shared]
   path = /raid/shared
   public = no
   read only = No
   inherit permissions = yes
   create mask = 777
   directory security mask = 777
   force create mode = 0
   force directory security mode = 0
   nt acl support = yes

[Apps]
   path = /raid/apps
   public = no
   read only = No
   inherit permissions = yes
   create mask = 777
   directory security mask = 777
   force create mode = 0
   force directory security mode = 0
   nt acl support = yes
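Since the question is whether oplocks can be narrowed down rather than switched off everywhere, it may help to see which oplock values each share ends up with. The following is only a rough sketch: the function names and the abbreviated sample text are made up for illustration, and it assumes that a share without its own setting falls back to whatever [global] says.

```python
def parse_smb_conf(text):
    """Return {section: {parameter: value}} for smb.conf-style text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
            continue
        if current is None or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise case and spacing so "Level2  Oplocks" and
        # "level2 oplocks" land on the same key.
        sections[current][" ".join(key.lower().split())] = value.strip()
    return sections


def oplock_settings(sections, names=("oplocks", "level2 oplocks")):
    """Per share, report each parameter, falling back to [global]."""
    defaults = sections.get("global", {})
    report = {}
    for share, params in sections.items():
        if share == "global":
            continue
        report[share] = {
            name: params.get(name, defaults.get(name, "(not set)"))
            for name in names
        }
    return report


# Abbreviated, hypothetical sample modelled on the posted configuration.
SAMPLE_CONF = """
[global]
   workgroup = UK
   print command = /usr/bin/lpr -P%p -r %s
   # oplock settings
   oplocks = yes
   level2 oplocks = yes

[profiles]
   path=/raid/profiles/
   writeable = yes

[Shared]
   path = /raid/shared
   oplocks = no
"""

if __name__ == "__main__":
    for share, values in oplock_settings(parse_smb_conf(SAMPLE_CONF)).items():
        print(share, values)
```

The [Shared] override in the sample is not from the posted file; it is there only to show how a per-share value would appear in the report.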
Nasir Yilmaz (ATM/Network Grp. Bsk. Sistem Mühendisi)
2001-Nov-17 02:09 UTC
2.2.2 runaway SMBD process
Anybody use samba on Tru64 Unix 5.1 ?
On Sat, Nov 17, 2001 at 09:58:18AM -0000, Noel Kelly wrote:
> Hello,
>
> I am running Samba 2.2.2 with acl-0.7.16 on RedHat 6.2 (2.2.19). The PDC is
> a Windows 2000 Server and the Samba server is a domain member using Winbind.
> All the workstations are Windoze 2000 Pro with SP2.
>
> Everything seemingly works fine but every day or two I get a runaway SMBD
> process which hogs the CPU and becomes unkillable. The only resolution is
> to reboot the server completely. This has occurred at least once when a
> workstation crashed but we have not proven that this is always the case. I
> was rather hoping that it was a Windoze problem to do with not having SP2
> installed but this has now been disproved.
>
> This is a serious problem. I have seen postings here before about disabling
> OPLOCKS but am reluctant to do this because of the drop in performance which
> could put cracks in our arguments for using Samba in the first place. Also
> I thought 2.2.2 had fixes for OPLOCK bugs!
>
> Does anyone have any suggestions other than disabling OPLOCKS ? Even a way
> of killing the runaway process would be useful at this time ("kill -9" has
> no effect at all on the rogue SMBD or its children).
>
> We could regress to a 'more stable' version but we would lose the
> functionality of WINBIND which is important to this installation.

Can you tell me if you're getting any errors in your log files ?

When you say "unkillable" does this mean it doesn't respond to a kill -9 ? If so this is a kernel problem not a Samba problem.

Thanks,

Jeremy.
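To make the kernel point concrete: on Linux, a STAT of D in ps output means uninterruptible sleep, and a process in that state does not act on signals (SIGKILL included) until it leaves it. The sketch below sorts `ps ax` style lines by state and CPU time. It assumes the five-column layout shown in the original post (PID, TTY, STAT, TIME, COMMAND) with TIME as minutes:seconds; the function names and the shortened sample are illustrative only.

```python
def cpu_seconds(time_field):
    """Convert a ps TIME field such as '1018:33' into seconds."""
    total = 0
    for part in time_field.split(":"):
        total = total * 60 + int(part)
    return total


def parse_ps_ax(text):
    """Parse lines of the form: PID TTY STAT TIME COMMAND."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[0].isdigit():
            continue  # header line, blank line, or something unexpected
        pid, tty, stat, time_field, command = parts
        procs.append({
            "pid": int(pid),
            "tty": tty,
            "stat": stat,
            "cpu_seconds": cpu_seconds(time_field),
            "command": command,
        })
    return procs


def uninterruptible(procs):
    """Processes whose state starts with D (uninterruptible sleep)."""
    return [p for p in procs if p["stat"].startswith("D")]


def busiest(procs):
    """The process that has accumulated the most CPU time."""
    return max(procs, key=lambda p: p["cpu_seconds"])


# Shortened sample taken from the ps output in the original post.
SAMPLE_PS = """\
 2456 ?        D      0:01 /usr/local/samba/bin/smbd -D
 2493 ?        R   1018:33 /usr/local/samba/bin/smbd -D
 2499 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2992 pts/0    S      0:00 grep smb
"""

if __name__ == "__main__":
    procs = parse_ps_ax(SAMPLE_PS)
    print("stuck in D state:", [p["pid"] for p in uninterruptible(procs)])
    top = busiest(procs)
    print("most CPU time: pid", top["pid"], "with", top["cpu_seconds"], "seconds")
```

In the posted output the pattern is one smbd in R state with a very large TIME and the rest in D state, which is at least consistent with the other daemons waiting on something the runaway process holds.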
Jeremy,

Below are the relevant parts of the log for the offending process (2493). Definitely looks like an oplock issue. Don't like the look of these lines:

[2001/11/16 17:43:08, 0] lib/util.c:smb_panic(1055)
  PANIC: open_mode_check: Existant process 2493 left active oplock.
[2001/11/16 17:43:09, 0] locking/locking.c:delete_fn(253)
  locking : delete_fn. LOGIC ERROR ! Entry for pid 2496 and it no longer exists !

And yes the process becomes entirely unkillable, not responding to even -9. I have read about a couple of other people having to restart the server as well with such runaway smbds. A totally wayward process and having to shut the machine down does not impress anyone!

Much appreciate your input on this,

Noel

==================================================================
[2001/10/17 16:33:12, 1] smbd/reply.c:reply_sesssetup_and_X(1057)
  Username guest is invalid on this system
...skipping...
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:39:56, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service homedrives as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:39:59, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service Shared as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:39:59, 0] smbd/service.c:make_connection(239)
  brom414 (192.168.5.106) couldn't find service zrajnic
[2001/11/16 17:39:59, 0] smbd/service.c:make_connection(239)
  brom414 (192.168.5.106) couldn't find service public
[2001/11/16 17:40:00, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service homedrives as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:40:00, 0] lib/util_sock.c:read_socket_with_timeout(300)
  read_socket_with_timeout: timeout read. read error = Connection reset by peer.
[2001/11/16 17:40:00, 0] smbd/oplock.c:oplock_break(782)
  oplock_break: receive_smb error (Connection reset by peer)
  oplock_break failed for file zrajnic/Application Data/Microsoft/Internet Explorer/Quick Launch/Launch Internet Explorer Browser.lnk (dev = 811, inode = 1409077).
[2001/11/16 17:40:00, 0] smbd/oplock.c:oplock_break(870)
  oplock_break: client failure in break - shutting down this smbd.
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service Shared
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service homedrives
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service homedrives
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service homedrives
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service homedrives
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service profiles
[2001/11/16 17:40:32, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2492 on port 1120 for dev = 811, inode = 1409077
  for dev = 811, inode = 1409077, tv_sec = 3bf54f1a, tv_usec = eff6a
[2001/11/16 17:40:35, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service Shared as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:40:37, 0] smbd/nttrans.c:call_nt_transact_ioctl(1762)
  call_nt_transact_ioctl: Currently not implemented.
[2001/11/16 17:40:57, 1] smbd/service.c:make_connection(610)
  hamo57 (192.168.5.104) connect to service Shared as user uk+zrajnic (uid=10040, gid=10000) (pid 2456)
[2001/11/16 17:41:10, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service profiles
[2001/11/16 17:41:30, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service homedrives as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:41:30, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service homedrives as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:41:32, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:41:58, 1] smbd/service.c:close_cnum(650)
  hamo57 (192.168.5.104) closed connection to service Shared
[2001/11/16 17:42:36, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2496)
[2001/11/16 17:43:08, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2493 on port 1121 for dev = 811, inode = 344118
  for dev = 811, inode = 344118, tv_sec = 3bf54fcd, tv_usec = f2e5a
[2001/11/16 17:43:08, 0] smbd/open.c:open_mode_check(555)
  open_mode_check: exlusive oplock left by process 2493 after break ! For file zrajnic/Application Data/Microsoft/Internet Explorer/prf1A.tmp, dev 811, inode = 344118. Deleting it to continue...
[2001/11/16 17:43:08, 0] lib/util.c:smb_panic(1055)
  PANIC: open_mode_check: Existant process 2493 left active oplock.
[2001/11/16 17:43:09, 0] locking/locking.c:delete_fn(253)
  locking : delete_fn. LOGIC ERROR ! Entry for pid 2496 and it no longer exists !
[2001/11/16 17:43:09, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2497)
[2001/11/16 17:43:41, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2493 on port 1121 for dev = 811, inode = 344118
  for dev = 811, inode = 344118, tv_sec = 3bf54fcd, tv_usec = f2e5a
[2001/11/16 17:44:13, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2493 on port 1121 for dev = 811, inode = 344118
  for dev = 811, inode = 344118, tv_sec = 3bf54fcd, tv_usec = f2e5a
==================================================================

-----Original Message-----
From: jra@samba.org [mailto:jra@samba.org]
Sent: 17 November 2001 22:47
To: Noel Kelly
Cc: 'samba@lists.samba.org'
Subject: Re: 2.2.2 runaway SMBD process

On Sat, Nov 17, 2001 at 09:58:18AM -0000, Noel Kelly wrote:
> Hello,
>
> I am running Samba 2.2.2 with acl-0.7.16 on RedHat 6.2 (2.2.19). The PDC is
> a Windows 2000 Server and the Samba server is a domain member using Winbind.
> All the workstations are Windoze 2000 Pro with SP2.
>
> Everything seemingly works fine but every day or two I get a runaway SMBD
> process which hogs the CPU and becomes unkillable. The only resolution is
> to reboot the server completely. This has occurred at least once when a
> workstation crashed but we have not proven that this is always the case. I
> was rather hoping that it was a Windoze problem to do with not having SP2
> installed but this has now been disproved.
>
> This is a serious problem. I have seen postings here before about disabling
> OPLOCKS but am reluctant to do this because of the drop in performance which
> could put cracks in our arguments for using Samba in the first place. Also
> I thought 2.2.2 had fixes for OPLOCK bugs!
>
> Does anyone have any suggestions other than disabling OPLOCKS ? Even a way
> of killing the runaway process would be useful at this time ("kill -9" has
> no effect at all on the rogue SMBD or its children).
>
> We could regress to a 'more stable' version but we would lose the
> functionality of WINBIND which is important to this installation.

Can you tell me if you're getting any errors in your log files ?

When you say "unkillable" does this mean it doesn't respond to a kill -9 ? If so this is a kernel problem not a Samba problem.

Thanks,

Jeremy.
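For anyone wading through a log like the one earlier in this message, here is a rough sketch of how the level 0 oplock entries could be pulled out and tallied by the pid they mention. It assumes the usual two-part entry layout seen in the excerpt (a bracketed timestamp and level followed by the source location, then the message on the following lines). The function names are invented for illustration and are not part of Samba.

```python
import re
from collections import Counter

# "[2001/11/16 17:43:08, 0] lib/util.c:smb_panic(1055)"
HEADER = re.compile(
    r"^\[(?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}), (?P<level>\d+)\] "
    r"(?P<source>\S+)$"
)
# Matches "pid 2493" as well as "process 2493".
PID = re.compile(r"\b(?:pid|process) (\d+)")


def parse_log(text):
    """Group header lines with the message lines that follow them."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            entries.append({
                "stamp": match["stamp"],
                "level": int(match["level"]),
                "source": match["source"],
                "message": "",
            })
        elif entries and stripped:
            joined = entries[-1]["message"] + " " + stripped
            entries[-1]["message"] = joined.strip()
    return entries


def oplock_errors(entries):
    """Level 0 entries that mention oplocks in the source or the message."""
    return [
        e for e in entries
        if e["level"] == 0
        and "oplock" in (e["source"] + " " + e["message"]).lower()
    ]


def pids_mentioned(entries):
    """Count how often each pid is named in the given entries."""
    counts = Counter()
    for entry in entries:
        counts.update(int(pid) for pid in PID.findall(entry["message"]))
    return counts


# Three entries copied (and shortened) from the log excerpt in this thread.
SAMPLE_LOG = """\
[2001/11/16 17:43:08, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2493 on port 1121 for dev = 811, inode = 344118
[2001/11/16 17:43:08, 0] lib/util.c:smb_panic(1055)
  PANIC: open_mode_check: Existant process 2493 left active oplock.
[2001/11/16 17:43:09, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2497)
"""

if __name__ == "__main__":
    errors = oplock_errors(parse_log(SAMPLE_LOG))
    for pid, count in pids_mentioned(errors).most_common():
        print("pid", pid, "named in", count, "oplock error entries")
```

A tally like this only shows which process the other smbds were waiting on; it says nothing about why that process stopped answering break requests.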