Hello,
I am running Samba 2.2.2 with acl-0.7.16 on Red Hat 6.2 (kernel 2.2.19).  The PDC is
a Windows 2000 Server and the Samba server is a domain member using Winbind.
All the workstations are Windoze 2000 Pro with SP2.
Everything seems to work fine, but every day or two I get a runaway SMBD
process which hogs the CPU and becomes unkillable.  The only resolution is
to reboot the server completely.  This has occurred at least once when a
workstation crashed, but we have not established that this is always the
trigger.  I was rather hoping that it was a Windoze problem caused by SP2 not
being installed, but this has now been disproved.
This is a serious problem.  I have seen postings here before about disabling
OPLOCKS, but I am reluctant to do this because of the drop in performance, which
could put cracks in our arguments for using Samba in the first place.  Also,
I thought 2.2.2 had fixes for OPLOCK bugs!
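For context: an exclusive oplock lets one client cache a file locally. When a second client opens the same file, the server must first ask the holder to "break" the oplock (flush its cached data and release it) before the second open can proceed. The log excerpts later in this thread show that break round-trip failing. A toy model of the sequence, assuming exclusive oplocks only (no level II downgrade) and using hypothetical names; it is not Samba's implementation:

```python
# Toy model of exclusive-oplock grant and break (illustration only; this is
# NOT Samba's implementation, and all names here are hypothetical).
from typing import Callable, Dict


class OplockTable:
    """Tracks which process holds an exclusive oplock on each file."""

    def __init__(self) -> None:
        self.holders: Dict[str, int] = {}  # path -> pid of exclusive holder

    def open(self, pid: int, path: str,
             holder_acks_break: Callable[[int], bool]) -> str:
        """Simulate an open of `path` by `pid`.

        `holder_acks_break(holder_pid)` stands in for the break round-trip:
        it returns True if the current holder flushes its cached data and
        releases the oplock before the server's timeout expires.
        """
        holder = self.holders.get(path)
        if holder is None:
            # Nobody else has the file open: grant an exclusive oplock.
            self.holders[path] = pid
            return "granted exclusive oplock"
        if holder == pid:
            return "already held"
        # Another process holds the oplock: it must break before we proceed.
        if holder_acks_break(holder):
            del self.holders[path]
            return f"break acknowledged by {holder}; opened without oplock"
        # No acknowledgement: the second open cannot safely proceed.
        return f"no response from {holder}; open blocked"


if __name__ == "__main__":
    table = OplockTable()
    print(table.open(2493, "prf1A.tmp", lambda p: True))
    print(table.open(2496, "prf1A.tmp", lambda p: False))
```

In this toy, the "open blocked" branch plays the role of the repeated `request_oplock_break: no response received` entries for pid 2493 in the log; the model deliberately says nothing about how Samba 2.2.2 recovers from that state.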
Does anyone have any suggestions other than disabling OPLOCKS?  Even a way
of killing the runaway process would be useful at this time ("kill -9"
has no effect at all on the rogue SMBD or its children).
We could regress to a 'more stable' version but we would lose the
functionality of WINBIND which is important to this installation.
Any comments or advice much appreciated,
Noel Kelly
Output of ps ax | grep smb after all killable daemons have gone.
================================================================
 2456 ?        D      0:01 /usr/local/samba/bin/smbd -D
 2493 ?        R    1018:33 /usr/local/samba/bin/smbd -D
 2499 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2501 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2523 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2530 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2531 ?        D      0:01 /usr/local/samba/bin/smbd -D
 2541 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2562 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2563 ?        D      0:01 /usr/local/samba/bin/smbd -D
 2992 pts/0    S      0:00 grep smb
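This listing already hints at why `kill -9` fails. On Linux, a signal, even SIGKILL, is acted on only when the process is about to return from the kernel to user space: processes in state `D` (uninterruptible sleep) are blocked inside a system call and cannot be woken by it, and the `R` process with over 1,000 minutes of CPU time appears to be spinning. A hypothetical helper, assuming the BSD-style `ps ax` column layout shown above and an arbitrary 60-minute CPU threshold, that flags such processes:

```python
# Hypothetical helper: flag smbd processes that are unlikely to respond to
# `kill -9`, based on the STAT and TIME columns of BSD-style `ps ax` output.

PS_OUTPUT = """\
 2456 ?        D      0:01 /usr/local/samba/bin/smbd -D
 2493 ?        R    1018:33 /usr/local/samba/bin/smbd -D
 2499 ?        D      0:00 /usr/local/samba/bin/smbd -D
 2992 pts/0    S      0:00 grep smb
"""  # excerpt of the listing above


def cpu_minutes(cpu_time: str) -> float:
    """Convert a ps TIME field such as "1018:33" or "1:02:03" to minutes."""
    seconds = 0
    for part in cpu_time.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds / 60


def parse_ps(text: str):
    """Return (pid, state, cpu_minutes, command) tuples, one per process line."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) < 5:
            continue
        pid, _tty, stat, cpu_time, command = fields
        rows.append((int(pid), stat[0], cpu_minutes(cpu_time), command))
    return rows


def find_stuck(rows, name: str = "smbd", cpu_limit_min: float = 60.0):
    """Flag processes in D state, or in R state with a large CPU total.

    The 60-minute default threshold is an arbitrary assumption.
    """
    stuck = []
    for pid, state, cpu, command in rows:
        if name not in command:
            continue
        if state == "D":
            stuck.append((pid, "D: uninterruptible sleep in the kernel"))
        elif state == "R" and cpu >= cpu_limit_min:
            stuck.append((pid, f"R: {cpu:.0f} CPU minutes"))
    return stuck


if __name__ == "__main__":
    for pid, reason in find_stuck(parse_ps(PS_OUTPUT)):
        print(pid, reason)
```

The `grep smb` line is skipped because its command does not contain `smbd`; a TIME field with a day prefix (as some ps versions print) is not handled by this sketch.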
smb.conf
========
[global]
        workgroup = UK
        netbios name = BELLY
        server string = 2.2.2 Samba Server
        load printers = yes
        print command = /usr/bin/lpr -P%p -r %s
        invalid users = root bin uucp sys
        encrypt passwords = Yes
        update encrypted = Yes
        os level = 0
        preferred master = False
        local master = No
        domain master = False
        security = domain
        password server = BRAIN
        smb passwd file = /usr/local/samba/private/smbpasswd
        debug level = 1
        wins server = 192.168.5.4
        name resolve order = wins host bcast
        winbind uid = 10000-20000
        winbind gid = 10000-20000
        winbind enum users = yes
        winbind enum groups = yes
        winbind separator = +
        #template homedir = /raid/homedrives/%U
        nt acl support = yes
        # These oplock settings speed up file access dramatically, but
        # we might have to disable them if we experience runaway smbd
        # processes
        oplocks = yes
        level2 oplocks = yes
[printers]
        printable = yes
        public = yes
        printer = lp
        printing = BSD
        read only = yes
        guest ok = yes
[homedrives]
        browseable = yes
        path = /raid/homes/
        writeable = yes
        create mask = 700
[profiles]
        browseable = yes
        path = /raid/profiles/
        writeable = yes
        create mask = 700
        inherit permissions = yes
[Shared]
        path = /raid/shared
        public = no
        read only = No
        inherit permissions = yes
        create mask = 777
        directory security mask = 777
        force create mode = 0
        force directory security mode = 0
        nt acl support = yes
[Apps]
        path = /raid/apps
        public = no
        read only = No
        inherit permissions = yes
        create mask = 777
        directory security mask = 777
        force create mode = 0
        force directory security mode = 0
        nt acl support = yes
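Because `oplocks` and `level2 oplocks` are set only in [global], every share in this file inherits them; both are share-level parameters, so a setting inside an individual share would override the global value for that share only. A rough sketch that lists the effective values per share, assuming Python's `configparser` is close enough to Samba's own parser for a file like this one (it is not a full smb.conf parser):

```python
# Sketch: show which oplock settings each share ends up with, assuming that
# share-level parameters set in [global] apply to every share that does not
# override them. configparser only approximates Samba's smb.conf parser.
import configparser

SMB_CONF = """\
[global]
        security = domain
        oplocks = yes
        level2 oplocks = yes
[profiles]
        path=/raid/profiles/
        inherit permissions = yes
[Shared]
        path = /raid/shared
        nt acl support = yes
"""  # trimmed excerpt of the smb.conf above


def effective_oplocks(conf_text: str) -> dict:
    # default_section="global" makes [global] values the fallback for every
    # other section; interpolation=None leaves %U, %p, etc. untouched.
    parser = configparser.ConfigParser(interpolation=None,
                                       default_section="global")
    parser.read_string(conf_text)
    return {
        share: {
            "oplocks": parser[share].get("oplocks", "unset"),
            "level2 oplocks": parser[share].get("level2 oplocks", "unset"),
        }
        for share in parser.sections()
    }


if __name__ == "__main__":
    for share, settings in effective_oplocks(SMB_CONF).items():
        print(share, settings)
```

"unset" here means only that neither the share nor [global] mentions the parameter; it does not report Samba's built-in default.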
Nasir Yilmaz (ATM/Network Grp. Bsk. Systems Engineer)
2001-Nov-17  02:09 UTC
2.2.2 runaway SMBD process
Anybody use samba on Tru64 Unix 5.1 ?
On Sat, Nov 17, 2001 at 09:58:18AM -0000, Noel Kelly wrote:
> Does anyone have any suggestions other than disabling OPLOCKS ?  Even a way
> of killing the runaway process would be useful at this time ("kill -9" has
> no affect at all on the rogue SMBD or its children).
Can you tell me if you're getting any errors in your log files?
When you say "unkillable", does this mean it doesn't respond to a kill -9?
If so, this is a kernel problem, not a Samba problem.
Thanks,
Jeremy.
Jeremy,
Below are the relevant parts of the log for the offending process (2493).
It definitely looks like an oplock issue. I don't like the look of these lines:
[2001/11/16 17:43:08, 0] lib/util.c:smb_panic(1055)
  PANIC: open_mode_check: Existant process 2493 left active oplock.
[2001/11/16 17:43:09, 0] locking/locking.c:delete_fn(253)
  locking : delete_fn. LOGIC ERROR ! Entry for pid 2496 and it no longer exists !
And yes, the process becomes entirely unkillable, not responding even to -9.
I have read about a couple of other people who also had to restart the server
because of such runaway smbds. A totally wayward process that forces us to
shut the machine down does not impress anyone!
Much appreciate your input on this,
Noel
==================================================================
[2001/10/17 16:33:12, 1] smbd/reply.c:reply_sesssetup_and_X(1057)
  Username guest is invalid on this system
...skipping...
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:39:56, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service homedrives as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:39:59, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service Shared as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:39:59, 0] smbd/service.c:make_connection(239)
  brom414 (192.168.5.106) couldn't find service zrajnic
[2001/11/16 17:39:59, 0] smbd/service.c:make_connection(239)
  brom414 (192.168.5.106) couldn't find service public
[2001/11/16 17:40:00, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service homedrives as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:40:00, 0] lib/util_sock.c:read_socket_with_timeout(300)
  read_socket_with_timeout: timeout read. read error = Connection reset by peer.
[2001/11/16 17:40:00, 0] smbd/oplock.c:oplock_break(782)
  oplock_break: receive_smb error (Connection reset by peer)
  oplock_break failed for file zrajnic/Application Data/Microsoft/Internet Explorer/Quick Launch/Launch Internet Explorer Browser.lnk (dev = 811, inode = 1409077).
[2001/11/16 17:40:00, 0] smbd/oplock.c:oplock_break(870)
  oplock_break: client failure in break - shutting down this smbd.
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service Shared
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service homedrives
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service homedrives
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service homedrives
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service homedrives
[2001/11/16 17:40:00, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service profiles
[2001/11/16 17:40:32, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2492 on port 1120 for dev = 811, inode = 1409077 for dev = 811, inode = 1409077, tv_sec = 3bf54f1a, tv_usec = eff6a
[2001/11/16 17:40:35, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service Shared as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:40:37, 0] smbd/nttrans.c:call_nt_transact_ioctl(1762)
  call_nt_transact_ioctl: Currently not implemented.
[2001/11/16 17:40:57, 1] smbd/service.c:make_connection(610)
  hamo57 (192.168.5.104) connect to service Shared as user uk+zrajnic (uid=10040, gid=10000) (pid 2456)
[2001/11/16 17:41:10, 1] smbd/service.c:close_cnum(650)
  brom414 (192.168.5.106) closed connection to service profiles
[2001/11/16 17:41:30, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service homedrives as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:41:30, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service homedrives as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:41:32, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2493)
[2001/11/16 17:41:58, 1] smbd/service.c:close_cnum(650)
  hamo57 (192.168.5.104) closed connection to service Shared
[2001/11/16 17:42:36, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2496)
[2001/11/16 17:43:08, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2493 on port 1121 for dev = 811, inode = 344118 for dev = 811, inode = 344118, tv_sec = 3bf54fcd, tv_usec = f2e5a
[2001/11/16 17:43:08, 0] smbd/open.c:open_mode_check(555)
  open_mode_check: exlusive oplock left by process 2493 after break ! For file zrajnic/Application Data/Microsoft/Internet Explorer/prf1A.tmp, dev 811, inode = 344118. Deleting it to continue...
[2001/11/16 17:43:08, 0] lib/util.c:smb_panic(1055)
  PANIC: open_mode_check: Existant process 2493 left active oplock.
[2001/11/16 17:43:09, 0] locking/locking.c:delete_fn(253)
  locking : delete_fn. LOGIC ERROR ! Entry for pid 2496 and it no longer exists !
[2001/11/16 17:43:09, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2497)
[2001/11/16 17:43:41, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2493 on port 1121 for dev = 811, inode = 344118 for dev = 811, inode = 344118, tv_sec = 3bf54fcd, tv_usec = f2e5a
[2001/11/16 17:44:13, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2493 on port 1121 for dev = 811, inode = 344118 for dev = 811, inode = 344118, tv_sec = 3bf54fcd, tv_usec = f2e5a
==================================================================
-----Original Message-----
From: jra@samba.org [mailto:jra@samba.org]
Sent: 17 November 2001 22:47
To: Noel Kelly
Cc: 'samba@lists.samba.org'
Subject: Re: 2.2.2 runaway SMBD process
Can you tell me if you're getting any errors in your log files ?
When you say "unkillable" does this mean it doesn't respond to a kill -9 ?
If so this is a kernel problem not a Samba problem.
Thanks,
Jeremy.
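The log Noel posted can be triaged mechanically by tying each oplock failure to the smbd process it names. A sketch, assuming each entry on disk is a `[date time, level] file:function(line)` header line followed by indented message lines; the function names are hypothetical, and the matcher only recognizes the phrasings `pid N` and `process N`, so pids mentioned any other way are missed:

```python
# Sketch: group oplock-related level-0 log entries by the pid they mention.
# Assumes a "[date time, level] file:function(line)" header line followed by
# indented message lines for each entry.
import re
from collections import defaultdict

LOG = """\
[2001/11/16 17:43:08, 0] smbd/oplock.c:request_oplock_break(1026)
  request_oplock_break: no response received to oplock break request to pid 2493 on port 1121 for dev = 811, inode = 344118
[2001/11/16 17:43:08, 0] smbd/open.c:open_mode_check(555)
  open_mode_check: exlusive oplock left by process 2493 after break ! Deleting it to continue...
[2001/11/16 17:43:08, 0] lib/util.c:smb_panic(1055)
  PANIC: open_mode_check: Existant process 2493 left active oplock.
[2001/11/16 17:43:09, 1] smbd/service.c:make_connection(610)
  brom414 (192.168.5.106) connect to service profiles as user uk+zrajnic (uid=10040, gid=10000) (pid 2497)
"""  # abridged excerpt of the log above

HEADER_RE = re.compile(r"^\[(?P<when>[\d/]+ [\d:]+), (?P<level>\d+)\] (?P<where>\S+)$")
PID_RE = re.compile(r"\b(?:pid|process) (\d+)")


def parse_log(text: str):
    """Split the log into entries: dicts with when, level, where, msg."""
    entries = []
    for line in text.splitlines():
        header = HEADER_RE.match(line.strip())
        if header:
            entries.append({"when": header["when"], "level": int(header["level"]),
                            "where": header["where"], "msg": ""})
        elif entries and line.strip():
            # Continuation line: append it to the current entry's message.
            entries[-1]["msg"] = (entries[-1]["msg"] + " " + line.strip()).strip()
    return entries


def oplock_errors_by_pid(entries):
    """Map pid -> timestamps of level-0 entries that mention oplocks."""
    by_pid = defaultdict(list)
    for entry in entries:
        text = entry["where"] + " " + entry["msg"]
        if entry["level"] != 0 or "oplock" not in text.lower():
            continue
        for pid in sorted(set(PID_RE.findall(entry["msg"]))):
            by_pid[int(pid)].append(entry["when"])
    return dict(by_pid)


if __name__ == "__main__":
    print(oplock_errors_by_pid(parse_log(LOG)))
```

On the full log, this kind of grouping makes it easy to see that pid 2493 is named in every unanswered break request from 17:43:08 onward, which matches the process shown in state `R` in the ps listing.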