Over the past 2 days, my smbd processes have been multiplying worse than rabbits. At least with rabbits it's easy to tell why you wind up with so many...

Here's the setup:
Samba version 3.2.8-0.24 installed via rpm on an FC6 box with 2 NICs, both on the same subnet. The Samba server acts as a PDC in a mixed environment, mostly Windows PCs running XP. It is used mostly for file (Quickbooks) and print services. We have a small home business network with usually only 3 accounts connected, and no more than 10 open files at any given time.

I noticed yesterday that I was having trouble logging in from one machine - shares not available, etc. So I checked the daemon to make sure that it was running and lo and behold - there were over 400 instances of smbd.

It took two attempts to stop them all from the init script - the first run seemed to kill only 2 or 3 and the second one finished the job. So I restarted and monitored the situation. All seemed okay for about 3 hours, and then they started multiplying again until I restarted the daemon again. There was just normal file activity during the couple of hours that it took to go from 3 to over 400 PIDs.

I increased the log level to 3 but all I am seeing is the normal WINS and nmbd connections. I am seeing a few errors but nothing significant, I don't think - tell me if I'm wrong - and nothing in these log files that correlates with the timing of the runaway processes:

[2009/02/11 16:30:16, 2] smbd/server.c:open_sockets_smbd(580)
  waiting for a connection
*** WARNING *** The programme 'smbd' uses the Apple Bonjour compatiblity layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/avahi-compat?s=libdns_sd&e=smbd>
[2009/02/12 17:12:32, 3] lib/util_sock.c:interpret_string_addr_internal(122)
  interpret_string_addr_internal: getaddrinfo failed for name :: [Address family for hostname not supported]

Feb 12 10:07:01 yoda smbd[18495]: [2009/02/12 10:07:01, 0] lib/util_sock.c:read_socket_with_timeout(939)
Feb 12 10:07:01 yoda smbd[18495]: [2009/02/12 10:07:01, 0] lib/util_sock.c:get_peer_addr_internal(1676)
Feb 12 10:07:01 yoda smbd[18495]: getpeername failed. Error was Transport endpoint is not connected
Feb 12 10:07:01 yoda smbd[18495]: read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
Feb 12 10:07:01 yoda smbd[18495]: [2009/02/12 10:07:01, 0] lib/util_sock.c:write_data(1136)
Feb 12 10:07:01 yoda smbd[18495]: [2009/02/12 10:07:01, 0] lib/util_sock.c:get_peer_addr_internal(1676)
Feb 12 10:07:01 yoda smbd[18495]: getpeername failed. Error was Transport endpoint is not connected
Feb 12 10:07:01 yoda smbd[18495]: write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
Feb 12 10:07:01 yoda smbd[18495]: [2009/02/12 10:07:01, 0] smbd/process.c:srv_send_smb(74)
Feb 12 10:07:01 yoda smbd[18495]: Error writing 75 bytes to client. -1. (Transport endpoint is not connected)
Feb 12 11:44:29 yoda smbd[20241]: [2009/02/12 11:44:29, 0] smbd/nttrans.c:call_nt_transact_ioctl(2029)
Feb 12 11:44:29 yoda smbd[20241]: call_nt_transact_ioctl(0x90073): Currently not implemented.
Feb 12 16:15:15 yoda smbd[2092]: [2009/02/12 16:15:15, 0] smbd/nttrans.c:call_nt_transact_ioctl(2029)
Feb 12 16:15:15 yoda smbd[2092]: call_nt_transact_ioctl(0x9005c): Currently not implemented.
Feb 12 17:13:29 yoda smbd[4331]: [2009/02/12 17:13:29, 0] smbd/nttrans.c:call_nt_transact_ioctl(2029)
Feb 12 17:13:29 yoda smbd[4331]: call_nt_transact_ioctl(0x9005c): Currently not implemented.

smb.conf sans shares:

[global]
    workgroup = ET
    server string = Samba PDC at %h running %v
    interfaces = eth0
    bind interfaces only = Yes
    username map = /etc/samba/smbusers
    log level = 2
    log file = /var/log/samba/log.%m
    time server = Yes
    printcap name = cups
    logon script = %u.bat
    logon path
    logon home
    domain logons = Yes
    preferred master = Yes
    domain master = Yes
    dns proxy = No
    wins support = Yes
    hosts allow = 10.10.10.0/255.255.255.0, 128.125.63.0/255.255.255.0, 127.
    cups options = raw
    oplocks = No
    level2 oplocks = No

Any suggestions as to where to look next?

Thanks in advance...

Ed
...........................................................................
Randomly Generated Quote (710 of 1503):
If at first you don't succeed, redefine success.
On Fri, Feb 13, 2009 at 05:01:43AM -0800, Ed Kasky wrote:
> It took two attempts to stop them all from the init script - the
> first run seemed to only kill about 2 or 3 and the second one
> finished the job. So I restarted and monitored the situation and all
> seemed okay for a few hours and then after about 3 hours they started
> multiplying again until I restarted the daemon again. There was just
> normal file activity during the couple of hours that it took to go
> from 3 to over 400 pid's.

Can you try to figure out what these smbds are doing? You might want to

strace -p <smbd-pid>

some of the smbds and send the output. You might also want to check which process state ps reports for the smbds.

Volker
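For the second suggestion, a minimal sketch that tallies processes by their ps state letter. The function name count_states is made up for this example, and it assumes nothing more than input lines of the form "PID STAT".

```shell
# Tally processes by state letter (the first character of the STAT column).
# Reads "PID STAT" pairs on stdin and prints "STATE COUNT" lines, sorted.
count_states() {
    awk '{ n[substr($2, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}
```

With procps ps the input could come from `ps -C smbd -o pid= -o stat= | count_states`. A growing count of S (interruptible sleep) or D (uninterruptible sleep) entries would suggest processes that are waiting on something rather than serving clients.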
On Friday 13 February 2009 14:01:43, Ed Kasky wrote:
> Over the past 2 days, my smbd processes are multiplying worse than
> rabbits. At least with rabbits it's easy to tell why you wind up
> with so many...
>
> Here's the setup:
> Samba version 3.2.8-0.24 installed via rpm on a FC6 box with 2 nics,
> both on the same subnet. The samba server acts as a PDC in a mixed

Please explain your reasons for using two NICs with IPs on the same subnet, and how you make sure that packets sent out through one interface don't get their replies routed through the other.
At 06:18 AM Friday, 2/13/2009, you wrote -=>
>On Fri, Feb 13, 2009 at 05:01:43AM -0800, Ed Kasky wrote:
> > It took two attempts to stop them all from the init script - the
> > first run seemed to only kill about 2 or 3 and the second one
> > finished the job. So I restarted and monitored the situation and all
> > seemed okay for a few hours and then after about 3 hours they started
> > multiplying again until I restarted the daemon again. There was just
> > normal file activity during the couple of hours that it took to go
> > from 3 to over 400 pid's.
>
>Can you try to figure out what these smbds do? You might
>want to
>
>strace -p <smbd-pid>
>
>some of the smbds and send the output. You might also want
>to see what process state according to ps the smbds are.
>
>Volker

All the old ones say the same thing:

# strace -p 22122
Process 22122 attached - interrupt to quit
write(22, "q", 1

In the last hour they have increased from 3 to over 40 that are inactive. Smbstatus shows some locked files - an outlook.pst, a Quickbooks timer file and a couple of folders - and 2 PIDs for the shares.

I set deadtime = 10 in smb.conf but it does not appear to be working.

Ed
~~~~~~~~~
Randomly Generated Quote (1160 of 1229):
When in doubt, use brute force. -- Ken Thompson
On Fri, Feb 13, 2009 at 01:58:02PM -0800, Ed Kasky wrote:
> All the old ones say the same thing:
>
> # strace -p 22122
> Process 22122 attached - interrupt to quit
> write(22, "q", 1
>
> In the last hour they have increased from 3 to over 40 that are
> inactive. Smbstatus shows some locked files - an outlook.pst, a
> Quickbooks timer file and a couple of folders and 2 pid's for the shares.
>
> I set the deadtime = 10 in smb.conf but it does not appear to be working.

Ok, now you need to figure out where fd "22" points. lsof together with "ls -l /proc/22122/fd" might help. I'd suspect that it's a socket whose other end is dead for some reason. If you can figure out which daemon is on the other end, you might try restarting it.

Volker
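A sketch of the first half of that lookup, assuming a Linux /proc in which the symlink for a socket descriptor reads socket:[INODE]. The helper name socket_inode is hypothetical and only handles that one string format.

```shell
# Print the inode number if the argument has the form "socket:[N]".
# Prints nothing for any other link target (regular file, pipe, device).
socket_inode() {
    printf '%s\n' "$1" | sed -n 's/^socket:\[\([0-9][0-9]*\)]$/\1/p'
}
```

For the process traced above this could be called as `socket_inode "$(readlink /proc/22122/fd/22)"`. The resulting number is what lsof prints in its NODE column, so searching the full lsof output for it should list every process that holds the same socket.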
At 04:53 AM Monday, 2/16/2009, you wrote -=>
>On Sun, Feb 15, 2009 at 03:34:40PM -0800, Ed Kasky wrote:
> > Well, everything went okay for about 36 hours and then at around
> > 10:30 am today, I noticed 3 new pid's every 6 minutes and growing. I
> > traced a couple of the first ones and they all come back pretty much
> > with the same info. I did notice some sockets in the list though -
> > but am not sure how to read it all:
>
>You need to figure out what the other end of
>
>smbd 13336 root 22u unix 0xf1fd4000 2447396 socket
>
>is. Maybe you find another "0xf1fd4000" in lsof?

Since I wrote last night I restarted smbd and it was okay for about 12 hours. And then:

# /sbin/service smb status
smbd (pid 18249 18060 18051 18041 17661 17656 17652 17543 17522 17514 17357 17353 17346 17345 17339 17336 13300 12984 12958 12957 12712 12384 12379 12342 12005 11973 11968 11870 11842 11839 11723 11717 11714 11482 11480 11475) is running...
nmbd (pid 11478) is running...

I found a couple of things. The first smbd started has a couple of sockets:

smbd 11475 root 21u unix 0xdb59ba00 3225813 socket
smbd 11475 root 22u unix 0xdb59be00 3225814 socket
smbd 11475 root 25u unix 0xca340200 3225816 socket

11480 - no sockets

11482 -
smbd 11482 root 23u unix 0xed4c1400 3225837 socket

11714:
smbd 11714 root 21u unix 0xdb59ba00 3225813 socket
smbd 11714 root 22u unix 0xdb59be00 3225814 socket
smbd 11714 root 25u unix 0xca340200 3225816 socket

11717:
smbd 11717 root 21u unix 0xdb59ba00 3225813 socket
smbd 11717 root 22u unix 0xdb59be00 3225814 socket
smbd 11717 root 25u unix 0xca340200 3225816 socket

and on up the list all have the same three.

I did notice this, though, as I was looking at the lists. Not sure if it means anything. The machine is a Mac PB5:

# /usr/sbin/lsof | grep pbg5mac
smbd 12957 root 6u IPv4 3927677 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52093 (CLOSE_WAIT)
smbd 12958 root 6u IPv4 3927823 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52094 (CLOSE_WAIT)
smbd 12984 root 6u IPv4 3928044 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52095 (CLOSE_WAIT)
smbd 17336 root 29u IPv4 4022743 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52103 (CLOSE_WAIT)
smbd 17339 root 6u IPv4 4022891 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52107 (CLOSE_WAIT)
smbd 17345 root 6u IPv4 4023050 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52112 (CLOSE_WAIT)
smbd 17346 root 6u IPv4 4023164 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52114 (CLOSE_WAIT)
smbd 17353 root 6u IPv4 4023360 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52116 (CLOSE_WAIT)
smbd 17357 root 6u IPv4 4023506 TCP yoda.wrenkasky.com:netbios-ssn->pbg5mac.wrenkasky.com:52117 (CLOSE_WAIT)

I restarted again as communication with smb was timing out.

I keep searching the archives of this list and the Internet but have not found a whole lot that relates directly to this issue, and the things I have tried so far obviously have not fixed it.

Ed
. . . . . . . . . . . . . . . . . .
Randomly generated quote 419 of 492:
"To err is human, to forgive is against company policy."
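CLOSE_WAIT means the remote side has closed the connection while the local process still holds its socket open, so a per-client tally of such entries can show whether one machine accounts for most of them. A minimal sketch, assuming lsof lines shaped like the ones above (NAME as local:port->peer:port followed by the state in parentheses); the function name close_wait_by_peer is invented for this example.

```shell
# Count CLOSE_WAIT connections per remote host.
# Reads lsof output on stdin and prints "COUNT HOST" lines, largest first.
close_wait_by_peer() {
    awk '/\(CLOSE_WAIT\)/ {
        name = $(NF - 1)          # local:port->peer:port
        sub(/^.*->/, "", name)    # drop the local side
        sub(/:[^:]*$/, "", name)  # drop the peer port
        n[name]++
    } END { for (h in n) print n[h], h }' | sort -rn
}
```

It could be run as `/usr/sbin/lsof | close_wait_by_peer`, using the lsof path from the message above.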
On Mon, Feb 16, 2009 at 01:17:29PM -0800, Ed Kasky wrote:
> # /sbin/service smb status
> smbd (pid 18249 18060 18051 18041 17661 17656 17652 17543 17522 17514
> 17357 17353 17346 17345 17339 17336 13300 12984 12958 12957 12712
> 12384 12379 12342 12005 11973 11968 11870 11842 11839 11723 11717
> 11714 11482 11480 11475) is running...
> nmbd (pid 11478) is running...
>
> I found a couple of things. The first smbd started has a couple of sockets:
> smbd 11475 root 21u unix 0xdb59ba00 3225813 socket
> smbd 11475 root 22u unix 0xdb59be00 3225814 socket
> smbd 11475 root 25u unix 0xca340200 3225816 socket

None of this explains what fd 22 is connected to. You need to scan for 0xdb59be00 (or whatever it will be next time), and find the non-smbd process that holds this socket. Restart that one or don't start it at all.

Volker
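That scan can be sketched as a small filter over lsof output. This assumes the command name is the first column, as in the listings quoted above; the function name non_smbd_holders is hypothetical.

```shell
# Print lsof lines that mention the given socket address as a whole word
# and whose command (first column) is not smbd.
# Usage: lsof | non_smbd_holders ADDRESS
non_smbd_holders() {
    grep -v '^smbd[[:space:]]' | grep -wF -- "$1"
}
```

With the address listed for fd 22 in the quoted output, the call would be `/usr/sbin/lsof | non_smbd_holders 0xdb59be00`. No output would mean that no process other than smbd has that socket open.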