Hi,

I need some help!

We have Samba 2.2.5 running on AIX 4.3.3. The server is very heavily loaded and has 3 main shares, which are very large.

The problem is that after a few days of running, the server suddenly stops killing off newly spawned smbd processes. They do not die, even when the original smbd process is killed. They will respond to a kill -9, but killing off the main PID does not kill the runaway children. (Instead, their PPID goes to 1.)

Killing off all the smbd processes with kill -9 and restarting the daemon fixes the problem.

This really isn't acceptable. The only thing I can see in the log is that there is a rogue machine doing connects with a bad userid/password pair. This shows up in the log every second or so. The problem here is that we are running at a high debug level, and because of these repeated failures the log is overwritten within minutes.

At the moment I am running a test to see if I can reliably reproduce the problem by replicating the rogue login attempts on a test box. There's another test in the pipeline to save a log file large enough to record the moment the problem starts, so we have more of an idea what's going on, as the logs written after the processes start running away aren't much help.

Thoughts:

It could be that Samba isn't closing the socket properly. This fits if the process is waiting for the socket to close, but it stays open (deadlocked condition?). Why this might be, I can only guess. I've proved that it works again on the same machine after killing off the runaways, so it does not appear to be an environment issue.

It might be a problem with AIX and Samba under certain conditions.

A memory leak, maybe?

In any case, I would be grateful for some ideas/help.

Thanks,
Robert
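One way to keep the log from being lost while waiting for the problem to recur is to copy it aside at regular intervals. The following is only a minimal sketch, assuming a Python 3 interpreter is available on the box; the function name `snapshot_log`, the archive directory and the `keep` limit are all made up for illustration, and the real log path depends on how Samba was built and configured.

```python
import shutil
import time
from pathlib import Path


def snapshot_log(log_path, archive_dir, keep=200):
    """Copy log_path into archive_dir under a timestamped name.

    Only the newest `keep` snapshots (keep >= 1) are retained, so the
    archive cannot fill the disk. All names here are hypothetical.
    """
    src = Path(log_path)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # The timestamp sorts lexicographically in chronological order.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)

    # Prune the oldest snapshots beyond the retention limit.
    snapshots = sorted(dest_dir.glob(src.name + ".*"))
    for old in snapshots[:-keep]:
        old.unlink()
    return dest
```

Run from cron every minute or so, this would leave a trail of snapshots covering the moment the processes start running away. The interval needs to be shorter than the time the log takes to wrap.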
On Tue, 15 Oct 2002 16:28:42 +0100 Robert Euston <robert@euston.fslife.co.uk> wrote:
> Hi,
>
> I need some help!
>
> We have Samba 2.2.5 running on AIX 4.3.3. The server is a very heavy
> duty server, and has 3 main shares, which are very large.
>
> The problem is that after a few days of running, the server suddenly
> stops killing off newly spawned smbd processes. They do not die, even
> when the original smbd process is killed. They will respond to a kill
> -9, but killing off the main PID does not kill the runaway children.
> (Instead, their PPID goes to 1 )
>
> Killing off all the smbd processes with kill -9, and restarting the
> daemon fixes the problem.
>
> This really isn't acceptable. The only thing I can see in the log is
> that there is rogue machine doing connects with a bad userid/password
> pair. This is occuring in the log every second or so. The problem here
> is that we are running at a high debug level, and the log is overwritten
> within minutes due to this problem.
>
> At the moment I am running a test to see if I can reliably reproduce the
> problem by replicating the rogue login attempts on a test box. There's
> another test in the pipeline to save a log file large enough to record
> when it starts so we have more of an idea whats going on, as the logs
> written after the processes start running away aren't much help.
>
> Thoughts :
>
> It could be that samba isn't closing the socket properly. This fits, if
> the process is waiting for the socket to close, but it stays open
> (deadlocked condition?). Why this might be, I can only guess. I've
> proved that it works again on the same machine after killing off the
> runaways, so it does not appear to be an environment issue.
>
> It might be a problem with AIX and Samba under certain conditions.
>
> A Memory leak maybe?

I'm curious to see if this is in any way related to a problem we have. Are the runaway processes using CPU or are they completely idle?
We have a problem with smbd getting into a loop and consuming CPU. We do also see processes which do not get closed down, but it is only the odd process, and I think they can be killed with just a -TERM.

Phil.
---------------------------------------
Phil Chambers (postmaster@exeter.ac.uk)
University of Exeter
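To answer the "using CPU or idle?" question for each runaway, something like the following could be used. It is a rough sketch, not a tested AIX tool: it assumes a `ps` that accepts the XPG4-style `-o pid,ppid,pcpu,comm` format, and the helper names (`parse_ps`, `orphans`, `list_smbd`) are invented here. Note that the master smbd daemon normally has PPID 1 as well, so its PID (for example, read from the smbd pid file) has to be excluded by hand.

```python
import subprocess


def parse_ps(output, name="smbd"):
    """Return (pid, ppid, pcpu) tuples for processes whose command is `name`."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, pcpu, comm = parts
        # Compare on the basename in case ps prints a full path.
        if comm.strip().split("/")[-1] != name:
            continue
        try:
            rows.append((int(pid), int(ppid), float(pcpu)))
        except ValueError:
            continue  # header line or unparsable row
    return rows


def orphans(rows, master_pid=None):
    """Keep processes re-parented to init (PPID 1), minus the master daemon."""
    return [r for r in rows if r[1] == 1 and r[0] != master_pid]


def list_smbd():
    """Run ps and parse its output (assumes the -o format is supported)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,ppid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

Calling `orphans(list_smbd(), master_pid=...)` would then give the runaways together with their CPU percentage. Values near zero would point at processes blocked waiting on something, while high values would look more like the looping problem described above.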
Hi,

Thanks for the reply. Unfortunately we are running a 24 hour service, so a stop and start is not a good option for us.

I'm looking at downgrading to an earlier version to see if that alleviates the problem. This is starting to look like a bug in the Samba code.

Robert

Joshua Weage wrote:
> I've had a similar problem under HP-UX, and I've seen another report of
> it on Linux. I kill and restart samba every night using cron. That
> takes care of most problems. Even so, I've had this happen twice, but
> I'm not sure what is causing it.
>
>>We have Samba 2.2.5 running on AIX 4.3.3. The server is a very heavy
>>duty server, and has 3 main shares, which are very large.
>>
>>The problem is that after a few days of running, the server suddenly
>>stops killing off newly spawned smbd processes. They do not die, even
>>when the original smbd process is killed. They will respond to a kill
>>-9, but killing off the main PID does not kill the runaway children.
>>(Instead, their PPID goes to 1 )
>>
>>Killing off all the smbd processes with kill -9, and restarting the
>>daemon fixes the problem.