Christoph Kaegi
2007-Oct-25 19:55 UTC
[Samba] Accumulating smbd processes and sockets in CLOSE_WAIT state
Hello list

Our central fileserver is a Samba 3.0.25b on Solaris 9 and has
10'000 users (several hundred at the same time).

This week it died on us, and when I inspected the machine, it had
exhausted its 8GB of memory and 16GB of swap because thousands of
smbd processes were running.
netstat -na showed that many hundreds of connections to
port 445 were in CLOSE_WAIT state.

We first thought it could be some sort of DoS attack, but I have
since also discovered a lot of the following entries in smbd.log at
the times the server became unresponsive:

---------------------------- 8< ----------------------------
[2007/10/25 15:40:30, 0] lib/util_tdb.c:tdb_chainlock_with_timeout_internal(84)
  tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /etc/samba/private/secrets.tdb
---------------------------- 8< ----------------------------

The same thing has happened three times now, each at a time when
presumably a peak of users (around 600-900) tried to use the
server. Every time, the number of network connections in CLOSE_WAIT
state and the number of smbd processes increased massively.

Others seem to have similar problems (like
http://marc.info/?l=samba&m=119263114612187&w=2).

The fileserver had been performing OK for several months with this
Samba release.

I'd be grateful if anybody could give me some insight into how we
can solve this. Losing file service for all staff and students
several times a week puts considerable pressure on me...

Thanks

Chris

--
----------------------------------------------------------------------
Christoph Kaegi                                            kaph@zhaw.ch
----------------------------------------------------------------------
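[Editor's note] Counts like the ones reported above can be gathered with a short pipeline. The netstat lines below are illustrative stand-in data, not output from the server in question; on the live Solaris box the input would come from `netstat -na` itself:

```shell
# Sample lines standing in for `netstat -na` output (illustrative only;
# addresses and counters are made up).
sample='192.168.1.10.445  10.0.0.5.1039  49640 0 49640 0 CLOSE_WAIT
192.168.1.10.445  10.0.0.6.1040  49640 0 49640 0 ESTABLISHED
192.168.1.10.445  10.0.0.7.1041  49640 0 49640 0 CLOSE_WAIT'

# The last field of each TCP line is the connection state;
# count how many are stuck in CLOSE_WAIT.
printf '%s\n' "$sample" | awk '$NF == "CLOSE_WAIT" { n++ } END { print n+0 }'

# On the live server, roughly:
#   netstat -na | awk '$NF == "CLOSE_WAIT"' | wc -l
#   ps -ef | grep -c '[s]mbd'
```

The `[s]mbd` bracket trick keeps the grep process itself out of the count.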
Mike Eggleston
2007-Oct-25 20:03 UTC
[Samba] Accumulating smbd processes and sockets in CLOSE_WAIT state
On Thu, 25 Oct 2007, Christoph Kaegi might have said:

> Hello list
>
> Our central fileserver is a Samba 3.0.25b on Solaris 9 and has
> 10'000 users (several hundred at the same time).
>
> This week it died on us, and when I inspected the machine, it had
> exhausted its 8GB of memory and 16GB of swap because thousands of
> smbd processes were running.
> netstat -na showed that many hundreds of connections to
> port 445 were in CLOSE_WAIT state.
>
> We first thought it could be some sort of DoS attack, but I have
> since also discovered a lot of the following entries in smbd.log at
> the times the server became unresponsive:
>
> ---------------------------- 8< ----------------------------
> [2007/10/25 15:40:30, 0] lib/util_tdb.c:tdb_chainlock_with_timeout_internal(84)
>   tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /etc/samba/private/secrets.tdb
> ---------------------------- 8< ----------------------------
>
> The same thing has happened three times now, each at a time when
> presumably a peak of users (around 600-900) tried to use the
> server. Every time, the number of network connections in CLOSE_WAIT
> state and the number of smbd processes increased massively.
>
> Others seem to have similar problems (like
> http://marc.info/?l=samba&m=119263114612187&w=2).
>
> The fileserver had been performing OK for several months with this
> Samba release.
>
> I'd be grateful if anybody could give me some insight into how we
> can solve this. Losing file service for all staff and students
> several times a week puts considerable pressure on me...

A recent problem I had, with hundreds of smbd processes running for
only 15 users, was fixed by adding 'deadtime = 60' to the global
section of /etc/samba/smb.conf.

Mike
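[Editor's note] Mike's suggestion would look like this in smb.conf. The `deadtime` parameter is measured in minutes of inactivity, and the default of 0 means connections are never disconnected automatically; it only closes connections with no open files, so active clients are unaffected:

```ini
[global]
    # Disconnect client connections that have no open files
    # after 60 minutes of inactivity (deadtime is in minutes;
    # default 0 = never disconnect).
    deadtime = 60
```

Windows clients transparently reconnect, so this is generally safe to enable and keeps idle smbd processes from piling up.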
Christoph Kaegi
2007-Nov-01 15:05 UTC
[Samba] Accumulating smbd processes and sockets in CLOSE_WAIT state
Hello list

The below-mentioned problem just occurred again. We had about 673
smbd processes running and 1746 locks (as reported by smbstatus)
when it happened.

Again, the only unusual thing smbd.log said was:

---------------------------- 8< ----------------------------
[2007/11/01 15:44:14, 0] lib/util_tdb.c:tdb_chainlock_with_timeout_internal(84)
  tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /etc/samba/private/secrets.tdb
---------------------------- 8< ----------------------------

Restarting Samba helped for the moment, but when will the problem
occur again? What could trigger such a problem? And what can I do
to better diagnose it?

Thanks

Chris

On 25.10-21:48, Christoph Kaegi wrote:

> Our central fileserver is a Samba 3.0.25b on Solaris 9 and has
> 10'000 users (several hundred at the same time).
>
> This week it died on us, and when I inspected the machine, it had
> exhausted its 8GB of memory and 16GB of swap because thousands of
> smbd processes were running.
> netstat -na showed that many hundreds of connections to
> port 445 were in CLOSE_WAIT state.
>
> We first thought it could be some sort of DoS attack, but I have
> since also discovered a lot of the following entries in smbd.log at
> the times the server became unresponsive:
>
> ---------------------------- 8< ----------------------------
> [2007/10/25 15:40:30, 0] lib/util_tdb.c:tdb_chainlock_with_timeout_internal(84)
>   tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /etc/samba/private/secrets.tdb
> ---------------------------- 8< ----------------------------
>
> The same thing has happened three times now, each at a time when
> presumably a peak of users (around 600-900) tried to use the
> server. Every time, the number of network connections in CLOSE_WAIT
> state and the number of smbd processes increased massively.
>
> Others seem to have similar problems (like
> http://marc.info/?l=samba&m=119263114612187&w=2).
>
> The fileserver had been performing OK for several months with this
> Samba release.
>
> I'd be grateful if anybody could give me some insight into how we
> can solve this. Losing file service for all staff and students
> several times a week puts considerable pressure on me...

--
----------------------------------------------------------------------
Christoph Kaegi                                            kaph@zhaw.ch
----------------------------------------------------------------------
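[Editor's note] For the "what can I do to better diagnose it?" question, one low-effort option is a cron-driven watchdog that flags a runaway smbd count before the machine exhausts memory. This is a minimal sketch; the threshold of 500 and the alert wording are assumptions, not values from the thread:

```shell
# Watchdog sketch: compare an smbd process count against a threshold.
# The threshold and message format are illustrative assumptions.
check_smbd_load() {
  count=$1
  threshold=$2
  if [ "$count" -gt "$threshold" ]; then
    echo "ALERT: $count smbd processes (threshold $threshold)"
  else
    echo "OK: $count smbd processes"
  fi
}

# On the live server the count would come from something like:
#   count=$(ps -ef | grep -c '[s]mbd')
check_smbd_load 673 500
# -> ALERT: 673 smbd processes (threshold 500)
```

Run from cron every few minutes and piped to mail, this would at least give warning before the next peak-load incident rather than after.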