ard@waikato.ac.nz
2002-Jul-18 16:46 UTC
[Samba] 2500 smbd processes for 30 users. tdb_oob len beyond eof.
Yesterday my Samba server stopped answering mount requests for the first time in four months. There were 2500 smbd processes running, consuming both CPUs, a gig of RAM, and a gig of swap. Sysstat provided interesting graphs. "smbstatus" showed about 30 connected users. Normally I have about ten times that.

New mount requests were being refused, or timing out:

  # mount -tsmbfs //server/share /mnt -ousername=etc,uid=etc
  Password:
  15438: session setup failed: SUCCESS - 0
  SMB connection failed

Windows users were unable to log in because this Samba box is their home H: drive server.

I fixed it with "killall smbd" and now everything is back to normal (smbd is started by inetd). I did not delete any log files or the connection database, as others on this list have had to do.

Occasionally in the past smbstatus has reported "tdb_oob len beyond eof" but I have always ignored it because smbd continued working. Only now, after searching the mailing list archives, do I see that it has been terminal for some users.

The system is a dual P3 Compaq, 2.4.18-ac1 (-ac1 to get proper quotas), Samba-2.2.3a. The original installation was Slackware-8.0, if I remember correctly, but all apps are compiled from source rather than taken from packages. Load peaks around 300-400 concurrent connections per day with deadtime set to 30 minutes. This peak happens between 12pm and 1pm every weekday. The "crisis" happened at 12:35pm Thursday.

Logging went into orbit:

# grep smbd daemon.log | cut -f 1 -d: | uniq -c
    304 Jul 18 00
    120 Jul 18 01
    122 Jul 18 02
     98 Jul 18 03
    102 Jul 18 04
    102 Jul 18 05
    118 Jul 18 06
    170 Jul 18 07
   2012 Jul 18 08
   4896 Jul 18 09
   8262 Jul 18 10
   8348 Jul 18 11
1886813 Jul 18 12
 618309 Jul 18 13
  78478 Jul 18 14
    323 Jul 18 15
    342 Jul 18 16
[...continues around 300 for the rest of the day...]

# grep tdb_oob daemon.log |cut -f 1 -d: |uniq -c
 937028 Jul 18 12
 300362 Jul 18 13
  35386 Jul 18 14

# grep tdb_oob daemon.log | cut -f 1,2 -d: | uniq -c
   9007 Jul 18 12:04
  18203 Jul 18 12:05
  21734 Jul 18 12:06
[...about 20000 *every* minute until 12:30, then linear dropoff to 3000 per
 minute at 14:10, ...]
   3041 Jul 18 14:11
   2134 Jul 18 14:12
   1659 Jul 18 14:13  <---- killall smbd here
      6 Jul 18 14:14  <---- ...taking several minutes to complete
      6 Jul 18 14:15
      5 Jul 18 14:16
      6 Jul 18 14:17
      5 Jul 18 14:18
      2 Jul 18 14:19

Next week I will upgrade to 2.2.5, but with a large user base I have to take some care. In the meantime, can anybody offer any suggestions?

_________________________________________________________________________
Andrew Donkin                  Waikato University, Hamilton, New Zealand

P.S. does anybody else think that splitting log lines in two, the way Samba does, is madness?
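For anyone wanting to repeat the per-hour and per-minute counts above on their own logs, the pipeline can be wrapped in a small helper. This is only a sketch: the function name count_by_time is made up for illustration, and it assumes the traditional syslog timestamp ("Jul 18 12:04:01 ..."), so that cutting at the first colon gives an hourly bucket and cutting at the second gives a per-minute bucket.

```shell
#!/bin/sh
# count_by_time LOGFILE PATTERN FIELDS   (hypothetical helper name)
#   FIELDS=1    -> one bucket per hour    ("Jul 18 12")
#   FIELDS=1,2  -> one bucket per minute  ("Jul 18 12:04")
# Relies on the log being in chronological order, because uniq -c
# only merges adjacent identical lines.
count_by_time() {
    grep -- "$2" "$1" | cut -d: -f "$3" | uniq -c
}

# Usage, mirroring the commands in the post:
#   count_by_time daemon.log smbd 1
#   count_by_time daemon.log tdb_oob 1,2
```

Because the log is already in time order, no sort is needed before uniq -c; on a log merged from several sources a sort would have to go in front of it.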
Bob Crandell
2002-Jul-18 17:54 UTC
[Samba] 2500 smbd processes for 30 users. tdb_oob len beyond eof.
I have almost the same setup except not as many users. That particular problem went away when I stopped using inetd.

ard@waikato.ac.nz wrote:
>
>Yesterday my Samba server stopped answering mount requests for the first
>time in four months. There were 2500 smbd processes running, consuming
>both CPUs, a gig of RAM, and a gig of swap. Sysstat provided interesting
>graphs. "smbstatus" showed about 30 connected users. Normally I have
>about ten times that.
>
>New mount requests were being refused, or timing out:
>
>  # mount -tsmbfs //server/share /mnt -ousername=etc,uid=etc
>  Password:
>  15438: session setup failed: SUCCESS - 0
>  SMB connection failed
>
>Windows users were unable to log in because this Samba box is their home H:
>drive server.
>
>I fixed it with "killall smbd" and now everything is back to normal (smbd
>is started by inetd). I did not delete any log files or the connection
>database, as others on this list have had to do.
>
>Occasionally in the past smbstatus has reported "tdb_oob len beyond eof"
>but I have always ignored it because smbd continues working. Only now,
>after searching the mailing list archives, I see that it has been terminal
>for some users.
>
>The system is a dual P3 Compaq, 2.4.18-ac1 (-ac1 to get proper quotas),
>Samba-2.2.3a. The original installation was Slackware-8.0, if I remember
>correctly, but all apps are compiled from source rather than taken from
>packages. Load peaks around 300-400 concurrent connections per day with
>deadtime set to 30 minutes. This peak happens between 12pm and 1pm every
>weekday. The "crisis" happened at 12:35pm Thursday.
>
>Logging went into orbit:
>
># grep smbd daemon.log | cut -f 1 -d: | uniq -c
>    304 Jul 18 00
>    120 Jul 18 01
>    122 Jul 18 02
>     98 Jul 18 03
>    102 Jul 18 04
>    102 Jul 18 05
>    118 Jul 18 06
>    170 Jul 18 07
>   2012 Jul 18 08
>   4896 Jul 18 09
>   8262 Jul 18 10
>   8348 Jul 18 11
>1886813 Jul 18 12
> 618309 Jul 18 13
>  78478 Jul 18 14
>    323 Jul 18 15
>    342 Jul 18 16
>[...continues around 300 for the rest of the day...]
>
># grep tdb_oob daemon.log |cut -f 1 -d: |uniq -c
> 937028 Jul 18 12
> 300362 Jul 18 13
>  35386 Jul 18 14
>
># grep tdb_oob daemon.log | cut -f 1,2 -d: | uniq -c
>   9007 Jul 18 12:04
>  18203 Jul 18 12:05
>  21734 Jul 18 12:06
>[...about 20000 *every* minute until 12:30, then linear dropoff to 3000 per
> minute at 14:10, ...]
>   3041 Jul 18 14:11
>   2134 Jul 18 14:12
>   1659 Jul 18 14:13  <---- killall smbd here
>      6 Jul 18 14:14  <---- ...taking several minutes to complete
>      6 Jul 18 14:15
>      5 Jul 18 14:16
>      6 Jul 18 14:17
>      5 Jul 18 14:18
>      2 Jul 18 14:19
>
>Next week I will upgrade to 2.2.5, but with a large user base I have to
>take some care. In the meantime, can anybody offer any suggestions?
>
>
>_________________________________________________________________________
>Andrew Donkin                  Waikato University, Hamilton, New Zealand
>
>
>P.S. does anybody else think that splitting log lines in two, the way Samba
>does, is madness?
>

--
Bob Crandell
Assured Computing
When you need to be sure.
Cell 541-914-3985
FAX 240-371-7237
bob@assuredcomp.com
www.assuredcomp.com
Eugene, Or. 97402
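The move away from inetd described in this reply could look roughly like the following. Treat it as a sketch under assumptions rather than the exact steps either poster used: the /usr/local/samba prefix is the default for a source build but is not stated in the thread, the inetd.conf entries shown are the conventional ones and may differ on a given box, and whether nmbd was also run from inetd is not mentioned.

```shell
# 1. In /etc/inetd.conf, comment out the Samba entries
#    (paths below are assumed, not taken from the thread):
#      #netbios-ssn stream tcp nowait root /usr/local/samba/bin/smbd smbd
#      #netbios-ns  dgram  udp wait   root /usr/local/samba/bin/nmbd nmbd

# 2. Make inetd re-read its configuration so it stops spawning smbd.
killall -HUP inetd

# 3. Start the daemons standalone; -D tells each one to detach and run
#    as a daemon, with smbd forking one child per client connection.
/usr/local/samba/bin/smbd -D
/usr/local/samba/bin/nmbd -D
```

To survive a reboot, the two start commands would also need to go into whatever boot script the system uses.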