Hi!

Before anything else, here is my hardware/software information:

Hardware:
- IBM x345, 1 CPU, 1 GB RAM, IBM ServeRAID controller
- 6 hard disks used with LVM: 2 volume groups, 12 logical volumes, all running ext3

Software:
- Red Hat Enterprise Linux AS (Academic) 3.0 Update 1
- Kernel 2.4.21-4.0.2.EL
- samba-3.0.2-6.3E
- Running an Apache 2 web server
- In normal use, only 10-15 computers are connected to the server through Samba network drives

Recently (this week), I started having problems with the Samba server. I kept getting messages like these (many times each second):

[...]
Apr 4 00:14:30 rohan smbd[3170]: write_socket_data: write failure. Error = Connection reset by peer
Apr 4 00:14:30 rohan smbd[3170]: [2004/04/04 00:14:30, 0] lib/util_sock.c:write_socket(413)
Apr 4 00:14:30 rohan smbd[3170]: write_socket: Error writing 4 bytes to socket 5: ERRNO = Connection reset by peer
Apr 4 00:14:30 rohan smbd[3170]: [2004/04/04 00:14:30, 0] lib/util_sock.c:send_smb(605)
Apr 4 00:14:30 rohan smbd[3170]: Error writing 4 bytes to client. -1. (Connection reset by peer)
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:get_peer_addr(952)
Apr 4 00:46:30 rohan smbd[4201]: getpeername failed. Error was Transport endpoint is not connected
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:get_peer_addr(952)
Apr 4 00:46:30 rohan smbd[4201]: getpeername failed. Error was Transport endpoint is not connected
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:write_socket_data(388)
Apr 4 00:46:30 rohan smbd[4201]: write_socket_data: write failure. Error = Connection reset by peer
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:write_socket(413)
Apr 4 00:46:30 rohan smbd[4201]: write_socket: Error writing 4 bytes to socket 16: ERRNO = Connection reset by peer
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:send_smb(605)
Apr 4 00:46:30 rohan smbd[4201]: Error writing 4 bytes to client. -1. (Connection reset by peer)
[...]

At some point the Samba server goes crazy: I have seen up to 11000 "smbd -D" processes with a whopping load average of 600! Needless to say, the server was dying, almost frozen, and I had to reboot.

I started monitoring the server more carefully, and whenever the number of processes got too high (normally there are 8-12 smbd processes), I had to "killall -9 smbd" and start over. Three seconds after the restart, I often saw 200 processes and had to kill them again.

I added some options in smb.conf:

    deadtime = 60
    debug uid = yes
    debug pid = yes
    oplocks = no
    log level = 1
    max connections = 50
    max smbd processes = 50
    hostname lookups = no
    socket options = TCP_NODELAY SO_KEEPALIVE

With no success! I was surprised to see that "max smbd processes = 50" did not prevent Samba from growing to an amazing number of processes very quickly (more than 1000).

I started logging with iptables what was happening at the IP layer (I logged incoming UDP/TCP packets on ports 137/138/139/445). Very quickly, another storm occurred and the server was receiving A LOT of packets:

[...]
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54407 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54408 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54410 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK FIN URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54411 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54412 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54413 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=65535 RES=0x00 SYN URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=177 TOS=0x00 PREC=0x00 TTL=127 ID=54414 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan smbd[5095]: [2004/04/08 11:19:17, 0, pid=5095, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr 8 11:19:17 rohan smbd[5095]: getpeername failed. Error was Transport endpoint is not connected
Apr 8 11:19:17 rohan smbd[5095]: [2004/04/08 11:19:17, 0, pid=5095, effective(0, 0), real(0, 0)] lib/util_sock.c:read_socket_data(342)
Apr 8 11:19:17 rohan smbd[5095]: read_socket_data: recv failure for 4. Error = Connection reset by peer
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54415 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=65535 RES=0x00 ACK URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54416 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54417 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=0 RES=0x00 RST URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=242 TOS=0x00 PREC=0x00 TTL=127 ID=54418 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65404 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=276 TOS=0x00 PREC=0x00 TTL=127 ID=54419 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65152 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=134 TOS=0x00 PREC=0x00 TTL=127 ID=54420 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65042 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=140 TOS=0x00 PREC=0x00 TTL=127 ID=54440 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64990 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=200 TOS=0x00 PREC=0x00 TTL=127 ID=54441 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64883 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=208 TOS=0x00 PREC=0x00 TTL=127 ID=54442 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64755 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=180 TOS=0x00 PREC=0x00 TTL=127 ID=54445 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64647 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan smbd[5098]: [2004/04/08 11:19:17, 0, pid=5098, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=202 TOS=0x00 PREC=0x00 TTL=127 ID=54446 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64459 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan smbd[5100]: [2004/04/08 11:19:17, 0, pid=5100, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr 8 11:19:17 rohan smbd[5102]: [2004/04/08 11:19:17, 0, pid=5102, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr 8 11:19:17 rohan smbd[5098]: getpeername failed. Error was Transport endpoint is not connected
Apr 8 11:19:17 rohan smbd[5120]: [2004/04/08 11:19:17, 0, pid=5120, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
[...]

When I saw this, I disconnected x.y.16.19 from the network (it seems to be infected by a virus) and everything returned to normal.

I know the real problem is the infected client, but I don't think it is normal behaviour for Samba to FREEZE a server because of such an event. Any clue about what is happening, and is there a fix for Samba? Why isn't the "max smbd processes" directive respected? I don't really want my server to die every time some Windows machines on the campus (more than 2000 computers) get infected.

The virus seems to be W32.spybot.worm.gen.

Thanks and have a nice day,

-- 
Mathieu Legaré, analyste en informatique (réseau/système)
Service de soutien pédagogique et technologique
Université du Québec à Trois-Rivières
Courriel : legare at uqtr.ca
PGP : http://www.uqtr.ca/~legare/public.pgp
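
P.S. In case it helps anyone hitting the same kind of storm: below is a minimal sketch, in Python, of how the iptables log shown above could be summarised to spot the offending host faster. It only counts connection attempts (SYN without ACK) to ports 137/138/139/445 per source address. The function name `count_syn_per_source` and the abridged sample lines are my own illustration, and it assumes the log lines keep the SRC=, DPT= and RES= fields in the format shown above. It is not a fix for the smbd process growth itself.

```python
import re
from collections import Counter

# Field patterns for netfilter LOG lines such as the ones quoted above.
SRC_RE = re.compile(r"\bSRC=(\S+)")
DPT_RE = re.compile(r"\bDPT=(\d+)")
SMB_PORTS = {"137", "138", "139", "445"}

def count_syn_per_source(lines):
    """Count SYN-only packets aimed at the SMB/NetBIOS ports, per source."""
    counts = Counter()
    for line in lines:
        src = SRC_RE.search(line)
        dpt = DPT_RE.search(line)
        # Skip non-iptables lines (e.g. smbd messages) and other ports.
        if not src or not dpt or dpt.group(1) not in SMB_PORTS:
            continue
        # TCP flags are printed as bare words after the RES= field.
        flags = line.split("RES=", 1)[-1].split()
        if "SYN" in flags and "ACK" not in flags:
            counts[src.group(1)] += 1
    return counts

# Abridged sample lines (hypothetical shortening of the log above).
SAMPLE = [
    "kernel: IN=eth0 SRC=x.y.16.19 DST=x.y.15.3 PROTO=TCP SPT=3863 DPT=445 RES=0x00 SYN URGP=0",
    "kernel: IN=eth0 SRC=x.y.16.19 DST=x.y.15.3 PROTO=TCP SPT=3863 DPT=445 RES=0x00 ACK URGP=0",
    "kernel: IN=eth0 SRC=x.y.16.19 DST=x.y.15.3 PROTO=TCP SPT=3866 DPT=139 RES=0x00 SYN URGP=0",
    "smbd[5095]: getpeername failed. Error was Transport endpoint is not connected",
]

if __name__ == "__main__":
    for source, attempts in count_syn_per_source(SAMPLE).most_common():
        print(source, attempts)
```

Fed with the real log file instead of the sample (for example by passing an open file object as `lines`), the source at the top of the list is the first machine I would unplug.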