Carsten Menke
2004-Nov-12 04:33 UTC
[Samba] [Very Strange] Windows Networking suddenly stopped working
Hi list,

I hope one of you can shed some light on this. It is a very strange case, and I don't have the slightest clue where these symptoms come from. Maybe it's not Samba; maybe it's hardware, maybe buggy Windows ...

Problem: We have been running a Samba 3.0.7 (from backports.org) server on Debian 3.0 stable for over a year now. On Tuesday this week we got the first report from one user that the Windows Network Neighborhood was not accessible; at first we did not think it was a problem. On Wednesday two more reports came in, from different users on different machines. Finally, yesterday there were another five reports (all within the same company, using the same server).

NOTE: The error messages that follow were translated from a German localized version of Windows XP Prof., so the wording may differ in the English version.

So we started looking into the problem. The error message shown when trying to access the Network Neighborhood was "The network is not existent or was not started". It turned out that the "Server" and "Computer Browser" services were not started. Trying to start them manually ended with a timeout. The Event Log shows nothing.

And now the fun begins ...

We first suspected a network problem. To isolate it, we connected one PC directly to the Samba server with a crossover cable and gave the PC a static IP address (normally we use DHCP). The error message was still the same. Stranger still, if you pulled the network cable out completely, the computer and all services started normally. After plugging the cable back in, the same problems came back. Unfortunately, it sometimes works *with* the network cable plugged in, but a minute later it doesn't.

So in the next step we replaced the server's NIC with a new one, thinking that would solve the problem. (Replacing the NIC requires a complete reboot, so that was done as well. I also ran tdbbackup -v *.tdb, which showed everything is OK, and manually removed browse.dat.) The first attempt succeeded, but on the second attempt the result was the same again.

We also found out that the problem is bound to the computer, not the user: the same user can log on at another computer without any problems. All computers were running Windows XP Prof. SP1a.

Although there is a virus scanner (CA Etrust Inoculan) with up-to-date signatures on the computers, we scanned the affected computers with two additional antivirus packages, 1. H+BEDV AntiVir and 2. Kaspersky. All scanners reported the computers clean. Running "nbtstat -RR" did not solve the problem either.

Next, our MCSE decided to install Windows XP SP2 on the affected computers, and guess what, that has solved the problem so far.

So my question is: what is the **REAL** problem we are seeing here? I don't believe that SP2 is the solution. Normally I wouldn't worry if it were one computer showing this odd behavior, but the growing number of computers showing the same symptoms within three days does make me nervous.

I have looked through the Samba log files; they show the entries below and also, rather frequently, "No route to host". There is no router between the PCs and the server; they are connected via a 3Com Super Stack III switch.

Logfile:

[2004/11/12 02:34:14, 0] lib/util_sock.c:get_peer_addr(1000)
  getpeername failed. Error was Transport endpoint is not connected
[2004/11/12 02:34:14, 0] lib/util_sock.c:write_socket_data(430)
  write_socket_data: write failure. Error = Connection reset by peer
[2004/11/12 02:34:14, 0] lib/util_sock.c:send_smb(647)
  Error writing 4 bytes to client. -1. (Connection reset by peer)
[2004/11/12 02:34:14, 0] smbd/service.c:make_connection(800)
  neckar (192.168.1.65) couldn't find service user
[2004/11/12 02:38:40, 0] rpc_server/srv_util.c:get_alias_user_groups(219)
  get_alias_user_groups: gid of user xxx doesn't exist. Check your /etc/passwd and /etc/group files

I double-checked the above message: the gid *is* in /etc/group, and the user ID does exist as well.

I'm glad for every hint I can get, as it seems this could turn out to be a real problem.

Regards
Carsten
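
To see which log messages dominate when the number of affected machines grows, the log excerpt quoted in this post can be tallied by source file and function. The following is only a minimal sketch in Python, not part of Samba: the names HEADER, tally_log_sources and SAMPLE are hypothetical, the sample text is the excerpt from this post, and the layout (a bracketed header followed by the message) is assumed from that excerpt.

```python
import re
from collections import Counter

# Header of a log entry as it appears in the excerpt above:
# "[date time, level] source_file:function(line)".
HEADER = re.compile(
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}),\s*(\d+)\]\s+(\S+?):(\w+)\((\d+)\)"
)


def tally_log_sources(log_text):
    """Count log entries per (source file, function) pair."""
    counts = Counter()
    for match in HEADER.finditer(log_text):
        _timestamp, _level, source_file, function, _line = match.groups()
        counts[(source_file, function)] += 1
    return counts


# The log excerpt quoted in the post (hypothetical variable name).
SAMPLE = """\
[2004/11/12 02:34:14, 0] lib/util_sock.c:get_peer_addr(1000)
  getpeername failed. Error was Transport endpoint is not connected
[2004/11/12 02:34:14, 0] lib/util_sock.c:write_socket_data(430)
  write_socket_data: write failure. Error = Connection reset by peer
[2004/11/12 02:34:14, 0] lib/util_sock.c:send_smb(647)
  Error writing 4 bytes to client. -1. (Connection reset by peer)
[2004/11/12 02:34:14, 0] smbd/service.c:make_connection(800)
  neckar (192.168.1.65) couldn't find service user
[2004/11/12 02:38:40, 0] rpc_server/srv_util.c:get_alias_user_groups(219)
  get_alias_user_groups: gid of user xxx doesn't exist. Check your /etc/passwd and /etc/group files
"""

if __name__ == "__main__":
    # Print the most frequent sources first.
    for (source_file, function), count in tally_log_sources(SAMPLE).most_common():
        print(f"{count:3d}  {source_file}:{function}")
```

Run against a complete log file instead of SAMPLE, a tally like this would show whether the socket errors from lib/util_sock.c increase along with the number of affected clients.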