In problem-fixing, the "nicest" problems are those reproducible on demand.
Alas, this one is not nice.

System: Samba 2.0.6 on Solaris 2.7 (also see similar on Solaris 2.6).

Symptom (user perspective): PC suddenly, spontaneously freezes and
eventually the connection times out.

Symptom (log file):
  [2000/02/15 09:27:30, 0] ../lib/util_sock.c:read_socket_data(474)
    read_socket_data: recv failure for 4.  Error = Connection timed out

We have several hundred simultaneous connections from our NT-based
classrooms to about four Solaris servers.  All usually works very well.
The above events are therefore relatively rare, but nevertheless still add
up to several occurrences per day.

Naturally, I'm not looking forward to debugging this one.  So just before
I start, is this a "known" problem?  Any fixes, workarounds?  Might it be
addressed in pre-2.0.7 (whose "WHATSNEW.txt" is not yet made)?  Any hints?

-- 

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/~dcl0tdl            South Road            :
:                                           Durham                :
:  Phone: +44 191 374 2882                  U.K.                  :

David Lee said:
> In problem-fixing, the "nicest" problems are those reproducible on
> demand.  Alas, this one is not nice.
>
> System: Samba 2.0.6 on Solaris 2.7 (also see similar on Solaris 2.6).
>
> Symptom (log file):
>   [2000/02/15 09:27:30, 0] ../lib/util_sock.c:read_socket_data(474)
>     read_socket_data: recv failure for 4.  Error = Connection timed out

This prompted me to look and see whether we had any similar error
messages, and yes, we did, though I am unaware of any user problems :-)

We are running Samba 2.0.6 on Solaris 2.5.1 and 7.

From a quick survey: in our case the connections were closed forthwith by
the server.  There were some cases, though as I say no user complaints.

From examining the code: this error causes read_smb_length() to fail,
causing server exit; this is within reply_writebraw().  I would have
thought, maybe naively, that the server exit would cause the client to
re-establish the connection, though I can find no evidence for this.

-- 
-----------------------------------------------------------------------------
| Peter Polkinghorne, Computer Centre, Brunel University, Uxbridge, UB8 3PH,|
| Peter.Polkinghorne@brunel.ac.uk   +44 1895 274000 x2561   UK              |
-----------------------------------------------------------------------------

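For anyone wanting to repeat this kind of survey on their own logs, here is a minimal sketch in Python. It assumes the two-line log layout quoted above (a bracketed timestamp header, then the message text) and simply counts "recv failure" messages per day. The function name and the regular expression are illustrative only; they are not part of Samba.

```python
import re
from collections import Counter

# Header layout assumed from the log excerpt in this thread:
#   [2000/02/15 09:27:30, 0] ../lib/util_sock.c:read_socket_data(474)
HEADER_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2}) \d{2}:\d{2}:\d{2}, *\d+\]")
FAILURE_TEXT = "read_socket_data: recv failure"


def count_recv_failures(lines):
    """Return a Counter mapping 'YYYY/MM/DD' to the number of recv failures."""
    counts = Counter()
    current_day = None
    for line in lines:
        header = HEADER_RE.match(line.strip())
        if header:
            # Remember the date of the most recent header line.
            current_day = header.group(1)
        # The message may follow the header on the same or the next line.
        if FAILURE_TEXT in line and current_day is not None:
            counts[current_day] += 1
    return counts
```

Calling count_recv_failures(open(path)) on a log file, where path is wherever your smbd log lives, gives a per-day tally to compare against the "several per day" reported above.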
On Fri, 18 Feb 2000, David Lee wrote:
> In problem-fixing, the "nicest" problems are those reproducible on demand.
> Alas, this one is not nice.
>
> System: Samba 2.0.6 on Solaris 2.7 (also see similar on Solaris 2.6).
>
> Symptom (user perspective): PC suddenly, spontaneously freezes and
> eventually the connection times out.
>
> Symptom (log file):
>   [2000/02/15 09:27:30, 0] ../lib/util_sock.c:read_socket_data(474)
>     read_socket_data: recv failure for 4.  Error = Connection timed out
>
> We have several hundred simultaneous connections from our NT-based
> classrooms to about four Solaris servers.  All usually works very well.
> The above events are therefore relatively rare, but nevertheless still add
> up to several occurrences per day.
>
> Naturally, I'm not looking forward to debugging this one.  So just before
> I start, is this a "known" problem?  Any fixes, workarounds?  Might it be
> addressed in pre-2.0.7 (whose "WHATSNEW.txt" is not yet made)?  Any hints?

We have done further work and eventually tracked it down.  Samba is
innocent; the problem lies "in the network", beyond the control of Samba.
In that sense, you good Samba folk can regard the problem as null and
closed.  (For us, the problem is very real and lives on...)

[ For the curious:  The PCs are separated from the server by a router.  So
in normal operation, ARP/RARP exchanges will ensure that the PC learns an
ARP entry that maps the server's IP address to the router's MAC address.
When this problem strikes, we find the ARP entry has the server's own MAC
address, which has somehow leaked through our ancient, flaky router.  (The
usual "shouldn't happen" thing that we all know and love.)  Anyway, I have
passed this problem to the network people. ]

-- 

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/~dcl0tdl            South Road            :
:                                           Durham                :
:  Phone: +44 191 374 2882                  U.K.                  :

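The ARP check described above can be sketched in Python. This is only an illustration under stated assumptions: it parses text in the general style printed by "arp -a" on a Windows client (one IP address and one MAC address per entry line), and the function names, together with the router and server MAC addresses you would pass in, are hypothetical. Other platforms print ARP tables differently and may need a different parser.

```python
import re

# One dotted-quad IP and one six-octet MAC per entry line (assumed layout).
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
MAC_RE = re.compile(r"\b([0-9A-Fa-f]{2}(?:[-:][0-9A-Fa-f]{2}){5})\b")


def normalize_mac(mac):
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.lower().replace("-", ":")


def parse_arp_table(text):
    """Return {ip: mac} for every line containing both an IP and a MAC."""
    table = {}
    for line in text.splitlines():
        ip = IP_RE.search(line)
        mac = MAC_RE.search(line)
        if ip and mac:  # header and interface lines have no MAC; skip them
            table[ip.group(1)] = normalize_mac(mac.group(1))
    return table


def check_server_entry(table, server_ip, router_mac, server_mac):
    """Classify the client's ARP entry for a server behind a router."""
    seen = table.get(server_ip)
    if seen is None:
        return "missing"   # no entry cached for the server
    if seen == normalize_mac(router_mac):
        return "ok"        # expected: server IP resolves to the router
    if seen == normalize_mac(server_mac):
        return "leaked"    # the fault described above
    return "unknown"       # some third MAC; worth a closer look
```

Run on an affected PC while the freeze is in progress, a result of "leaked" would match the symptom described above; "ok" would point the finger elsewhere.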
I too have experienced this problem with the clients, primarily Win98,
here on our network.  However, we are using Slackware v7.0 (Samba 2.0.5a)
on a PII 350 with 192 MB of RAM.

At first I thought that it might have something to do with the oplock
feature, so I disabled it in smb.conf, and for a while it appeared as
though I was right.  But it has resurfaced, and I am getting quite a lot
of complaints from employees who rely heavily on that particular server.

There is one thing that the complaints all have in common, and that is
Microsoft Office: "I was working on a Word doc, stopped to take a call and
when I went back to my work, the server won't let me save."

Any success in solving this issue?

oooo<<<000~XX~000>>>oooo
Gary Yamamoto Custom Baits
Ken Sasaki
President
mailto:ksasaki@gyb.baits.com
www.yamamoto.baits.com
oooo^^^<:}}}}}}}}}}}}><^^^oooo