Martin Rootes
1999-Nov-15 17:32 UTC
HELP: Connections dropping whilst processes increasing.
I'm running Samba 2.0.4b on an E450 running Solaris 7, which acts as a diskspace server for students. Normally everything runs quite smoothly, with the number of connections in the region of 800 to 900. Then every now and again students start seeing their connections being lost, and others cannot get a connection when they log in. Looking at the system shows no obvious problems, except the error messages below in log.smb and the fact that the number of smbd processes increases to unfeasible levels (I have seen as many as 5000).

Does anyone have any idea what may be causing this, or how to diagnose the problem?

Martin Rootes
Systems Support
Sheffield Hallam University

[1999/11/10 14:56:08, 0] smbd/oplock.c:request_oplock_break(996)
  request_oplock_break: no response received to oplock break request to pid 7118 on port 51376 for dev = 2580006, inode = 8385207
  for dev = 2580006, inode = 8385207, tv_sec = 382985c5, tv_usec = 798c7
[1999/11/10 14:56:08, 0] lib/util_sock.c:client_addr(889)
  getpeername failed. Error was Transport endpoint is not connected
[1999/11/10 14:56:08, 0] lib/util_sock.c:write_data(415)
  write_data: write failure. Error = Broken pipe
[1999/11/10 14:56:08, 0] lib/util_sock.c:write_socket(191)
  write_socket: Error writing 4 bytes to socket 7: ERRNO = Broken pipe
[1999/11/10 14:56:08, 0] lib/util_sock.c:send_smb(606)
  Error writing 4 bytes to client. -1. Exiting

------------------------------------------------------------------------------
Martin Rootes - Senior Systems Programmer/Analyst, Sheffield Hallam University
Email : M.Rootes@shu.ac.uk
------------------------------------------------------------------------------
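One way to start diagnosing from log.smb is to tally which routines are logging errors during an incident. The following is only a rough sketch in Python: the header pattern is inferred from the excerpt above rather than from any Samba documentation, and the function name `count_by_function` is made up for illustration. Because the pattern is not anchored to line starts, it also works on log text whose line breaks were lost.

```python
import re
from collections import Counter

# Header format inferred from the excerpt above (an assumption), e.g.:
# [1999/11/10 14:56:08, 0] smbd/oplock.c:request_oplock_break(996)
HEADER = re.compile(
    r"\[(?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}), (?P<level>\d+)\] "
    r"(?P<source>\S+?):(?P<func>\w+)\((?P<line>\d+)\)"
)


def count_by_function(log_text):
    """Count log entries per (source file, function) pair."""
    counts = Counter()
    for match in HEADER.finditer(log_text):
        counts[(match["source"], match["func"])] += 1
    return counts


# Three entries copied from the post above, shortened for the example.
SAMPLE = """\
[1999/11/10 14:56:08, 0] smbd/oplock.c:request_oplock_break(996)
  request_oplock_break: no response received to oplock break request
[1999/11/10 14:56:08, 0] lib/util_sock.c:client_addr(889)
  getpeername failed. Error was Transport endpoint is not connected
[1999/11/10 14:56:08, 0] lib/util_sock.c:write_data(415)
  write_data: write failure. Error = Broken pipe
"""

if __name__ == "__main__":
    # Print the busiest logging sites first.
    for (source, func), n in count_by_function(SAMPLE).most_common():
        print(f"{n:5d}  {source}:{func}")
```

Run against a full log.smb from a bad period, a tally like this might show whether the oplock break failures dominate or are just one symptom among the socket errors.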
Did anyone ever give an answer for this problem?

We've been experiencing something very similar to Martin Rootes' problem, on an HP9000 K-series server, with anywhere from hundreds to thousands of extra, unkillable smbd processes. The odd thing is, the system load goes *very* high, but it doesn't seem to affect anything other than further smbd services, including preventing successful logons.

It's odd - it only happens on that one server (we run Samba on five production servers), and there are few differences between that host and the others. As you can imagine, I really need to determine if our problem is that there's something wrong with Samba, or if this is due to either the other processes on that server or something different about the clients that predominantly use that server.

Very unfortunately, the only way to get rid of those hundreds to thousands of extra processes is to restart the server. An increasingly unacceptable solution.

My management and the support staffer on that campus believe that Samba is the problem, because it displays this behavior (difficulty logging in, and enormous numbers of unkillable smbd processes). I believe it's something else, but need to prove it.

Let's see, the only configuration options were --prefix, --with-quotas, and --with-mmap (which I guess we'll stop using Real Soon).

The logon script mounts the user's home directory, a shared directory, sets the time, and some antiviral housekeeping.

Help!

Anything anyone's found or any insights will be helpful.

We require domain logons, and they've worked fine for a couple of years now (from Samba 1.9.18 to now - 2.0.5a [don't suggest upgrading to 2.0.6 - I've got a major problem there on a testbed system]), Win95 & plain passwords, logon scripts generated by rootpreexec calling a perl script in [netlogon]. I believe the campus with the problem has a persistent share defined on the clients, and I know that's not the case for the other campuses.
log.smb on that campus shows *no* entries for "connect to service netlogon", but many "closed connection to service netlogon", which should not be happening.

On the other hand, that server began running both Oracle and OpenView for network monitoring and management a few months before these problems started to appear.

I didn't want to shower you all with log details, my smb.conf file, or the logonscript (of course, I'll provide info if it'll help) - but can *anyone* provide some advice, insight, or <gasp> solutions?

c
--
Clifford Green                 Internet - green@umdnj.edu
Academic Computing Services    voice - 732-235-5250
UMDNJ-IST                      fax - 732-235-5252
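The netlogon observation above can be checked mechanically by counting both phrases in a log file. This is only a sketch: the two search phrases are the ones quoted in the message, while the sample lines, host names, and the function name `netlogon_tally` are invented for illustration and may not match real log.smb output exactly.

```python
def netlogon_tally(log_text, service="netlogon"):
    """Return (opened, closed) counts for one service in a log.smb dump.

    The phrases are taken from the message above. "closed connection to
    service" does not contain "connect to service", so the two counts
    do not overlap.
    """
    opened = log_text.count(f"connect to service {service}")
    closed = log_text.count(f"closed connection to service {service}")
    return opened, closed


# Invented sample lines (hypothetical hosts and user), for illustration only.
SAMPLE = """\
pc101 (10.0.0.1) connect to service netlogon as user alice
pc101 (10.0.0.1) closed connection to service netlogon
pc102 (10.0.0.2) closed connection to service netlogon
"""

if __name__ == "__main__":
    opened, closed = netlogon_tally(SAMPLE)
    print(f"opened={opened} closed={closed}")
    if closed > opened:
        print("more closes than connects: worth a closer look")
```

If the log has been rotated or the debug level changed partway through, an imbalance could be an artifact of that rather than of Samba's behavior, so the result is a hint and not proof.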
Paulo Afonso Graner Fessel
1999-Nov-22 19:38 UTC
HELP: Connections dropping whilst processes increasing.
On Sat, 20 Nov 1999, Cliff Green (green@UMDNJ.EDU) wrote:

> Did anyone ever give an answer for this problem?

I'm getting EXACTLY the same problem here, with Red Hat 6.0 + custom Linux Kernel 2.2.13 (with e2comp patch basically).

> We've been experiencing something very similar to Martin Rootes' problem,
> on an HP9000 K-series server, with anywhere from hundreds to thousands of
> extra, unkillable smbd processes. The odd thing is, the system load goes
> *very* high, but it doesn't seem to affect anything other than further
> smbd services, including preventing successful logons.

Same thing here - I can do whatever I want, except use Samba services. Telnet, httpd, LDAP, etc., everything but Samba works OK. What's more, the problem is easily reproducible on one station: it's only necessary to open a Word document, make one or two modifications and save it twice. The first save works OK; the second locks the machine and drives the user's original smbd mad, making the server spawn two or three other processes for the same user.

> It's odd - it only happens on that one server (we run Samba on five
> production servers), and there are few differences between that host and
> the others. As you can imagine, I really need to determine if our problem
> is that there's something wrong with Samba, or if this is due to either
> the other processes on that server or something different about the
> clients that predominantly use that server.

How many users use this specific server and how do they use it? What applications are involved on the client side (Word, Excel, xBASE apps...) and on the server side (daemons?)?

> Very unfortunately, the only way to get rid of those hundreds to thousands
> of extra processes is to restart the server. An increasingly unacceptable
> solution.

I repeat, it's EXACTLY the same way here. Unacceptability considerations included.
=:-0

> My management and the support staffer on that campus believe that Samba is
> the problem, because it displays this behavior (difficulty logging in, and
> enormous numbers of unkillable smbd processes). I believe it's something
> else, but need to prove it.

Hmmm... How is this server connected to the stations that show the problem? Here I think that our problem may be our switch (a 3Com SuperStack 1000 with OLD firmware and low-capacity buffers, since it's a workgroup switch and not a backbone switch). I say this because I'm observing *collisions* on ports that are dedicated to the *server* and *workstations* (no hubs involved).

> Let's see, the only configuration options were --prefix, --with-quotas,
> and --with-mmap (which I guess we'll stop using Real Soon).

If I'm not mistaken, mmap support is disabled by default in the current (and not-so-current) versions of Samba, so I think it's not an issue (unless you have enabled it explicitly).

> The logon script mounts the user's home directory, a shared directory,
> sets the time, and some antiviral housekeeping.

I don't have logon scripts here. I map the drives using Network Neighborhood. Hmmm, we also have antiviral software running (McAfee VirusScan), what's yours?

> Help!
>
> Anything anyone's found or any insights will be helpful.

I've found "window frozen" problems and acknowledge-time problems ("acks too long") between station and server. The first is a sign of buffer exhaustion, so I'll be setting up a separate switchless network for us on a separate interface on the server, in addition to the "usual" network interface that will remain connected to the switch. I'll put one smbd listening on each interface.
Thus, if the smbd bound to the interface connected to the switch locks up, I'll know that the switch is the issue.

> log.smb on that campus shows *no* entries for "connect to
> service netlogon", but many "closed connection to service netlogon", which
> should not be happening.

I'm not sure, but it *seemed* to be happening here too - I'll check.

> On the other hand, that server began running both Oracle and OpenView for
> network monitoring and management a few months before these problems
> started to appear.

I don't think this is the problem, as I don't run either of these here and I have the problem too. BUT... hmm, I'm running snmpd here and I think you're doing this too, as I think this server of yours is SNMP-manageable. Or not? I say this because I'm running snmpd here (and actually it's basically useless).

> I didn't want to shower you all with log details, my smb.conf file, or the
> logonscript (of course, I'll provide info if it'll help) - but can
> *anyone* provide some advice, insight, or <gasp> solutions?

I'm looking for solutions, also. Unfortunately, I still don't have any concrete answers. But if you could check these points, it would be interesting to see whether there are other similarities (besides the problem itself).

P.
--
"The one that doesn't run the risk doesn't snap"
(Millôr, "Lições de Inglês Audiovisual", Pasquim nº 117)
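Since every report in this thread hinges on the smbd count running away, it may help to record that count at intervals and compare it with the number of connections you expect. The sketch below is only an illustration under assumptions: it takes a list of command names as text, one per line, such as `ps -e -o comm=` might print on systems that support it, and the function names and sample path are hypothetical.

```python
import os
from collections import Counter


def count_commands(ps_output):
    """Count processes per command name from one-name-per-line text."""
    counts = Counter()
    for line in ps_output.splitlines():
        name = line.strip()
        if name:
            # Some systems print a full path; keep only the final component.
            counts[os.path.basename(name)] += 1
    return counts


def smbd_excess(ps_output, expected_connections):
    """Return how many smbd processes exceed the expected connection count."""
    return max(0, count_commands(ps_output)["smbd"] - expected_connections)


# Invented sample listing (the install path is hypothetical).
SAMPLE = """\
smbd
smbd
/usr/local/samba/bin/smbd
nmbd
inetd
"""

if __name__ == "__main__":
    print(count_commands(SAMPLE)["smbd"])
    print(smbd_excess(SAMPLE, expected_connections=2))
```

Logging this number every minute or so alongside the timestamps in log.smb could show whether the growth starts before or after the first oplock break errors.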
don_mccall@hp.com
1999-Nov-22 20:20 UTC
HELP: Connections dropping whilst processes increasing.
Hi Cliff,

If you have dde on your HP-UX system, you could try attaching to some of these runaway smbd processes and getting a stack trace to see WHAT they are doing; perhaps you can catch them forking and see what they were trying to do at the time. The command syntax for dde would be:

    dde -ui line -attach <PID of the smbd process> /...path to smbd/smbd

(-ui line puts it in line mode so you don't have to deal with the X Windows GUI over a modem.)

dde 'stops' the process and gives you a prompt, where you can type 'tb' to get a traceback of the routines called to get to wherever the stop occurred. You can then type "go" to start the process running where it left off, <Ctrl-C> to interrupt it again, and 'tb' to get another stack, and so on, to see whether you are in a loop and what you are executing so frantically...

May help, may not...

I suppose you have already tried turning on a higher level of debugging to see if any useful debug statements are being generated during this runaway condition, but if not, that would be useful as well....

I haven't seen this behavior on my 11.0 system at this time, but it's a diag system, and not very heavily loaded...

Hope this helps,
Don