skeet@Bridgewater.EDU wrote:
>
> Basically what it looks like to me is that an smbd process is spawning a
> child which goes defunct for some reason I have yet to determine and the
> smbd process hangs out there waiting for the child. Of course the child is
> never going to answer being defunct, so the parent smbd process sticks
> around and keeps its oplocks.
>

That would definitely explain the oplock break errors.

However, the actual status of a "defunct" process is
given in the ps man page as:

"A process that has exited and has a parent, but has not
yet been waited for by the parent, is marked <defunct>."

Looking at the 1.9.18p10 code (and Samba 2.0 is
similar here), the parent goes into the "talktochild()"
function and then goes into a sys_wait() call to wait
for the dead child (first doing a kill() on the child
if the password change failed, just to make sure the
child is dead).

If it's hanging - then it's doing so within the talktochild()
call.

Can you do a truss on the parent when it's in this
state to see what system call(s) it's doing?

Trying this with p10 would also help track this
one.

Cheers,

Jeremy Allison,
Samba Team.

--------------------------------------------------------
Buying an operating system without source is like buying
a self-assembly Space Shuttle with no instructions.
--------------------------------------------------------
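The ps man page definition quoted in this message can be reproduced with a small sketch. It is written in Python rather than taken from Samba's C source, and the names spawn_and_reap() and pid_exists() are made up for illustration. It forks a child that exits at once, checks that the child's process-table entry still exists before the parent waits (the <defunct> state), and then reaps it. For a child that has already exited, a blocking wait is expected to return immediately.

```python
import os
import time


def pid_exists(pid):
    """True if signal 0 can be delivered to pid (a zombie still counts)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The pid exists but belongs to another user.
        return True
    return True


def spawn_and_reap(exit_code=0):
    """Fork a child that exits at once, then reap it from the parent.

    Returns (exists_before_wait, exit_status, exists_after_wait).
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, skipping the parent's cleanup handlers.
        os._exit(exit_code)

    # Give the child time to exit. Until the parent waits for it, the
    # kernel keeps its process-table entry: this is the <defunct> state.
    time.sleep(0.2)
    exists_before = pid_exists(pid)

    # Reap the child and collect its exit status.
    _, status = os.waitpid(pid, 0)
    exit_status = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    exists_after = pid_exists(pid)
    return exists_before, exit_status, exists_after
```

If the parent in the reported hang really is sitting in its wait call while ps shows the child as defunct, that is not what this sketch would predict, which is why the truss output is worth collecting.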
On Tue, 3 Nov 1998, Jeremy Allison wrote:
> skeet@Bridgewater.EDU wrote:
> >
> > Basically what it looks like to me is that an smbd process is spawning a
> > child which goes defunct for some reason I have yet to determine and the
> > smbd process hangs out there waiting for the child. Of course the child is
> > never going to answer being defunct, so the parent smbd process sticks
> > around and keeps its oplocks.
> >
>
> However, looking at the 1.9.18p10 code (and Samba2.0 is
> similar here) the parent goes into the "talktochild()"
> function and then goes into a sys_wait() call to wait
> for the dead child (first doing a kill() on the child
> if the password change failed, just to make sure the
> child is dead).
>
> If it's hanging - then it's doing so within the talktochild()
> call.
>
> Can you do a truss on the parent when it's in this
> state to see what system call(s) it's doing ?

I'll have to wait until this crops up again. It does so once or twice a
day.

A few more details here:

While we are predominantly seeing this with our DMB (the logon server,
where of course the password changes are taking place), we're also seeing
the same behavior on some of our other systems which are not involved with
password changes. The instances with the DMB are mostly with the USER.DAT
files in the "profiles" share.

I haven't scoured all the log files to make certain of this, but it
appears many of the occurrences happen after a previous network connection
is improperly terminated. The logfiles which show the connection associated
with the offending PID show the connects to the shares, but do not show
disconnects from one or more shares.

> Trying this with p10 would also help track this
> one.

We tried upgrading to p10 on this particular system (all the other servers
are already running p10) but ran into a serious problem with password
change attempts.
(I was waiting until I'd had a chance to gather more info on this before
reporting it.) I'm still working with the Network Administrator to debug
this, but on the surface it appears to him that the password change program
specified in smb.conf is not being called with all of the arguments listed
there. Here's what these entries look like:

   passwd program = /usr/local/sbin/BCpassman -user %u terra pluto
   passwd chat = *new*password* %n\n *retype*new*password* %n\n*All\nis\nwell\n

BCpassman appears not to receive the "terra" and "pluto" arguments when it
is called, which is breaking the password synchronization we do. As a result
we moved back to p8 until the password syncing can be done okay in p10.

----------------------------------------------------------------------
Douglas K. Fischer          DFischer@Bridgewater.EDU    (540) 828 - 5343
Network Systems Engineer    C. E. Shull Information Technology Center
College Box 36              Bridgewater College
Bridgewater, VA 22812
----------------------------------------------------------------------
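To compare against whatever BCpassman actually logs as its arguments, it may help to write down the argument vector the "passwd program" entry above is expected to produce. The sketch below is only a model of that expectation: it expands %u and splits the line the way a POSIX shell would. It is an assumption, not a description of how smbd builds or runs the command, and expected_argv() and the user name "jdoe" are hypothetical.

```python
import shlex


def expected_argv(passwd_program, username):
    """Expand %u and split the command line as a POSIX shell would.

    Assumption: this models what the administrator expects the password
    program to receive; it does not reproduce smbd's own handling.
    """
    expanded = passwd_program.replace("%u", username)
    return shlex.split(expanded)


if __name__ == "__main__":
    line = "/usr/local/sbin/BCpassman -user %u terra pluto"
    # "jdoe" is a placeholder user name for illustration.
    print(expected_argv(line, "jdoe"))
```

If the program's own debugging output shows fewer arguments than this list, the difference points at where the trailing arguments are being dropped.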
I had a similar problem with NCR unix, the High C compiler and samba p10 a
few weeks ago. I have no problem now that I compile samba with
gcc 2.7.2.f.1.

(I have a serious problem with samba and Delphi, but that's another story.)

Ivan Kuncl ( kuncl@vsbohem.cz )

On Wed, Nov 04, 1998 at 07:19:49AM +1100, Jeremy Allison wrote:
> skeet@Bridgewater.EDU wrote:
> >
> > Basically what it looks like to me is that an smbd process is spawning a
> > child which goes defunct for some reason I have yet to determine and the
> > smbd process hangs out there waiting for the child. Of course the child is
> > never going to answer being defunct, so the parent smbd process sticks
> > around and keeps its oplocks.
> >
>
> That would definitely explain the oplock break errors.
>
> However the actual status of a "defunct" process is
> listed from the ps man page as :
>
> "A process that has exited and has a parent, but has not
> yet been waited for by the parent, is marked <defunct>."
>
> However, looking at the 1.9.18p10 code (and Samba2.0 is
> similar here) the parent goes into the "talktochild()"
> function and then goes into a sys_wait() call to wait
> for the dead child (first doing a kill() on the child
> if the password change failed, just to make sure the
> child is dead).
>
> If it's hanging - then it's doing so within the talktochild()
> call.
>
> Can you do a truss on the parent when it's in this
> state to see what system call(s) it's doing ?
>
> Trying this with p10 would also help track this
> one.
>
> Cheers,
>
> Jeremy Allison,
> Samba Team.
>
> --------------------------------------------------------
> Buying an operating system without source is like buying
> a self-assembly Space Shuttle with no instructions.
> --------------------------------------------------------
On Tue, 3 Nov 1998, Jeremy Allison wrote:
> > Basically what it looks like to me is that an smbd process is spawning a
> > child which goes defunct for some reason I have yet to determine and the
> > smbd process hangs out there waiting for the child. Of course the child is
> > never going to answer being defunct, so the parent smbd process sticks
> > around and keeps its oplocks.
> >
> If it's hanging - then it's doing so within the talktochild()
> call.
>
> Can you do a truss on the parent when it's in this
> state to see what system call(s) it's doing ?
>
> Trying this with p10 would also help track this
> one.

Okay, we're running p10 on this system now (some of the password change
problems were due to the chat script, but some still exist).

I happened on a request_oplock_break just a few minutes ago. The offending
process was the parent of a password change whose child went defunct (the
password change attempt simply froze up and never returned any response to
the PC). Upon logging in again, the user received the following oplock
errors:

   1998/11/04 17:34:03 request_oplock_break: no response received to oplock
   break request to pid 705 on port 64662 for dev = 1980040, inode = 2325

PID 705 is an smbd process with a defunct child (pid 732). Trussing 705
produces the following:

   705: waitid(P_PID, 732, 0x08046F38, WEXITED|WTRAPPED) (sleeping...)

The part of the password change responsible for synchronizing passwords
worked fine (debugging in this program showed that all went properly and
it exited cleanly). The smbpasswd entry, however, was never updated.

Hope this helps.

Douglas

----------------------------------------------------------------------
Douglas K. Fischer          DFischer@Bridgewater.EDU    (540) 828 - 5343
Network Systems Engineer    C. E. Shull Information Technology Center
College Box 36              Bridgewater College
Bridgewater, VA 22812
----------------------------------------------------------------------
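Since the hang shows up only once or twice a day, it may be convenient to pull the stuck PIDs out of the smbd logs automatically and then truss them. The following is a minimal sketch that assumes the message format is exactly the one quoted in this report; the pattern and the function name stuck_oplock_holders() are made up for illustration. The dev and inode values are kept as strings because the report does not say which number base they are printed in.

```python
import re

# Pattern modelled on the single log message quoted in this thread.
OPLOCK_TIMEOUT = re.compile(
    r"request_oplock_break: no response received to oplock break "
    r"request to pid (?P<pid>\d+) on port (?P<port>\d+) "
    r"for dev = (?P<dev>\w+), inode = (?P<inode>\w+)"
)


def stuck_oplock_holders(log_text):
    """Return one dict per timed-out oplock break request in log_text."""
    # Collapse line breaks first, in case the message is wrapped.
    flat = " ".join(log_text.split())
    return [
        {
            "pid": int(m["pid"]),
            "port": int(m["port"]),
            "dev": m["dev"],
            "inode": m["inode"],
        }
        for m in OPLOCK_TIMEOUT.finditer(flat)
    ]
```

Each returned pid is a candidate for the same truss check shown above, to see whether it is also sleeping in a wait call on a defunct child.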