We have an occasional problem in which multiple smbd processes are
created for a particular user. For example, for a user "xy004":

    xy004   8463  0.0  0.0  6.02M  0K  ??  IW  11:48:59  0:02.02  smbd
    xy004   9426  0.0  0.0  5.99M  0K  ??  IW  11:52:18  0:03.68  smbd
    xy004  10433  0.0  0.0  5.81M  0K  ??  IW  12:17:20  0:00.85  smbd
    xy004  12211  0.0  0.0  5.81M  0K  ??  IW  12:29:50  0:00.88  smbd
    xy004  12588  0.0  0.0  5.81M  0K  ??  IW  12:16:05  0:00.55  smbd
    xy004  12968  0.0  0.0  5.81M  0K  ??  IW  12:18:40  0:00.85  smbd
    xy004  13069  0.0  0.0  5.81M  0K  ??  IW  12:19:55  0:00.86  smbd
    xy004  13223  0.0  0.0  5.81M  0K  ??  IW  12:21:10  0:00.83  smbd
    xy004  13396  0.0  0.0  5.81M  0K  ??  IW  12:22:27  0:00.86  smbd
    xy004  13579  0.0  0.0  5.81M  0K  ??  IW  12:23:42  0:00.82  smbd
    xy004  13772  0.0  0.0  5.81M  0K  ??  IW  12:24:52  0:00.87  smbd
    xy004  13887  0.0  0.0  5.81M  0K  ??  IW  12:26:07  0:00.83  smbd
    xy004  14064  0.0  0.0  5.81M  0K  ??  IW  12:27:18  0:00.82  smbd
    xy004  14220  0.0  0.0  5.81M  0K  ??  IW  12:28:35  0:00.85  smbd
    xy004  14743  0.0  0.0  5.81M  0K  ??  IW  12:31:05  0:00.98  smbd
    xy004  14963  0.0  0.0  5.81M  0K  ??  IW  12:32:30  0:00.93  smbd
    xy004  15609  0.0  0.0  5.81M  0K  ??  IW  12:33:55  0:00.87  smbd

This has manifested in two ways:

1) For just a single user. The processes remain following
disconnection from Samba, and *CANNOT* be killed (with "kill -9" etc).
In this situation they have status <defunct>, and the user cannot
reconnect to Samba until the Samba server has been rebooted. As we
have parallel servers, we have however been able to tell users to
use a different server.

2) For multiple users. This has happened once, but required an
emergency reboot of the server because the process table limit was
reached - the above process list was obtained just before shutting down
the server in this case.

We are using Compaq Tru64 Unix (ex- Digital Unix) and Samba 2.0.6.

Has anyone else encountered these problems, and does anyone know what
causes them? I can find no evidence in the log files of anything wrong
for the user to whom this happens.

TIA

Andrew

=====================================================================
Dr Andrew Boswell              email : A.Boswell@uea..ac.uk
School Liaison Consultant      phone : +44-1603-593856
IT and Computing Services      fax   : +44-1603-593467
University of East Anglia
Norwich, NR4 7TJ, UK
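
A pile-up like the one in the listing above could be spotted before the process table fills. The following is only a minimal sketch, assuming ps output in the same column layout as the listing in the original post (user name first, command name last); the function names and the limit are hypothetical, not anything shipped with Samba.

```python
from collections import Counter


def count_smbd_per_user(ps_output: str) -> Counter:
    """Count smbd processes per user in ps-style text.

    Assumes the layout of the listing in the original post:
    the first field is the user, the last field is the command.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[-1] == "smbd":
            counts[fields[0]] += 1
    return counts


def users_over_limit(ps_output: str, limit: int = 10) -> list:
    """Return the users owning more than `limit` smbd processes."""
    counts = count_smbd_per_user(ps_output)
    return sorted(user for user, n in counts.items() if n > limit)


if __name__ == "__main__":
    # Two lines copied from the listing in the original post.
    sample = (
        "xy004 8463 0.0 0.0 6.02M 0K ?? IW 11:48:59 0:02.02 smbd\n"
        "xy004 9426 0.0 0.0 5.99M 0K ?? IW 11:52:18 0:03.68 smbd\n"
    )
    print(users_over_limit(sample, limit=1))  # prints ['xy004']
```

Run from cron against real ps output, a check of this kind would only report the symptom; it says nothing about why the processes are stuck.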
I've seen this on AIX 4.3.2 with 2.0.6. We chalked it up to Win2k, because
everything we had was NT4 except for one NT4 box that had been upgraded to
Lose2k and was part of the domain at that point. We had to reboot on two
consecutive Tuesdays (no relevance really) and we've been fine ever
since... perhaps we picked the wrong culprit?

We had 156 blocked processes waiting for a resource - which one I have no
idea, because the log was barren. We had 1342 processes that were smbd's.
And get this - LOTS of idle - NOT wait time.

Bill

--
William E. Jojo, Jr.
Senior Systems and Network Specialist
Hudson Valley Community College
(518) 629 7540
jojowil@hvcc.edu

We are young wandering the face of the earth
Wondering what our dreams might be worth
Learning that we're only immortal...
...for a limited time
Andrew Boswell asked about multiple processes, e.g.:
| xy004 8463 0.0 0.0 6.02M 0K ?? IW 11:48:59 0:02.02 smbd
| xy004 9426 0.0 0.0 5.99M 0K ?? IW 11:52:18 0:03.68 smbd
...

A quick avoidance: try setting

    keepalive = 30

in the globals section of your smb.conf. This will trigger a check for
a live connection after every 30 seconds of inactivity. (For systems
which aren't suffering from the problem you have, something like every
10 minutes is perfectly sufficient.)

Can you get logs at log level = 3 for one of the failures? That could
help us track it down. If only machine xyz fails, you can just log for
it: create an xyz.conf file containing

    log level = 3
    log file = %m.log

and add

    include = /some/path/to/%m.conf

to the globals section. This will log just xyz at a high level to
xyz.log (see also
http://www.oreilly.com/catalog/samba/chapter/book/ch09_01.html)

--dave
--
David Collier-Brown,  | Always do right. This will gratify some people
185 Ellerslie Ave.,   | and astonish the rest.        -- Mark Twain
Willowdale, Ontario   | //www.oreilly.com/catalog/samba/author.html
Work: (905) 415-2849 Home: (416) 223-8968 Email: davecb@canada.sun.com
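
The per-machine logging trick above relies on %m, which Samba replaces with the NetBIOS name of the connecting client. The sketch below is a simplified model of that substitution, not Samba's own code: expand_machine and per_machine_conf are hypothetical helpers, and it assumes, as the advice implies, that a client with no matching file simply gets no extra settings.

```python
import os


def expand_machine(template: str, machine: str) -> str:
    """Replace %m with the client's machine name (simplified model)."""
    return template.replace("%m", machine)


def per_machine_conf(machine: str, include_template: str):
    """Return the include file that would apply to `machine`, or None.

    Assumption: a client whose expanded include path does not exist
    gets no per-machine settings.
    """
    path = expand_machine(include_template, machine)
    return path if os.path.isfile(path) else None


if __name__ == "__main__":
    # Paths taken from the message above; "xyz" is the failing machine.
    print(expand_machine("/some/path/to/%m.conf", "xyz"))
    print(expand_machine("%m.log", "xyz"))
```

With only an xyz.conf present in the include directory, only connections from xyz would pick up log level = 3, and their output would go to xyz.log.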
I've seen this happen when one or more NFS mounts hang. The processes in
question hang on the NFS access and can't be killed. The NT redirector
times the session out, closes the connection, and opens a new one. This
will happen every 45 seconds (the default redirector timeout).

Limiting your Samba exports to local filesystems should prevent this, as
well as improve response time.

Frank Varnavas

> From: William Jojo <jojowil@hvcc.edu>
> To: A.Boswell@uea.ac.uk
> Subject: Re: Multiple smbd processes generated
Andrew Boswell wrote:
>
> This has manifested in two ways:
>
> 1) For just a single user. The processes remain following
> disconnection from Samba, and *CANNOT* be killed (with "kill -9" etc).
> In this situation they have status <defunct>, and the user cannot
> reconnect to Samba. This has caused a problem to the user - they
> couldn't connect to Samba, except if the Samba server was rebooted. As
> we have parallel servers, we have however been able to tell users to
> use a different server.

If the process cannot be killed with -9, then it isn't a Samba problem.
Some kernel resource the processes are waiting on is not responding
(usually NFS).

Are you re-exporting NFS drives?

Regards,

Jeremy Allison,
Samba Team.

--
--------------------------------------------------------
Buying an operating system without source is like buying
a self-assembly Space Shuttle with no instructions.
--------------------------------------------------------
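
One way to test the hung-NFS theory raised in the replies above is to stat each exported path from a throwaway child process and give up after a timeout, so that the checking script itself never blocks on a dead mount. This is a minimal sketch under that assumption; path_responds and its parameters are hypothetical names, and the 5-second default is arbitrary.

```python
import subprocess
import sys


def path_responds(path: str, timeout: float = 5.0, probe_cmd=None) -> bool:
    """Return True if a stat() of `path` finishes within `timeout` seconds.

    The stat runs in a child process, so a hung mount blocks the child
    rather than the caller. A quick failure (for example, a missing path)
    still counts as a response, because the filesystem answered.
    `probe_cmd` lets the caller substitute a different probe command.
    """
    cmd = probe_cmd or [
        sys.executable, "-c",
        "import os, sys; os.stat(sys.argv[1])", path,
    ]
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in the kernel may ignore the
        # kill, so we deliberately do not wait for it again.
        proc.kill()
        return False
    return True


if __name__ == "__main__":
    for share_path in sys.argv[1:]:
        state = "ok" if path_responds(share_path) else "NOT RESPONDING"
        print(share_path, state)
```

Invoked with the directories exported in smb.conf as arguments, any path reported as not responding would be a candidate for the kernel resource the smbd processes are waiting on. Popen.wait with a timeout is used rather than subprocess.run because run waits for the child again after killing it, which could block if the child cannot be killed.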