Dragan.Krnic@bahn.de
2004-Jul-07 21:34 UTC
Antwort: Re: [Samba] 3.0.4: smbd's + nscd's = 100% CPU; load > 4
Dragan Krnic DB Fernverkehr AG P 955 - 7166 ____________________________________________________ Internetauftritt der Deutschen Bahn AG >> http://www.bahn.de Dragan Krnic 07.07.2004 20:50 An: Hansjoerg.Maurer, jra Kopie: samba Blindkopie: dkrnic@t-online.de, dkrnic@lycos.com Thema: Re: [Samba] 3.0.4: smbd's + nscd's = 100% CPU; load > 4 --------------------------------------------------------------------------------------------------------------------------- As an epilogue, here's a little script to find out if there are any usernames in /etc/group which don't correspond to an existing user: ungroup.awk < /etc/group | sort -r | \ regroup.awk | while read i;\ do id $i > /dev/null;\ done 2>&1 | \ grep -v :x: | sort Whereby "ungroup.awk" is #!/bin/awk -f { partsNo = split ( $0, partString, ":" ); namesNo = split ( partString[4], userName, "," ); printf ( "%s:%s:%s:\n", partString[1], partString[2], partString[3] ); for ( i = 1;i <= namesNo; i++ ) printf ( "%s:%s\n", partString[1], userName[i]); } and "regroup.awk" is #!/bin/awk -f { partsNo = split ( $0, partString, ":" ); if ( partsNo == 2 ) printf ( "%s\n", partString[2] ); else printf ( "%s\n", $0 ); } Feel free to make it better but it did find 5 more users which are either no more on the books or their names are ever so slightly different. This is of course a temporary cure for bad bookkeeping, until Jeremy finds out why such innocuous causes have such drastic consequences and fixes it. Or doesn't. Cheers>>>>>>> the new 3.0.4 Samba installation seems to work fine >>>>>>> except that from time to time but at least a couple >>>>>>> of times a day one or more smbd processes start >>>>>>> running at 20%-40% CPU each and 6 nscd processes >>>>>>> then share the remaining CPU power. System 70%-80% >>>>>>> users the rest 20%-30%. Load rises fast to over 4. >>>>>>> >>>>>>> I'm sure that each such process is just idling, >>>>>>> but why does it engage so much nscd processing? >>>>>>> >>>>>>> As soon as I kill the excessive smbd process(es) >>>>>>> the situation drops to normal, i.e. load < 0,1 >>>>>>> no perceptible CPU%. >>>>>>> >>>>>>> Does anyone know what's happening? >>>>>>> >>>>>> What does strace say ? Can you attach with gdb to >>>>>> a CPU bound process and give a backtrace ? >>>>>> >>>>> Ah backtrace, to see the steps to the black hole? >>>>> OK. Will do, sooen as I get back to office (tomorrow). >>>>> >>>> A pot watched never boils. >>>> But as soon as it happens again, I'll consult >>>> strace and gdb to see how and why it happens. >>>> >>> >>> It does, when you stop watching. >>> I was away yesterday and what do I see >>> this morning: >>> >>> top - 09:55:35 up 6 days, 17:28, 6 users, load average: 3.59,3.82, 3.15>>> Tasks: 182 total, 3 running, 179 sleeping, 0 stopped, 0 zombie >>> Cpu(s): 21.6% user, 78.4% system, 0.0% nice, 0.0% idle >>> Mem: 2060704k total, 2004324k used, 56380k free, 185272kbuffers>>> Swap: 2402296k total, 5236k used, 2397060k free, 1068752kcached>>> >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ Command >>> 12808 robf 25 0 2996 2596 2244 R 23.6 0.1 4:46.98 smbd >>> 12741 robf 22 0 3028 2628 2280 R 20.6 0.1 5:44.55 smbd >>> 2354 root 15 0 724 716 536 S 15.3 0.0 150:31.23 nscd >>> 2356 root 15 0 724 716 536 S 14.9 0.0 150:35.87 nscd >>> 2352 root 15 0 724 716 536 S 9.0 0.0 151:14.08 nscd >>> 2353 root 15 0 724 716 536 S 6.3 0.0 150:39.89 nscd >>> 2355 root 15 0 724 716 536 S 6.3 0.0 150:31.82 nscd >>> 2350 root 15 0 724 716 536 S 3.7 0.0 150:49.78 nscd >>> >>> I attached both of the smbd processes to gdb and >>> backtrace was always: >>> >>> #0 0x402e5328 in read () from /lib/libc.so.6 >>> #1 0x40343b90 in __DTOR_END__ () from /lib/libc.so.6 >>> #2 0x4031d58b in __nscd_getpwnam_r () from /lib/libc.so.6 >>> #3 0x402c130d in getpwnam_r@@GLIBC_2.1.2 () from /lib/libc.so.6 >>> #4 0x402c0e6f in getpwnam () from /lib/libc.so.6 >>> #5 0x081298fd in get_memberuids () >>> #6 0x08129b09 in _samr_query_groupmem () >>> #7 0x081215d1 in api_samr_query_groupmem () >>> #8 0x081358b2 in api_rpcTNP () >>> #9 0x08135632 in api_pipe_request () >>> #10 0x0812fc10 in process_request_pdu () >>> #11 0x0812fdec in process_complete_pdu () >>> #12 0x08130069 in process_incoming_data () >>> #13 0x08130220 in write_to_internal_pipe () >>> #14 0x081301a4 in write_to_pipe () >>> #15 0x080883e0 in api_fd_reply () >>> #16 0x080885b9 in named_pipe () >>> #17 0x080891bc in reply_trans () >>> #18 0x080c80e0 in switch_message () >>> #19 0x080c8172 in construct_reply () >>> #20 0x080c8491 in process_smb () >>> #21 0x080c9004 in smbd_process () >>> #22 0x081f812e in main () >>> #23 0x402268ae in __libc_start_main () from /lib/libc.so.6 >>> >>> After killing both smbd processes with -9 the top soon >>> stabilizes at: >>> >>> top - 10:12:01 up 6 days, 17:45, 6 users, load average: 0.03,0.19, 1.17>>> Tasks: 175 total, 2 running, 173 sleeping, 0 stopped, 0 zombie >>> Cpu(s): 0.0% user, 0.7% system, 0.0% nice, 99.3% idle >>> Mem: 2060704k total, 2034272k used, 26432k free, 185272kbuffers>>> Swap: 2402296k total, 5236k used, 2397060k free, 1100020kcached>>> >>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ Command >>> 15182 root 15 0 956 956 700 R 0.3 0.0 0:03.67 top >>> >>> Unfortunately I didn't trace the nscd processes. >>> What a shame! I'll do it next time. >>> >>> Nobody complained yet about reduced performance. >>> >>> It's hard to tell when this behaviour started. >>> The upper bound seems to be 9 hours, >>> the combined run times of the nscd processe, >>> some time during the night when the computers >>> were totally quiet. The lower bound based on >>> the run times of the smbd processes is more >>> like half an hour ago. >>> >>> This is the fourth out of 5 times that the same user, >>> "robf", is involved as the effective UID of the smbd process. >>> The other one time was root's own smbd. >>> >>> Jeremy, can I provide more information? >> >> I had a similiar Problem , and a loglevel of 4 shows , >> that samba was trying to look up a user nobody and a >> user Administrator, all the time. >> If I killed nscd the load of the ldap server becomes high... >> >> I added these user to my ldap backend, and the problem disappears. > > It's a valuable pointer but I'm not sure it really > applies here, Hans. For one, your problem seems to > have been persistent up until you added those users > to your ldap backend. In my case, it happens very > intermittently. Besides, my passdb backend is the > default "smbpasswd". > > I'll give it a more thorough check next time it > happens, including user name lookups.Bingo! It is exactly the same case. Two user names were spelled out slightly wrong in the /etc/group. As a consequence, under certain circumstances the "smbd" process keeps trying to resolve the name and doesn't take "no" from "nscd" for an answer. Each "smbd" process is looping around these 5 system calls: 1) create a socket, 2) connect to nscd's socket, 3) write the mis-spelled name, 4) read negative answer 5) close socket: socket(PF_UNIX,SOCK_STREAM,0)=26 connect(26,{sa_family=AF_UNIX,path="/var/run/.nscd_socket"},110)=0 writev(26,[{"\2\0...\0\22\0...",12},{"GeorgeDubbyaBusch\0",18}],2)=30 read(26,"\2\0\0\...\0\377\377\377\377\377\377"...,36)=36 close(26)=0 and the nscds spin like this poll({fd=3,events=POLLRDNORM,revents=POLLRDNORM}],1,-1)=1 accept(3,,NULL)=9 read(9,"\2\0\0\0\0\0\0\0\22\0\0\0",12)=12 read(9,"GeorgeDubbyaBusch\0",18)=18 write(9,"\2\0\...\0\377\377\377\377\377\377"...,36=36 close(9)=0 Since both mis-spelled names are among the earliest user names in 2 most frequently used groups (one is "users"), it's hard to tell why the smbd processes spin out of control so infrequently. Jeremy will know more about that. I'll write a script to check such discrepancies in the file. Thanks Hans. Thanks Jeremy. --------- Diese E-Mail k?nnte vertrauliche und/oder rechtlich gesch?tzte Informationen enthalten. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail sind nicht gestattet. This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden. ----------