Hi everyone,

the new 3.0.4 Samba installation seems to work fine, except that from time to time, at least a couple of times a day, one or more smbd processes start running at 20%-40% CPU each and 6 nscd processes then share the remaining CPU power. System time is 70%-80%, user time the remaining 20%-30%. The load rises quickly to over 4.

I'm fairly sure that each such smbd process is just idling, so why does it trigger so much nscd processing?

As soon as I kill the excessive smbd process(es) the situation returns to normal, i.e. load < 0.1 and no perceptible CPU%.

Does anyone know what's happening?
On Thu, Jul 01, 2004 at 08:58:27AM -0500, Dragan Krnic wrote:
> the new 3.0.4 Samba installation seems to work fine, except that
> from time to time, at least a couple of times a day, one or more
> smbd processes start running at 20%-40% CPU each and 6 nscd
> processes then share the remaining CPU power.
> [...]
> Does anyone know what's happening?

What does strace say? Can you attach with gdb to a CPU-bound process and give a backtrace?

Jeremy.
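For reference, attaching to one of the spinning smbd processes could look roughly like this. The PID (12345) and the smbd path are only placeholders; use whatever top and your own installation actually report:

PID=12345                     # the busy smbd's PID, as shown by top or ps

# a quick look at what it is doing at the system-call level
strace -tt -p "$PID"

# attach gdb, grab a backtrace, then let the process continue
gdb /usr/sbin/smbd "$PID"
(gdb) bt
(gdb) detach
(gdb) quit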
> What does strace say? Can you attach with gdb to
> a CPU-bound process and give a backtrace?

Ah, a backtrace, to see the steps into the black hole? OK, will do as soon as I get back to the office (tomorrow).
>> What does strace say? Can you attach with gdb to
>> a CPU-bound process and give a backtrace?
>
> Ah, a backtrace, to see the steps into the black hole?
> OK, will do as soon as I get back to the office (tomorrow).

A watched pot never boils. But as soon as it happens again, I'll consult strace and gdb to see how and why it happens.
> A watched pot never boils.
> But as soon as it happens again, I'll consult
> strace and gdb to see how and why it happens.

It does, when you stop watching. I was away yesterday and what do I see this morning:

top - 09:55:35 up 6 days, 17:28, 6 users, load average: 3.59, 3.82, 3.15
Tasks: 182 total, 3 running, 179 sleeping, 0 stopped, 0 zombie
Cpu(s): 21.6% user, 78.4% system, 0.0% nice, 0.0% idle
Mem:  2060704k total, 2004324k used,   56380k free,  185272k buffers
Swap: 2402296k total,    5236k used, 2397060k free, 1068752k cached

  PID USER  PR NI VIRT  RES  SHR S %CPU %MEM     TIME+ Command
12808 robf  25  0 2996 2596 2244 R 23.6  0.1   4:46.98 smbd
12741 robf  22  0 3028 2628 2280 R 20.6  0.1   5:44.55 smbd
 2354 root  15  0  724  716  536 S 15.3  0.0 150:31.23 nscd
 2356 root  15  0  724  716  536 S 14.9  0.0 150:35.87 nscd
 2352 root  15  0  724  716  536 S  9.0  0.0 151:14.08 nscd
 2353 root  15  0  724  716  536 S  6.3  0.0 150:39.89 nscd
 2355 root  15  0  724  716  536 S  6.3  0.0 150:31.82 nscd
 2350 root  15  0  724  716  536 S  3.7  0.0 150:49.78 nscd

I attached both of the smbd processes to gdb and the backtrace was always:

#0  0x402e5328 in read () from /lib/libc.so.6
#1  0x40343b90 in __DTOR_END__ () from /lib/libc.so.6
#2  0x4031d58b in __nscd_getpwnam_r () from /lib/libc.so.6
#3  0x402c130d in getpwnam_r@@GLIBC_2.1.2 () from /lib/libc.so.6
#4  0x402c0e6f in getpwnam () from /lib/libc.so.6
#5  0x081298fd in get_memberuids ()
#6  0x08129b09 in _samr_query_groupmem ()
#7  0x081215d1 in api_samr_query_groupmem ()
#8  0x081358b2 in api_rpcTNP ()
#9  0x08135632 in api_pipe_request ()
#10 0x0812fc10 in process_request_pdu ()
#11 0x0812fdec in process_complete_pdu ()
#12 0x08130069 in process_incoming_data ()
#13 0x08130220 in write_to_internal_pipe ()
#14 0x081301a4 in write_to_pipe ()
#15 0x080883e0 in api_fd_reply ()
#16 0x080885b9 in named_pipe ()
#17 0x080891bc in reply_trans ()
#18 0x080c80e0 in switch_message ()
#19 0x080c8172 in construct_reply ()
#20 0x080c8491 in process_smb ()
#21 0x080c9004 in smbd_process ()
#22 0x081f812e in main ()
#23 0x402268ae in __libc_start_main () from /lib/libc.so.6

After killing both smbd processes with -9 the top soon stabilizes at:

top - 10:12:01 up 6 days, 17:45, 6 users, load average: 0.03, 0.19, 1.17
Tasks: 175 total, 2 running, 173 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% user, 0.7% system, 0.0% nice, 99.3% idle
Mem:  2060704k total, 2034272k used,   26432k free,  185272k buffers
Swap: 2402296k total,    5236k used, 2397060k free, 1100020k cached

  PID USER  PR NI VIRT  RES  SHR S %CPU %MEM     TIME+ Command
15182 root  15  0  956  956  700 R  0.3  0.0   0:03.67 top

Unfortunately I didn't trace the nscd processes. What a shame! I'll do it next time.

Nobody complained yet about reduced performance.
It's hard to tell when this behaviour started. The upper bound seems to be 9 hours (the combined run times of the nscd processes), i.e. some time during the night when the computers were totally quiet. The lower bound, based on the run times of the smbd processes, is more like half an hour ago.

This is the fourth time out of 5 that the same user, "robf", is the effective UID of the runaway smbd process. The one other time it was root's own smbd.

Jeremy, can I provide more information?

Cheers,
Dragan
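In case it helps next time, one way to capture the nscd side as well might be to trace each nscd process into its own file for a minute or so. This is only a sketch; the output paths and the sampling period are arbitrary:

# start one strace per nscd process (glibc's nscd runs several of them)
for p in $(pidof nscd); do
    strace -tt -p "$p" -o "/tmp/nscd.$p.trace" &
done

sleep 60          # let the tracers collect data for a minute
kill $(jobs -p)   # detach the tracers; nscd itself keeps running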
>> This is the fourth time out of 5 that the same user,
>> "robf", is the effective UID of the runaway smbd process.
>> [...]
>> Jeremy, can I provide more information?
>
> I had a similar problem, and a log level of 4 showed
> that Samba was trying to look up a user "nobody" and a
> user "Administrator" all the time.
> If I killed nscd, the load on the LDAP server became high...
>
> I added these users to my LDAP backend, and the problem disappeared.

It's a valuable pointer but I'm not sure it really applies here, Hans. For one, your problem seems to have been persistent until you added those users to your LDAP backend; in my case it happens only intermittently. Besides, my passdb backend is the default "smbpasswd".

I'll give it a more thorough check next time it happens, including the user name lookups.
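If anyone wants to repeat Hans's log level 4 experiment, the debug level of already-running smbd processes can usually be raised on the fly with smbcontrol instead of editing smb.conf and restarting (the PID here is just an example):

# raise the debug level of one specific smbd ...
smbcontrol 12808 debug 4

# ... or of all running smbd processes
smbcontrol smbd debug 4

# and drop it back down once the lookups have been captured
smbcontrol smbd debug 1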
>> #5  0x081298fd in get_memberuids ()
>> #6  0x08129b09 in _samr_query_groupmem ()
>> #7  0x081215d1 in api_samr_query_groupmem ()
>> [...]
>
> get_memberuids is expanding out the list of users in a group. Can you
> tell me what the group membership matrix looks like for this user?

He belongs to 5 groups, with 53, 14, 7, 21 and 128 members. Most users belong to 3-4 project groups, 1-2 interest groups, plus "users" (== "Domain Users").
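For anyone who wants to pull the same numbers out of their own system, something like this rough one-liner lists each group with its member count, largest first. It assumes glibc's getent is available, and it only counts explicitly listed members (primary-group memberships won't show up here):

getent group | awk -F: '{ print split($4, members, ","), $1 }' | sort -rn | head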
Dragan.Krnic@bahn.de
2004-Jul-07 18:50 UTC
[Samba] 3.0.4: smbd's + nscd's = 100% CPU; load > 4
>> I had a similar problem, and a log level of 4 showed
>> that Samba was trying to look up a user "nobody" and a
>> user "Administrator" all the time.
>> [...]
>> I added these users to my LDAP backend, and the problem disappeared.
>
> It's a valuable pointer but I'm not sure it really
> applies here, Hans. [...]
>
> I'll give it a more thorough check next time it
> happens, including the user name lookups.

Bingo! It is exactly the same case. Two user names were spelled slightly wrong in /etc/group. As a consequence, under certain circumstances the smbd process keeps trying to resolve the name and doesn't take "no" from nscd for an answer.

Each smbd process is looping around these 5 system calls: 1) create a socket, 2) connect to nscd's socket, 3) write the mis-spelled name, 4) read the negative answer, 5) close the socket:

socket(PF_UNIX,SOCK_STREAM,0) = 26
connect(26,{sa_family=AF_UNIX,path="/var/run/.nscd_socket"},110) = 0
writev(26,[{"\2\0...\0\22\0...",12},{"GeorgeDubbyaBusch\0",18}],2) = 30
read(26,"\2\0\0\...\0\377\377\377\377\377\377"...,36) = 36
close(26) = 0

and the nscds spin like this:

poll([{fd=3,events=POLLRDNORM,revents=POLLRDNORM}],1,-1) = 1
accept(3,,NULL) = 9
read(9,"\2\0\0\0\0\0\0\0\22\0\0\0",12) = 12
read(9,"GeorgeDubbyaBusch\0",18) = 18
write(9,"\2\0\...\0\377\377\377\377\377\377"...,36) = 36
close(9) = 0

Since both mis-spelled names are among the earliest user names in the 2 most frequently used groups (one of them is "users"), it's hard to tell why the smbd processes spin out of control so infrequently. Jeremy will know more about that.

I'll write a script to check for such discrepancies in the file.

Thanks Hans. Thanks Jeremy.
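A rough sketch of such a check, assuming glibc's getent is available, could be as simple as comparing every member named in /etc/group against the passwd database:

# flag group members that have no matching passwd entry
cut -d: -f4 /etc/group | tr ',' '\n' | sort -u | grep -v '^$' |
while read u; do
    getent passwd "$u" > /dev/null || echo "no passwd entry for group member: $u"
done

Going through getent rather than grepping /etc/passwd directly also catches users that come from other NSS sources.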