Thomas Adam
2009-May-05 13:31 UTC
[Samba] winbindd stays at 100% CPU having joined a domain
[ Please would anyone replying to this maintain the Cc list here as my dear colleague is not subscribed to this mailing list. ]

Hello all,

I am trying to track down an oddity with winbindd. We're using samba only to join a given domain -- so the config file is very minimal, as per:

    [global]
    workgroup = TEDDYBEARS
    netbios name = SMOOTHWALL
    realm = TEDDYBEARS.LOCAL
    security = ads
    password server = ad.teddybears.local
    winbind separator = /
    winbind enum users = yes
    winbind enum groups = yes

Shortly after joining a domain though, winbindd will suddenly start using 100% CPU and sit there.

Attaching strace to this process shows nothing -- literally, it's making no system calls whatsoever. Attaching gdb to this running process and using "next" reveals that it's doing the following:

    memcpy() -> tdb_read() -> tdb_find()

... and then stops in tdb_find() claiming there are no more line numbers to look at. So on the tentative premise that the problem lies in tdb_find(), I looked there and didn't find anything obvious, but noticed the following call-chain is happening:

    tdb_find()
        |---> tdb_key_compare()
        +---> tdb_parse_data()

Not knowing *anything* about samba internals, can someone suggest whether this is even a relevant starting point?

We're currently using samba 3.2.8 -- and in trying to diagnose this, have gone through some releases, trying samba 3.2.10 and then samba 3.3.3 -- alas, this problem still remains in both those versions. Our use of Samba is deployed as part of a commercial product. The kernel we're running is 2.6.16.60-3-smp.

Is there something that we're not adding to our configuration file (as in the above) which might aid in solving this? It's certainly a difficult problem to replicate -- and perhaps the only aspect worth mentioning is that samba is being asked to join a domain against Windows Server 2008 -- should that be relevant.

Having done a lot of background work trying to track this down, I found almost no information about this in the mailing list archives, which suggests to me some very odd corner-case bug, a problem with Samba and Windows Server 2008, or a configuration problem. Given that the configuration referenced above is the same for all customers (barring the obvious changes for values on netbios, workgroup, realm, etc.) it's likely to be something else, but I am open to suggestions.

I'm attaching a version of a backtrace from GDB of a running version of winbindd exhibiting this problem. The version of winbind is from version 3.2.4. There are two backtraces of the same process for comparison.

If there's any further information you need, don't hesitate to let me know.

Kindly,

--
Thomas Adam
Senior Developer
Smoothwall Ltd.
Email: thomas.adam@smoothwall.net

SmoothWall Limited
1 John Charles Way
Leeds LS12 6QA
United Kingdom

Phone: 1 800 959 3760 (USA, Canada and North America)
       0870 1 999 500 (United Kingdom)
       +44 870 1 999 500 (all other countries)
Fax:   +44 870 1 991 399
Web:   smoothwall.net

SmoothWall Limited is registered in England, Company Number: 429824i7

This email and any attachments transmitted with it are confidential to the intended recipient(s) and may not be communicated to any other person or published by any means without the express permission of SmoothWall Limited. Any views expressed in this message are solely those of the author. See: smoothwall.net/company/email.php for the full text of this notice.
-------------- next part --------------
#0 0x402aa970 in tdb_find () from /modules/guardian/usr/lib/libtdb.so.1
#1 0x402aaa8d in tdb_update_hash () from /modules/guardian/usr/lib/libtdb.so.1
#2 0x402ab3c4 in tdb_store () from /modules/guardian/usr/lib/libtdb.so.1
#3 0x0812a48e in tdb_store_bystring ()
#4 0x0837f493 in netsamlogon_cache_store ()
#5 0x080aea38 in winbindd_dual_pam_auth_crap ()
#6 0x080c45d4 in child_process_request ()
#7 0x080c6a58 in fork_domain_child ()
#8 0x080c4191 in schedule_async_request ()
#9 0x080c3afe in async_request ()
#10 0x0809d467 in init_child_connection ()
#11 0x080c430e in async_domain_request ()
#12 0x0809cab8 in add_trusted_domains ()
#13 0x0809d1b8 in rescan_trusted_domains ()
#14 0x080946eb in process_loop ()
#15 0x0809571e in main ()

#0 0x4038979c in memcpy () from /lib/libc.so.6
#1 0x402af1a2 in tdb_read () from /modules/guardian/usr/lib/libtdb.so.1
#2 0x402afb6a in tdb_rec_read () from /modules/guardian/usr/lib/libtdb.so.1
#3 0x402aa90c in tdb_find () from /modules/guardian/usr/lib/libtdb.so.1
#4 0x402aaa8d in tdb_update_hash () from /modules/guardian/usr/lib/libtdb.so.1
#5 0x402ab3c4 in tdb_store () from /modules/guardian/usr/lib/libtdb.so.1
#6 0x0812a48e in tdb_store_bystring ()
#7 0x0837f493 in netsamlogon_cache_store ()
#8 0x080aea38 in winbindd_dual_pam_auth_crap ()
#9 0x080c45d4 in child_process_request ()
#10 0x080c6a58 in fork_domain_child ()
#11 0x080c4191 in schedule_async_request ()
#12 0x080c3afe in async_request ()
#13 0x0809d467 in init_child_connection ()
#14 0x080c430e in async_domain_request ()
#15 0x0809cab8 in add_trusted_domains ()
#16 0x0809d1b8 in rescan_trusted_domains ()
#17 0x080946eb in process_loop ()
#18 0x0809571e in main ()
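
Both backtraces sit inside tdb_find() called from tdb_store(), and strace shows no system calls. One pattern that would be consistent with that (an assumption, not something the backtraces prove) is a lookup that walks a chain of records in a memory-mapped file and never reaches the end because a damaged "next" pointer leads back to an earlier record. The toy model below is not libtdb code; `find_in_chain`, `CorruptChain` and the record layout are hypothetical names used only to illustrate how such a walk can spin forever, and how remembering visited offsets would turn the spin into an error.

```python
from typing import Dict, Optional, Tuple

# Toy record store: offset -> (key, next_offset). A next_offset of 0 ends the chain.
# This only models the idea of a linked chain; it is not the tdb on-disk format.
Records = Dict[int, Tuple[str, int]]


class CorruptChain(Exception):
    """Raised when the chain walk revisits an offset, i.e. the chain contains a loop."""


def find_in_chain(records: Records, head: int, key: str) -> Optional[int]:
    """Return the offset of the record holding `key`, or None if the chain ends.

    Without the `seen` set, a chain that loops back on itself would keep this
    function busy forever in pure user-space work, with no system calls.
    """
    seen = set()
    offset = head
    while offset != 0:
        if offset in seen:
            raise CorruptChain(f"offset {offset} visited twice")
        seen.add(offset)
        record_key, next_offset = records[offset]
        if record_key == key:
            return offset
        offset = next_offset
    return None


# A healthy chain: 8 -> 16 -> end
healthy: Records = {8: ("alice", 16), 16: ("bob", 0)}

# A damaged chain: 8 -> 16 -> 8 -> 16 -> ...
damaged: Records = {8: ("alice", 16), 16: ("bob", 8)}
```

Note that a key which is present before the loop is still found in the damaged chain; only a lookup for a missing key (which is what a store of a new key has to perform first) runs into the loop.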
Volker Lendecke
2009-May-05 13:39 UTC
[Samba] winbindd stays at 100% CPU having joined a domain
On Tue, May 05, 2009 at 09:43:36AM +0100, Thomas Adam wrote:
> [ Please would anyone replying to this maintain the Cc list here as my
> dear colleague is not subscribed to this mailing list. ]
>
> Hello all,
>
> I am trying to track down an oddity with winbindd. We're using samba only
> to join a given domain -- so the config file is very minimal, as per:
>
> [global]
> workgroup = TEDDYBEARS
> netbios name = SMOOTHWALL
> realm = TEDDYBEARS.LOCAL
> security = ads
> password server = ad.teddybears.local
> winbind separator = /
> winbind enum users = yes
> winbind enum groups = yes
>
> Shortly after joining a domain though, winbindd will suddenly start using
> 100% CPU and sit there.
>
> Attaching strace to this process shows nothing -- literally, it's making
> no system calls whatsoever. Attaching gdb to this running process and
> using "next" reveals that it's doing the following:
>
> memcpy() -> tdb_read() -> tdb_find()

Very likely that is a corrupted tdb file, probably the netsamlogon_cache.tdb. What kind of file system do you run this on?

Volker
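
One way to act on the corruption suspicion is to let a tdb command-line tool read a copy of the suspect file under a time limit, so that a read which never finishes is reported instead of hanging the shell. The sketch below assumes the `tdbdump` utility from the tdb tools is installed and on PATH; `probe_tdb` and its return values are hypothetical names made up for this example, and whether `tdbdump` actually hangs or errors on a given damaged file is not guaranteed.

```python
import subprocess
from typing import Sequence


def probe_tdb(path: str,
              tool: Sequence[str] = ("tdbdump",),
              timeout: float = 10.0) -> str:
    """Run `tool path` with a time limit and classify the outcome.

    Returns one of:
      "ok"           - the tool exited with status 0
      "error"        - the tool exited with a non-zero status
      "hang"         - the tool was still running after `timeout` seconds
      "tool-missing" - the tool could not be found
    """
    try:
        result = subprocess.run(
            [*tool, path],
            stdout=subprocess.DEVNULL,  # the dump itself is not needed here
            stderr=subprocess.DEVNULL,
            timeout=timeout,            # the child is killed when this expires
        )
    except FileNotFoundError:
        return "tool-missing"
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "error"
```

A call such as `probe_tdb("/tmp/netsamlogon_cache.copy.tdb")` (a made-up path) would be run against a copy taken while winbindd is stopped, so the probe cannot interfere with the live file. An "error" or "hang" result is only a hint that the file deserves a closer look, not proof of corruption.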
Thomas Adam
2009-May-12 13:31 UTC
[Samba] winbindd stays at 100% CPU having joined a domain
[ Please would anyone replying to this maintain the Cc list here as neither I nor my dear colleague is subscribed to this mailing list. ]

Hello all,

I am trying to track down an oddity with winbindd. We're using samba only to join a given domain -- so the config file is very minimal, as per:

    [global]
    workgroup = TEDDYBEARS
    netbios name = SMOOTHWALL
    realm = TEDDYBEARS.LOCAL
    security = ads
    password server = ad.teddybears.local
    winbind separator = /
    winbind enum users = yes
    winbind enum groups = yes

Shortly after joining a domain though, winbindd will suddenly start using 100% CPU and sit there.

Attaching strace to this process shows nothing -- literally, it's making no system calls whatsoever. Attaching gdb to this running process and using "next" reveals that it's doing the following:

    memcpy() -> tdb_read() -> tdb_find()

... and then stops in tdb_find() claiming there are no more line numbers to look at. So on the tentative premise that the problem lies in tdb_find(), I looked there and didn't find anything obvious, but noticed the following call-chain is happening:

    tdb_find()
        |---> tdb_key_compare()
        +---> tdb_parse_data()

Not knowing *anything* about samba internals, can someone suggest whether this is even a relevant starting point?

We're currently using samba 3.2.8 -- and in trying to diagnose this, have gone through some releases, trying samba 3.2.10 and then samba 3.3.3 -- alas, this problem still remains in both those versions. Our use of Samba is deployed as part of a commercial product. The kernel we're running is 2.6.16.60-3-smp.

Is there something that we're not adding to our configuration file (as in the above) which might aid in solving this? It's certainly a difficult problem to replicate -- and perhaps the only aspect worth mentioning is that samba is being asked to join a domain against Windows Server 2008 -- should that be relevant.

Having done a lot of background work trying to track this down, I found almost no information about this in the mailing list archives, which suggests to me some very odd corner-case bug, a problem with Samba and Windows Server 2008, or a configuration problem. Given that the configuration referenced above is the same for all customers (barring the obvious changes for values on netbios, workgroup, realm, etc.) it's likely to be something else, but I am open to suggestions.

I'm attaching a version of a backtrace from GDB of a running version of winbindd exhibiting this problem. The version of winbind is from version 3.2.4. There are two backtraces of the same process for comparison.

If there's any further information you need, don't hesitate to let me know.

Kindly,

--
Thomas Adam
Senior Developer
Smoothwall Ltd.
Email: thomas.adam@smoothwall.net
-------------- next part --------------
#0 0x402aa970 in tdb_find () from /modules/guardian/usr/lib/libtdb.so.1
#1 0x402aaa8d in tdb_update_hash () from /modules/guardian/usr/lib/libtdb.so.1
#2 0x402ab3c4 in tdb_store () from /modules/guardian/usr/lib/libtdb.so.1
#3 0x0812a48e in tdb_store_bystring ()
#4 0x0837f493 in netsamlogon_cache_store ()
#5 0x080aea38 in winbindd_dual_pam_auth_crap ()
#6 0x080c45d4 in child_process_request ()
#7 0x080c6a58 in fork_domain_child ()
#8 0x080c4191 in schedule_async_request ()
#9 0x080c3afe in async_request ()
#10 0x0809d467 in init_child_connection ()
#11 0x080c430e in async_domain_request ()
#12 0x0809cab8 in add_trusted_domains ()
#13 0x0809d1b8 in rescan_trusted_domains ()
#14 0x080946eb in process_loop ()
#15 0x0809571e in main ()

#0 0x4038979c in memcpy () from /lib/libc.so.6
#1 0x402af1a2 in tdb_read () from /modules/guardian/usr/lib/libtdb.so.1
#2 0x402afb6a in tdb_rec_read () from /modules/guardian/usr/lib/libtdb.so.1
#3 0x402aa90c in tdb_find () from /modules/guardian/usr/lib/libtdb.so.1
#4 0x402aaa8d in tdb_update_hash () from /modules/guardian/usr/lib/libtdb.so.1
#5 0x402ab3c4 in tdb_store () from /modules/guardian/usr/lib/libtdb.so.1
#6 0x0812a48e in tdb_store_bystring ()
#7 0x0837f493 in netsamlogon_cache_store ()
#8 0x080aea38 in winbindd_dual_pam_auth_crap ()
#9 0x080c45d4 in child_process_request ()
#10 0x080c6a58 in fork_domain_child ()
#11 0x080c4191 in schedule_async_request ()
#12 0x080c3afe in async_request ()
#13 0x0809d467 in init_child_connection ()
#14 0x080c430e in async_domain_request ()
#15 0x0809cab8 in add_trusted_domains ()
#16 0x0809d1b8 in rescan_trusted_domains ()
#17 0x080946eb in process_loop ()
#18 0x0809571e in main ()