Hello group,

I'm hitting a problem where group resolution through winbind seems to time out sometimes, which in turn causes invalid group information to propagate to cron jobs and the like. The profiling information that winbind spits out is the following:

[2019/08/22 14:34:14.809540,  0] ../source3/winbindd/winbindd.c:1058(winbind_client_processed)
  winbind_client_processed: request took 60.000364 seconds
  [struct process_request_state] ../source3/winbindd/winbindd.c:683 [2019/08/22 14:33:14.809169] ../source3/winbindd/winbindd.c:853 [2019/08/22 14:34:14.809533] [60.000364] -> TEVENT_REQ_DONE (2 0))
   [struct winbindd_getgroups_state] ../source3/winbindd/winbindd_getgroups.c:53 [2019/08/22 14:33:14.809175] ../source3/winbindd/winbindd_getgroups.c:107 [2019/08/22 14:34:14.809485] [60.000310] -> TEVENT_REQ_USER_ERROR (3 10483072397370982515))
    [struct wb_lookupname_state] ../source3/winbindd/wb_lookupname.c:47 [2019/08/22 14:33:14.809184] ../source3/winbindd/wb_lookupname.c:95 [2019/08/22 14:34:14.809483] [60.000299] -> TEVENT_REQ_USER_ERROR (3 10483072397370982515))
     [struct dcerpc_wbint_LookupName_state] default/librpc/gen_ndr/ndr_winbind_c.c:781 [2019/08/22 14:33:14.809188] default/librpc/gen_ndr/ndr_winbind_c.c:847 [2019/08/22 14:34:14.809481] [60.000293] -> TEVENT_REQ_DONE (2 0))
      [struct dcerpc_wbint_LookupName_r_state] default/librpc/gen_ndr/ndr_winbind_c.c:693 [2019/08/22 14:33:14.809189] default/librpc/gen_ndr/ndr_winbind_c.c:727 [2019/08/22 14:34:14.809479] [60.000290] -> TEVENT_REQ_DONE (2 0))
       [struct dcerpc_binding_handle_call_state] ../librpc/rpc/binding_handle.c:371 [2019/08/22 14:33:14.809190] ../librpc/rpc/binding_handle.c:520 [2019/08/22 14:34:14.809477] [60.000287] -> TEVENT_REQ_DONE (2 0))
        [struct dcerpc_binding_handle_raw_call_state] ../librpc/rpc/binding_handle.c:149 [2019/08/22 14:33:14.809197] ../librpc/rpc/binding_handle.c:203 [2019/08/22 14:34:14.809471] [60.000274] -> TEVENT_REQ_DONE (2 0))
         [struct wbint_bh_raw_call_state] ../source3/winbindd/winbindd_dual_ndr.c:89 [2019/08/22 14:33:14.809199] ../source3/winbindd/winbindd_dual_ndr.c:204 [2019/08/22 14:34:14.809469] [60.000270] -> TEVENT_REQ_DONE (2 0))
          [struct wb_domain_request_state] ../source3/winbindd/winbindd_dual.c:473 [2019/08/22 14:33:14.809202] ../source3/winbindd/winbindd_dual.c:708 [2019/08/22 14:34:14.809460] [60.000258] -> TEVENT_REQ_DONE (2 0))
           [struct wb_child_request_state] ../source3/winbindd/winbindd_dual.c:198 [2019/08/22 14:33:44.809349] ../source3/winbindd/winbindd_dual.c:273 [2019/08/22 14:34:14.809457] [30.000108] -> TEVENT_REQ_DONE (2 0))
            [struct tevent_queue_wait_state] ../tevent_queue.c:336 [2019/08/22 14:33:44.809350] ../tevent_queue.c:355 [2019/08/22 14:33:44.809351] [0.000001] -> TEVENT_REQ_DONE (2 0))
            [struct wb_simple_trans_state] ../nsswitch/wb_reqtrans.c:375 [2019/08/22 14:33:44.809352] ../nsswitch/wb_reqtrans.c:432 [2019/08/22 14:34:14.809450] [30.000098] -> TEVENT_REQ_DONE (2 0))
             [struct req_write_state] ../nsswitch/wb_reqtrans.c:158 [2019/08/22 14:33:44.809353] ../nsswitch/wb_reqtrans.c:194 [2019/08/22 14:33:44.809377] [0.000024] -> TEVENT_REQ_DONE (2 0))
              [struct writev_state] ../lib/async_req/async_sock.c:263 [2019/08/22 14:33:44.809353] ../lib/async_req/async_sock.c:412 [2019/08/22 14:33:44.809376] [0.000023] -> TEVENT_REQ_DONE (2 0))
             [struct resp_read_state] ../nsswitch/wb_reqtrans.c:222 [2019/08/22 14:33:44.809378] ../nsswitch/wb_reqtrans.c:275 [2019/08/22 14:34:14.809449] [30.000071] -> TEVENT_REQ_DONE (2 0))
              [struct read_packet_state] ../lib/async_req/async_sock.c:458 [2019/08/22 14:33:44.809379] ../lib/async_req/async_sock.c:546 [2019/08/22 14:34:14.809446] [30.000067] -> TEVENT_REQ_DONE (2 0))
  [struct resp_write_state] ../nsswitch/wb_reqtrans.c:307 [2019/08/22 14:34:14.809495] ../nsswitch/wb_reqtrans.c:344 [2019/08/22 14:34:14.809531] [0.000036] -> TEVENT_REQ_DONE (2 0))
   [struct writev_state] ../lib/async_req/async_sock.c:263 [2019/08/22 14:34:14.809496] ../lib/async_req/async_sock.c:412 [2019/08/22 14:34:14.809528] [0.000032] -> TEVENT_REQ_DONE (2 0))

(So, basically, most of the time is spent in wb_domain_request_state, with half of that in resp_read_state.) While this is logged, the system in question is doing absolutely nothing: there is no network traffic and the CPU is idle (i.e., winbind seems to be sleeping), and there is no identifiable moment at which an answer from one of the DCs arrives to cause completion. I've tried to follow the control path in the sources noted in the profiling output, but couldn't find a specific point that would cause hangs, which is why I thought I'd ask here whether anyone has seen this behaviour.

The Samba package in use is the Debian 10 prepackaged Samba 4.9.5, on a simple domain-joined client with a correct Kerberos and PAM configuration that mirrors the AD accounts to the operating system. Nested groups are enabled, but their complexity is minor; all in all, there are (only) about 100 groups and 250 accounts in the complete domain. The system was upgraded from Debian 9 (using the prepackaged Samba/winbind sources of stretch), and the configuration hasn't changed; the Samba packaged with Debian 9 did not show this behaviour.

Thanks for any insights on what I might do to debug this further!

--
--- Heiko Wundram.
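As a debugging sketch for a hang like this (assumptions: eth0 matches the interface named in the config later in the thread, and ports 88/389/445 are the usual Kerberos/LDAP/SMB ports an AD member talks to; none of these commands appear in the original report):

  # Raise winbindd's log level on the fly while reproducing the hang,
  # then turn it back down; logs normally land under /var/log/samba/.
  smbcontrol winbindd debug 10
  smbcontrol winbindd debug 1

  # Check that winbindd answers locally and can reach a DC at all.
  wbinfo --ping       # talks to winbindd over the local pipe
  wbinfo --ping-dc    # asks winbindd to contact a DC of its own domain

  # Watch for DC traffic during a hanging lookup, to confirm whether any
  # request leaves the machine before the timeout fires.
  tcpdump -ni eth0 port 88 or port 389 or port 445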
On 22/08/2019 13:41, Heiko Wundram via samba wrote:
> Hello group,
>
> I'm hitting a problem where group resolution through winbind seems to
> time out sometimes, which in turn causes invalid group information to
> propagate to cron jobs and the like.
> [profiling trace and setup details trimmed; see the original message above]
Sorry, but you cannot rely on the output of 'id' to identify your users' group memberships. Having said that, I wouldn't have thought it would take 60 secs to do nothing ;-)

Please post your smb.conf.

Rowland
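To illustrate Rowland's point, group membership can also be queried from winbindd directly instead of through the NSS path that 'id' uses; a sketch, where 'WG\someuser' and the GID 65042 are placeholders, not values from this thread:

  # List the UNIX GIDs winbindd resolves for the user, bypassing the
  # NSS layer (and its caching) that 'id' goes through.
  wbinfo --user-groups 'WG\someuser'

  # Map a returned GID back to its SID and name to see which AD group it is.
  wbinfo --gid-to-sid 65042
  wbinfo --sid-to-name <SID>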
On 22.08.2019 15:31, Rowland penny via samba wrote:
> Sorry, but you cannot rely on the output of 'id' to identify your
> users' group memberships. Having said that, I wouldn't have thought it
> would take 60 secs to do nothing ;-)

That's clear, and I know, but I can rely on login sessions having the correct groups once Kerberos has completed. That's what this (and also the recursive group membership) is about: a shared directory used by several users to store files which are then hosted by an Apache webserver (the directory in question has g+s set and default ACLs for a shared group). So, yeah, I do know the limitations of winbind. ;-)

> Please post your smb.conf.

Anyway, here goes:

[global]
security = ADS
workgroup = <wg>
realm = <domain>

idmap config * : backend = tdb
idmap config * : range = 60001-65000

idmap config <wg> : backend = rid
idmap config <wg> : range = 65001-100000

winbind nss info = template
template shell = /bin/bash
template homedir = /home/%U

winbind nested groups = yes
winbind refresh tickets = yes
winbind offline logon = yes
winbind enum users = yes
winbind enum groups = yes

allow trusted domains = no

dedicated keytab file = /etc/krb5.keytab
kerberos method = secrets and keytab

client signing = auto
server signing = auto

bind interfaces only = yes
interfaces = lo eth0

So, nothing out of the ordinary; I've replaced the workgroup and domain with placeholders.

krb5.conf is set up as follows:

[libdefaults]
default_realm = <domain>
ticket_lifetime = 1d
clockskew = 300
forwardable = true
proxiable = true
dns_lookup_realm = true
dns_lookup_kdc = true

[realms]
<domain> = {
    auth_to_local = RULE:[1:<wg>\$1]
}

with the same placeholders.

Thanks for any hints!

--
--- Heiko.
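A side note on the rid backend in the config above: per the idmap_rid documentation, the mapping is deterministic, ID = RID - BASE_RID + LOW_RANGE_ID, with base_rid defaulting to 0. So with the range 65001-100000, an account with RID 1103 should land on UID 66104. One mapping can be checked end to end like this ('WG\someuser' and the SID's domain portion are placeholders):

  # Resolve the account name to its SID ...
  wbinfo --name-to-sid 'WG\someuser'

  # ... then check the UID the rid backend derives from the SID's RID.
  wbinfo --sid-to-uid 'S-1-5-21-<domain portion>-1103'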
> -----Original message-----
> From: samba [mailto:samba-bounces at lists.samba.org] On behalf of
> Heiko Wundram via samba
> Sent: Thursday, 22 August 2019 16:13
> To: samba at lists.samba.org
> Subject: Re: [Samba] Winbind timeouts/hangs(?)
>
> [earlier discussion trimmed]
>
> > Please post your smb.conf.
>
> Anyway, here goes:
>
> [global]
> security = ADS
> workgroup = <wg>

wg = <WG>

> realm = <domain>

!!!! realm = <REALM> realm is not <domain> !!!!

> idmap config * : backend = tdb
> idmap config * : range = 60001-65000
>
> idmap config <wg> : backend = rid
> idmap config <wg> : range = 65001-100000
>
> winbind nss info = template
> template shell = /bin/bash
> template homedir = /home/%U
>
> winbind nested groups = yes
> winbind refresh tickets = yes
> winbind offline logon = yes
> winbind enum users = yes
> winbind enum groups = yes

Set the 2 enum to no.

> allow trusted domains = no
>
> dedicated keytab file = /etc/krb5.keytab
> kerberos method = secrets and keytab
>
> client signing = auto
> server signing = auto

Remove these 2.

> bind interfaces only = yes
> interfaces = lo eth0
>
> So, nothing out of the ordinary; I've replaced the workgroup and
> domain with placeholders.
>
> krb5.conf is set up as follows:
>
> [libdefaults]
> default_realm = <domain>

!!!! default_realm = <REALM> realm is not <domain> !!!!

> ticket_lifetime = 1d
> clockskew = 300
> forwardable = true
> proxiable = true
> dns_lookup_realm = true

dns_lookup_realm = false

> dns_lookup_kdc = true
>
> [realms]
> <domain> = {
>     auth_to_local = RULE:[1:<wg>\$1]
> }
>
> with the same placeholders.
>
> Thanks for any hints!
>
> --
> --- Heiko.

I really advise keeping some parts in CAPS and others not, and getting this right. So in smb.conf:

netbios name = IN-CAPS
workgroup = IN-CAPS
realm = IN-CAPS
dns domain = no-caps
dns search = no-caps

Small things, but these small things help a lot!

Greetz,
Louis
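After making changes along these lines, two quick sanity checks (USER and REALM.TLD are placeholders for a real account and realm):

  # testparm flags unknown smb.conf parameters and prints the effective
  # values for workgroup/realm, so typos and casing slips show up here.
  testparm -s

  # Verify the Kerberos side: with the realm correctly in upper case,
  # obtaining and listing a ticket should work without realm errors.
  kinit USER@REALM.TLD
  klist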