Tried a Samba 4.9.3 with the patch applied and it still froze up on us at 17:00…
So - still no go.
However, one thing that I came to think about was that the test system where I
ran the “smbclient” (the monitoring server, a Linux virtual machine) runs a
prepackage Samba 4.8.3. Perhaps both server _and_ client side needs this fix?
At 17:03 this is what I saw:> peter86 at filifjonkan tmp]$ smbclient -m SMB3 -k //filur05/peter86
> session setup failed: NT_STATUS_IO_TIMEOUT
After I ran a “pkill winbindd” on that server (filur05) I could connect again
(even though I got one “hanging” idmap-child-winbind, the rest was gone though):
> [peter86 at filifjonkan tmp]$ smbclient -m SMB3 -k //filur05/peter86
> Try "help" to get a list of possible commands.
> smb: \> dir
> . D 0 Tue Dec 4 11:15:25 2018
> .. D 0 Mon Dec 3 11:23:14 2018
> HelloFilur05.txt N 8 Wed Nov 29 22:40:01 2017
>
> 157286400 blocks of size 1024. 157286224 blocks available
Note that I didn’t restart the winbindd daemons until after I had got that
smbclient session running… Perhaps that worked because I used Kerberos
authentication though.
Anyway, I got a forced core dump from the “hanging idmap-child” Winbindd and a
stack trace on it shows:
> (gdb) bt
> #0 0x0000000809b1bc2a in _poll () from /lib/libc.so.7
> #1 0x0000000801562726 in ?? () from /lib/libthr.so.3
> #2 0x0000000803ce7037 in poll_event_loop_poll (tvalp=0x7fffffffca80,
ev=0x812357060) at ../lib/tevent/tevent_poll.c:483
> #3 poll_event_loop_once (ev=0x812357060, location=<optimized out>)
at ../lib/tevent/tevent_poll.c:626
> #4 0x0000000803ce44de in _tevent_loop_once (ev=ev at entry=0x812357060,
location=location at entry=0x803cea448
"../lib/tevent/tevent_req.c:298") at ../lib/tevent/tevent.c:772
> #5 0x0000000803ce5b18 in tevent_req_poll (req=req at entry=0x8122b7700,
ev=ev at entry=0x812357060) at ../lib/tevent/tevent_req.c:298
> #6 0x0000000812810b2a in tldap_search (ld=0x812322520, base=0x81223e2a0
"DC=ad,DC=liu,DC=se", scope=scope at entry=2,
> filter=filter at entry=0x812292860
"(&(|(sAMAccountType=805306368)(sAMAccountType=805306369)(sAMAccountType=805306370)(sAMAccountType=268435456)(sAMAccountType=536870912))(|(objectSid=\\01\\05\\00\\00\\00\\00\\00\\
> 05\\15\\00\\00
> pmsgs=0x7fffffffcc40) at ../source3/lib/tldap.c:1999
> #7 0x0000000812814efa in idmap_ad_sids_to_unixids (dom=dom at
entry=0x812251aa0, ids=ids at entry=0x81226bde0) at
../source3/winbindd/idmap_ad.c:827
> #8 0x0000000812815481 in idmap_ad_sids_to_unixids_retry (dom=0x812251aa0,
ids=0x81226bde0) at ../source3/winbindd/idmap_ad.c:946
> #9 0x000000000107a790 in _wbint_Sids2UnixIDs (p=p at entry=0x7fffffffcdd0,
r=r at entry=0x8123233e0) at ../source3/winbindd/winbindd_dual_srv.c:203
> #10 0x00000000010d0e49 in api_wbint_Sids2UnixIDs (p=0x7fffffffcdd0) at
default/librpc/gen_ndr/srv_winbind.c:391
> #11 0x000000000107a345 in winbindd_dual_ndrcmd (domain=<optimized
out>, state=0x7fffffffd058) at ../source3/winbindd/winbindd_dual_ndr.c:369
> #12 0x0000000001076082 in child_process_request (state=0x7fffffffd058,
child=<optimized out>) at ../source3/winbindd/winbindd_dual.c:745
> #13 child_handler (ev=<optimized out>, fde=<optimized out>,
flags=<optimized out>, private_data=0x7fffffffd050) at
../source3/winbindd/winbindd_dual.c:1602
> #14 0x0000000803ce4d1d in tevent_common_invoke_fd_handler (fde=fde at
entry=0x812251480, flags=<optimized out>, removed=removed at entry=0x0) at
../lib/tevent/tevent_fd.c:137
> #15 0x0000000803ce72fc in poll_event_loop_poll (tvalp=0x7fffffffcfa0,
ev=0x812258060) at ../lib/tevent/tevent_poll.c:569
> #16 poll_event_loop_once (ev=0x812258060, location=<optimized out>)
at ../lib/tevent/tevent_poll.c:626
> #17 0x0000000803ce44de in _tevent_loop_once (ev=0x812258060,
location=location at entry=0x10eb4d8
"../source3/winbindd/winbindd_dual.c:1817") at
../lib/tevent/tevent.c:772
> #18 0x00000000010797de in fork_domain_child (child=0x133ba80
<static_idmap_child>) at ../source3/winbindd/winbindd_dual.c:1817
> #19 0x0000000001079956 in wb_child_request_waited (subreq=0x0) at
../source3/winbindd/winbindd_dual.c:238
> #20 0x0000000803ce58a3 in _tevent_req_notify_callback (req=req at
entry=0x81231da00, location=location at entry=0x803cea170
"../lib/tevent/tevent_queue.c:355") at ../lib/tevent/tevent_req.c:139
> #21 0x0000000803ce594b in tevent_req_finish (req=0x81231da00, state=state
at entry=TEVENT_REQ_DONE, location=location at entry=0x803cea170
"../lib/tevent/tevent_queue.c:355") at ../lib/tevent/tevent_req.c:191
> #22 0x0000000803ce5967 in _tevent_req_done (req=<optimized out>,
location=location at entry=0x803cea170
"../lib/tevent/tevent_queue.c:355") at ../lib/tevent/tevent_req.c:197
> #23 0x0000000803ce5463 in tevent_queue_wait_trigger (req=<optimized
out>, private_data=<optimized out>) at ../lib/tevent/tevent_queue.c:355
> #24 0x0000000803ce544e in tevent_queue_immediate_trigger (ev=ev at
entry=0x812258060, im=im at entry=0x81224efe0, private_data=private_data at
entry=0x812241040) at ../lib/tevent/tevent_queue.c:149
> #25 0x0000000803ce513a in tevent_common_invoke_immediate_handler
(im=0x81224efe0, removed=removed at entry=0x0) at
../lib/tevent/tevent_immediate.c:165
> #26 0x0000000803ce515b in tevent_common_loop_immediate (ev=ev at
entry=0x812258060) at ../lib/tevent/tevent_immediate.c:202
> #27 0x0000000803ce6db9 in poll_event_loop_once (ev=0x812258060,
location=<optimized out>) at ../lib/tevent/tevent_poll.c:617
> #28 0x0000000803ce44de in _tevent_loop_once (ev=0x812258060,
location=0x10d5af8 "../source3/winbindd/winbindd.c:1911") at
../lib/tevent/tevent.c:772
> #29 0x000000000104e9c2 in main (argc=<optimized out>,
argv=<optimized out>) at ../source3/winbindd/winbindd.c:1911
I wonder if it for some reason ignores SIGTERM. Hmm… Anyway, since ’smbd’ let me
in when I had killed the rest of the Winbindd demons then this hanging
"idmap-child” probably isn’t the problem anyway...
- Peter
> On 3 Dec 2018, at 19:37, Jeremy Allison <jra at samba.org> wrote:
>
> On Mon, Dec 03, 2018 at 07:33:22PM +0100, Peter Eriksson via samba wrote:
>> Hmm…
>>
>> I see that there is a patch in the bugzilla page for that bug. I guess
I could try that one… :-)
>>
>> I notice in the patch that there is a lot of talk about SMB2 - we use
SMB3 mostly now. But perhaps there is some code sharing? Or perhaps SMB2 is used
when talking to the AD servers? Or the few SMB2-talking clients causes the
problem for all of the other users? When testing the “timeout” issue I typically
try a simple:
>>
>> smbclient -m SMB3 -U $USER -k //$SERVER/$SHARE
>
> In the Samba world SMBx, where x > 1 is a synonym for
> everything other than SMB1 :-).
>
> So for this kind of thing, SMB2 == SMB3 :-).