Jeremy Allison
2022-Sep-19 17:24 UTC
[Samba] Samba-LDAP with 100%CPU with connections in CLOSE_WAIT
On Mon, Sep 19, 2022 at 05:20:04PM +0200, Steffen via samba wrote:
> Hi,
>
> since some time we are facing a small problem:
>
> We are using samba (4.15.9-15) as AD-DC. As clients we have some NetAPP-FAS running which doing the auth. via LDAP. On NetApp timeouts for LDAP are set to 3sec per default.
>
> Some queries seem to need more time to answer so the client tries to close the connection but the (samba-)server-part leaves the socket open in CLOSE_WAIT.
>
> In some of such cases the corresponding process (ldap-worker) runs forever(?) with 100% cpu. A strace shows the ldap-worker pushing some info (the answer?) to the socket. If one let it go the server slows down gradually while more and more connections stay in CLOSE_WAIT.

Can you post an strace, followed by a stack backtrace from gdb of an ldap-worker process in such a state.

That would help debug - thanks !
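While collecting the traces, it can help to put a number on the symptom in the report (connections piling up in CLOSE_WAIT). The following is a minimal sketch, not part of Samba, that counts such sockets from text in the Linux /proc/net/tcp layout. The function name `count_close_wait`, the `SAMPLE` data, and the choice of port 389 (plain LDAP) are assumptions made for illustration.

```python
# CLOSE_WAIT is state 0x08 in the Linux kernel's /proc/net/tcp table.
CLOSE_WAIT = 0x08
LDAP_PORT = 389  # assumption: plain LDAP; LDAPS would be 636


def count_close_wait(proc_net_tcp_text, local_port=LDAP_PORT):
    """Count sockets in CLOSE_WAIT whose local port equals local_port."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXADDR:HEXPORT" for the local end, fields[3] is the state
        port = int(fields[1].rsplit(":", 1)[1], 16)
        state = int(fields[3], 16)
        if port == local_port and state == CLOSE_WAIT:
            count += 1
    return count


# Hypothetical sample in /proc/net/tcp layout, for illustration only.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0185 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001
   1: 0100007F:0185 0200007F:C350 08 00000000:00000000 00:00000000 00000000     0        0 1002
   2: 0100007F:0185 0200007F:C351 01 00000000:00000000 00:00000000 00000000     0        0 1003
   3: 0100007F:0185 0200007F:C352 08 00000000:00000000 00:00000000 00000000     0        0 1004
"""

if __name__ == "__main__":
    print(count_close_wait(SAMPLE))
```

On a live DC the same function could be fed `open("/proc/net/tcp").read()` (and /proc/net/tcp6 for IPv6 listeners) at intervals, to see whether the count keeps growing.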
Andrew Bartlett
2022-Sep-19 20:55 UTC
[Samba] Samba-LDAP with 100%CPU with connections in CLOSE_WAIT
On Mon, 2022-09-19 at 10:24 -0700, Jeremy Allison via samba wrote:
> On Mon, Sep 19, 2022 at 05:20:04PM +0200, Steffen via samba wrote:
> > Hi,
> >
> > since some time we are facing a small problem:
> >
> > We are using samba (4.15.9-15) as AD-DC. As clients we have some
> > NetAPP-FAS running which doing the auth. via LDAP. On NetApp
> > timeouts for LDAP are set to 3sec per default.
> >
> > Some queries seem to need more time to answer so the client tries
> > to close the connection but the (samba-)server-part leaves the
> > socket open in CLOSE_WAIT.
> >
> > In some of such cases the corresponding process (ldap-worker) runs
> > forever(?) with 100% cpu. A strace shows the ldap-worker pushing
> > some info (the answer?) to the socket. If one let it go the server
> > slows down gradually while more and more connections stay in
> > CLOSE_WAIT.
>
> Can you post an strace, followed by a stack backtrace
> from gdb of an ldap-worker process in such a state.
>
> That would help debug - thanks !

The other helpful thing can be a 'flame graph' per
https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#Instructions

These are handy as they give us a lot of great info but never any confidential info (as it is just function stacks and times in them).

Clearly running a 6 second query from a client set to retry infinitely after 3 seconds will not go well, and I suspect Samba is working hard to answer those queries before dealing with the closed sockets (it may of course be possible to move those up the priority).

Finally, if you set 'log level = 5' you can see what time each request takes, and what it is.

Setting the query timeout just as per Windows AD will also work (roughly) and provide notice (level 3 at 1/4 the timeout) and warnings at log level 1 after the timeout.

See https://bugzilla.samba.org/show_bug.cgi?id=14694 and
https://www.oreilly.com/library/view/active-directory-cookbook/0596004648/ch04s24.html
for a description of the limits.
Andrew Bartlett
-- 
Andrew Bartlett (he/him)        https://samba.org/~abartlet/
Samba Team Member (since 2001)  https://samba.org
Samba Team Lead, Catalyst IT    https://catalyst.net.nz/services/samba
Samba Development and Support, Catalyst IT - Expert Open Source Solutions
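The arithmetic behind the "6 second query, 3 second retry" remark can be sketched with a toy model. This is only an illustration under loud assumptions, not a description of how Samba schedules work: a single worker handles queries one at a time in arrival order, never abandons a query whose client has gone away, and the client re-sends the same query on every timeout. The function name `outstanding_queries` and its parameters are hypothetical.

```python
def outstanding_queries(elapsed_s, query_cost_s=6.0, client_timeout_s=3.0):
    """Toy model: queries submitted but not yet finished after elapsed_s seconds.

    Assumes one worker, FIFO order, no cancellation on client disconnect,
    and a client that re-sends the query each time its timeout expires.
    """
    if query_cost_s <= client_timeout_s:
        # The client would get its answer in time and never retry.
        raise ValueError("model only applies when the query outlasts the timeout")
    # One query at t=0, then one more per expired client timeout.
    submitted = int(elapsed_s // client_timeout_s) + 1
    # Arrivals outpace service, so the worker is never idle.
    finished = int(elapsed_s // query_cost_s)
    return submitted - finished


if __name__ == "__main__":
    for t in (0, 6, 60, 600):
        print(t, outstanding_queries(t))
```

Under these assumptions the backlog grows by roughly one query every 6 seconds, which is consistent with a server that slows down gradually while abandoned connections accumulate.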
Steffen
2022-Sep-20 14:16 UTC
[Samba] Samba-LDAP with 100%CPU with connections in CLOSE_WAIT
On 19.09.22 19:24, Jeremy Allison wrote:
> On Mon, Sep 19, 2022 at 05:20:04PM +0200, Steffen via samba wrote:
>> Hi,
>>
>> since some time we are facing a small problem:
>>
>> We are using samba (4.15.9-15) as AD-DC. As clients we have some NetAPP-FAS running which doing the auth. via LDAP. On NetApp timeouts for LDAP are set to 3sec per default.
>>
>> Some queries seem to need more time to answer so the client tries to close the connection but the (samba-)server-part leaves the socket open in CLOSE_WAIT.
>>
>> In some of such cases the corresponding process (ldap-worker) runs forever(?) with 100% cpu. A strace shows the ldap-worker pushing some info (the answer?) to the socket. If one let it go the server slows down gradually while more and more connections stay in CLOSE_WAIT.
>
> Can you post an strace, followed by a stack backtrace
> from gdb of an ldap-worker process in such a state.
>
> That would help debug - thanks !

Just two remarks:

a) Do you think a bt from a no-debug version is enough?

b) Because I have set some timeouts a little bit higher, we may have to wait a few days (2 or 3) for the problem to show up again.