Andrew Bartlett
2022-Sep-19 20:55 UTC
[Samba] Samba-LDAP with 100%CPU with connections in CLOSE_WAIT
On Mon, 2022-09-19 at 10:24 -0700, Jeremy Allison via samba wrote:
> On Mon, Sep 19, 2022 at 05:20:04PM +0200, Steffen via samba wrote:
> > Hi,
> >
> > for some time we have been facing a small problem:
> >
> > We are using samba (4.15.9-15) as AD-DC. As clients we have some
> > NetAPP-FAS systems which do their auth. via LDAP. On NetApp,
> > timeouts for LDAP are set to 3sec per default.
> >
> > Some queries seem to need more time to answer, so the client tries
> > to close the connection, but the (samba-)server-part leaves the
> > socket open in CLOSE_WAIT.
> >
> > In some of these cases the corresponding process (ldap-worker) runs
> > forever(?) with 100% cpu. An strace shows the ldap-worker pushing
> > some info (the answer?) to the socket. If one lets it go, the server
> > slows down gradually while more and more connections stay in
> > CLOSE_WAIT.
>
> Can you post an strace, followed by a stack backtrace
> from gdb of an ldap-worker process in such a state.
>
> That would help debug - thanks !

The other helpful thing can be a 'flame graph' per
https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#Instructions

These are handy as they give us a lot of great info but never any confidential info (as it is just function stacks and the time spent in them).

Clearly running a 6 second query from a client set to retry infinitely after 3 seconds will not go well, and I suspect Samba is working hard to answer those queries before dealing with the closed sockets (it may of course be possible to move those up in priority).

Finally, if you set 'log level = 5' you can see how long each request takes, and what it is. Setting the query timeout just as per Windows AD will also work (roughly) and provide a notice (at log level 3, at 1/4 of the timeout) and warnings at log level 1 after the timeout.

See https://bugzilla.samba.org/show_bug.cgi?id=14694 and
https://www.oreilly.com/library/view/active-directory-cookbook/0596004648/ch04s24.html
for a description of the limits.
Andrew Bartlett

--
Andrew Bartlett (he/him)        https://samba.org/~abartlet/
Samba Team Member (since 2001)  https://samba.org
Samba Team Lead, Catalyst IT    https://catalyst.net.nz/services/samba

Samba Development and Support, Catalyst IT - Expert Open Source Solutions
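To watch how quickly the CLOSE_WAIT connections described in the report pile up, something like the following may be useful. It is only a rough sketch, not part of Samba: it assumes a Linux host, assumes the LDAP listener is on the standard port 389 (use 636 for LDAPS), and the function name `count_close_wait` is made up for illustration. It relies on the kernel reporting CLOSE_WAIT as state `08` in `/proc/net/tcp`.

```python
"""Count TCP sockets in CLOSE_WAIT on a given local port (Linux only)."""

CLOSE_WAIT = "08"  # state code the Linux kernel uses in /proc/net/tcp


def count_close_wait(proc_net_tcp_text, local_port=389):
    """Return how many sockets with this local port are in CLOSE_WAIT.

    proc_net_tcp_text is the content of /proc/net/tcp or /proc/net/tcp6.
    The default port 389 is an assumption (plain LDAP).
    """
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXADDR:HEXPORT" of the local end, fields[3] the state
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port and fields[3] == CLOSE_WAIT:
            count += 1
    return count


if __name__ == "__main__":
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                total += count_close_wait(f.read())
        except OSError:
            pass  # file not present, e.g. IPv6 disabled or not Linux
    print("CLOSE_WAIT sockets on port 389:", total)
```

Running it every few seconds while the clients are active should show whether the count keeps growing or drains again once the slow queries finish.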
Andrew Bartlett
2022-Sep-19 21:31 UTC
[Samba] Samba-LDAP with 100%CPU with connections in CLOSE_WAIT
On Tue, 2022-09-20 at 08:55 +1200, Andrew Bartlett via samba wrote:
> [...]
> Clearly running a 6 second query from a client set to retry
> infinitely after 3 seconds will not go well, and I suspect Samba is
> working hard to answer those queries before dealing with the closed
> sockets (it may of course be possible to move those up in priority).
> [...]

It looks like the socket is only closed when we get back to it in the list, try to read from it again, and the read fails. So if there are a lot of slow queries outstanding, that could take a while.

Andrew Bartlett
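The behaviour described here, where the server only notices a closed peer on its next read, can be illustrated with a few lines of Python. This is a minimal sketch and not Samba's code: it uses a local `socket.socketpair()` as a stand-in for the LDAP connection, and the variable names are invented for the example.

```python
import socket

# A connected pair of local sockets stands in for the LDAP client and server.
server_side, client_side = socket.socketpair()

client_side.sendall(b"search request")
client_side.close()  # the client gives up, like the NetApp after its timeout

# The server side learns nothing about the close until it reads again.
first = server_side.recv(4096)   # data the client sent before closing
second = server_side.recv(4096)  # b"" means the peer has closed its end

if second == b"":
    # On a TCP socket, this close is what finally ends the CLOSE_WAIT state.
    server_side.close()
```

Until the second `recv()` is reached, the server still holds the descriptor open, which on a TCP connection is exactly the window in which the socket shows up as CLOSE_WAIT.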
Steffen
2022-Sep-20 14:32 UTC
[Samba] Samba-LDAP with 100%CPU with connections in CLOSE_WAIT
...
> Finally, if you set 'log level = 5' you can see what time each request
> takes, and what it is. Setting the query timeout just as per Windows
> AD will also work (roughly) and provide notice (level 3 at 1/4 the
> timeout) and warnings at log level 1 after the timeout.
>
> See https://bugzilla.samba.org/show_bug.cgi?id=14694 and
> https://www.oreilly.com/library/view/active-directory-cookbook/0596004648/ch04s24.html
> for a description of the limits.
>
> Andrew Bartlett

Hm, we have been trying for a long time to get log entries that show us the requested LDAP queries, but with no luck.

Where should we set "log level = 5"? Just in the global section? Currently we have set:

    [global]
    ...
    log level = 5 auth:5 auth_audit:10@/var/log/samba/auth_audit.log
    ldap debug level = 5
    ldap debug threshold = 1

We have only seen LDAP queries logged for long-running or timed-out requests. We don't see "normal" LDAP queries. We tested with ldapsearch from the CLI.