Elias Pereira
2024-Apr-02 12:25 UTC
[Samba] How to diagnose a busy LDAP server process in the Samba AD DC
The saga continues...

I've spent a whole day with log level 5 and 7 and no error. All I have to do is return the log to the default and the error reappears.

I monitored the "LDAP Query: Duration", but I didn't notice any crashes in the queries.

I don't know if it's a long time, but some queries took 1.5s.

Is there anything else I can do?

On Mon, Mar 25, 2024 at 1:30 PM Elias Pereira <empbilly at gmail.com> wrote:

> Hello Andrew,
>
> What's the explanation for when the log level is set to 5, the error NT_STATUS_IO_TIMEOUT doesn't appear, but when it's at the default log level, it does?
>
> On Mon, Mar 18, 2024 at 10:33 AM Elias Pereira <empbilly at gmail.com> wrote:
>
>> hi Andrew, thanks for the help!!!
>>
>>> It seems to me the LDAP process being busy would be the root cause here.
>>> Working out what is going on here shouldn't is a detective task - I always start with a wireshark trace. The client making all the noise/traffic will be the one causing the trouble.
>>
>> In the wireshark analysis, should I filter only by the ldap protocol or leave everything? Should I look at something specific in the client logs?
>>
>> On Sun, Mar 10, 2024 at 9:31 PM Andrew Bartlett <abartlet at samba.org> wrote:
>>
>>> Thanks for getting back to me.
>>>
>>> It seems to me the LDAP process being busy would be the root cause here. Working out what is going on here shouldn't is a detective task - I always start with a wireshark trace. The client making all the noise/traffic will be the one causing the trouble.
>>>
>>> If it isn't clear from that, then look into the DB audit logging for perhaps busy writes
>>>
>>> https://wiki.samba.org/index.php/Setting_up_Audit_Logging#Enabling_AD_DC_Database_Audit_Logging
>>>
>>> Finally, set 'log level = 5' and look for logs like: LDAP Query: Duration was
>>>
>>> This will tell you about how long each query is taking, potentially showing a particularly slow query that needs to be stopped.
>>>
>>> Andrew Bartlett
>>>
>>> On Sun, 2024-03-10 at 19:46 -0300, Elias Pereira wrote:
>>>
>>> Is the drepl local processes very busy doing inbound replication?
>>>
>>> How can I check this?
>>>
>>> My instinct is either the server is very busy (and this should show up in CPU use) or a transaction is being held open excessively.
>>>
>>> I use VMs on Proxmox. In DC1, I installed the Proxmox agent, and CPU usage via the dashboard is very low. However, when I checked using 'top,' the LDAP process is consuming around 94/96% of the CPU. Very strange.
>>>
>>> It is probably 94% of a single CPU, but you might have 8 CPUs in the VM, so overall use is low.
>>>
>>> The VM has 4 CPUs and 6GB of memory.
>>>
>>> On Sun, Mar 10, 2024 at 5:55 PM Andrew Bartlett <abartlet at samba.org> wrote:
>>>
>>> Either the local server is busy, or possibly (but it would not explain the samba_kcc) Samba's drepl process is stuck talking to a remote server.
>>>
>>> Is the drepl local processes very busy doing inbound replication?
>>>
>>> My instinct is either the server is very busy (and this should show up in CPU use) or a transaction is being held open excessively.
>>>
>>> Andrew Bartlett
>>>
>>> On Sat, 2024-03-09 at 19:11 -0300, Elias Pereira via samba wrote:
>>>
>>> I've been grappling with a recurring set of errors for quite some time now:
>>>
>>> - UpdateRefs failed with NT_STATUS_IO_TIMEOUT
>>> - Failed samba_kcc - NT_STATUS_IO_TIMEOUT
>>> - IRPC callback failed for DsReplicaSync - NT_STATUS_IO_TIMEOUT
>>>
>>> Despite cranking up the log level to 10, the returned information remains frustratingly cryptic and hard to decipher.
>>>
>>> This error, being overly generic, continues to elude identification even with the heightened log verbosity. The challenge lies in tracing its origin.
>>>
>>> Running samba-tool dbcheck doesn't reveal any problems, yet executing the command while monitoring the Samba log with "tail -f" exposes errors identical to those described above.
>>>
>>> Interestingly, samba-tool drs showrepl doesn't report any errors.
>>>
>>> So, what additional steps can be taken to unearth the root cause of these persistent NT_STATUS_IO_TIMEOUT errors?
>>>
>>> On Fri, Mar 1, 2024 at 10:32 PM Elias Pereira <empbilly at gmail.com> wrote:
>>>
>>> There is probably nothing wrong with your log, but Firefox doesn't like it, it thinks it contains a virus.
>>>
>>> I just saw now that your response ended up in spam, probably because of the link with the log. O.o
>>>
>>> I still receive the error in the logs:
>>> source4/dsdb/kcc/kcc_periodic.c:790: Failed samba_kcc - NT_STATUS_IO_TIMEOUT
>>>
>>> The strangest thing is that it occurs when the command is executed:
>>> samba-tool dbcheck --cross-ncs --fix --yes
>>>
>>> Could it be some object causing this error?
>>>
>>> On Mon, Feb 12, 2024 at 4:40 PM Rowland Penny via samba <samba at lists.samba.org> wrote:
>>>
>>> On Mon, 12 Feb 2024 16:20:27 -0300
>>> Elias Pereira via samba <samba at lists.samba.org> wrote:
>>>
>>> hi,
>>>
>>> My saga continues...
>>>
>>> I've configured the audit log for drs_repl in smb.conf, and below is the log generated.
>>> https://transfer.sh/7fen4qCNIQ/drs_repl.log
>>>
>>> The log level was 5.
>>> drs_repl:5@/var/log/samba/drs_repl.log
>>>
>>> Could someone take a look and help me understand the log?
>>>
>>> There is probably nothing wrong with your log, but Firefox doesn't like it, it thinks it contains a virus.
>>>
>>> Rowland
>>>
>>> --
>>> To unsubscribe from this list go to the following URL and read the instructions:
>>> https://lists.samba.org/mailman/options/samba
>>>
>>> --
>>> Elias Pereira
>>>
>>> --
>>> Andrew Bartlett (he/him) https://samba.org/~abartlet/
>>> Samba Team Member (since 2001) https://samba.org
>>> Samba Team Lead https://catalyst.net.nz/services/samba
>>> Catalyst.Net Ltd
>>>
>>> Proudly developing Samba for Catalyst.Net Ltd - a Catalyst IT group company
>>>
>>> Samba Development and Support: https://catalyst.net.nz/services/samba
>>>
>>> Catalyst IT - Expert Open Source Solutions

--
Elias Pereira
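
The mismatch discussed in the quoted messages (around 94% for the LDAP process in 'top', yet a nearly idle Proxmox dashboard) is consistent with 'top' reporting a process against a single CPU by default. A minimal sketch of the arithmetic, assuming the dashboard averages over all CPUs of the VM (the function name is made up for illustration):

```python
def share_of_whole_vm(per_core_percent: float, cpu_count: int) -> float:
    """Convert a per-CPU figure, as top shows by default, into a share of all CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return per_core_percent / cpu_count


# Figures from the thread: roughly 94% in top, on a VM with 4 CPUs.
print(share_of_whole_vm(94.0, 4))  # 23.5
```

So one process saturating a single core would show up as only about a quarter of the VM's total capacity, which can look unremarkable on a dashboard even though that process is the bottleneck.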
Andrew Bartlett
2024-Apr-02 19:28 UTC
[Samba] How to diagnose a busy LDAP server process in the Samba AD DC
1.5 seconds is pretty long, so I would look into what those queries are.

I would also look into repeated queries. Sometimes these things are clients stuck in a loop, where they don't complete because they expect some termination condition.

Andrew Bartlett

On Tue, 2024-04-02 at 09:25 -0300, Elias Pereira via samba wrote:
> The saga continues...
> I've spent a whole day with log level 5 and 7 and no error. All I have to do is return the log to the default and the error reappears.
> I monitored the "LDAP Query: Duration", but I didn't notice any crashes in the queries.
> I don't know if it's a long time, but some queries took 1.5s.
> Is there anything else I can do?

--
Andrew Bartlett (he/him) https://samba.org/~abartlet/
Samba Team Member (since 2001) https://samba.org
Samba Team Lead https://catalyst.net.nz/services/samba
Catalyst.Net Ltd

Proudly developing Samba for Catalyst.Net Ltd - a Catalyst IT group company

Samba Development and Support: https://catalyst.net.nz/services/samba

Catalyst IT - Expert Open Source Solutions
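
One way to act on the two suggestions above (find the slow queries, and find the ones that keep repeating) is to summarize the "LDAP Query: Duration was" lines from a log captured at log level 5. The following is only a sketch: the thread quotes just the start of that log message, so the layout after the marker (a number of seconds followed by free-form detail) is an assumption, and the sample lines, addresses, and the `summarize` helper are hypothetical. Adjust the regular expression to whatever your log actually contains.

```python
import re
from collections import Counter

# Assumed layout: "<marker> <seconds>[s][,] <rest of line>".
# Only the marker text itself is taken from the thread.
MARKER = re.compile(r"LDAP Query: Duration was\s+(\d+(?:\.\d+)?)\s*(?:s\b)?[,\s]*(.*)")


def summarize(lines, slow_threshold=1.0):
    """Return (slow queries sorted slowest first, query details by repeat count)."""
    slow = []
    repeats = Counter()
    for line in lines:
        match = MARKER.search(line)
        if match is None:
            continue  # not a query-duration line
        duration = float(match.group(1))
        detail = match.group(2).strip()
        repeats[detail] += 1
        if duration >= slow_threshold:
            slow.append((duration, detail))
    slow.sort(reverse=True)
    return slow, repeats.most_common()


# Hypothetical sample lines, for illustration only.
sample = [
    "LDAP Query: Duration was 1.50s, filter (objectClass=user) from 10.0.0.5",
    "LDAP Query: Duration was 0.02s, filter (cn=alice) from 10.0.0.7",
    "LDAP Query: Duration was 1.20s, filter (objectClass=user) from 10.0.0.5",
    "some unrelated log line",
]

slow, repeats = summarize(sample)
for duration, detail in slow:
    print(f"{duration:.2f}s  {detail}")
for detail, count in repeats:
    print(f"{count}x  {detail}")
```

In practice the list would come from the real log, for example `summarize(open(path, errors="replace"))` with `path` pointing at your Samba log file. A detail string that dominates the repeat count and also appears among the slow entries is a reasonable first candidate for the looping client described above.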