Tore André Klock
2006-May-31 13:33 UTC
[Dovecot] Dovecot 1.0beta8 dovecot-auth consumes 100% CPU time on Solaris 10 amd64
Hello. I hope someone out there can help with this. It is getting pretty urgent. I am running a Solaris 10 server on Opteron (amd64) hardware and have compiled Dovecot 1.0beta8 from source. It has openssl compiled in (after much mucking around with various environment variables and modifying the Makefile), and was built with: $ ./configure --with-ldap --with-ssl-dir=/etc/ssl $ make # make install The variables: CC=/usr/sfw/bin/gcc CFLAGS='-m64 -I/usr/local/ssl/include:/usr/sfw/include:/opt/sfw/include' CPP=/usr/sfw/bin/cpp CPPFLAGS=-I/usr/local/ssl/include/openssl:/usr/sfw/include:/opt/sfw/include CXX=/usr/sfw/bin/g++ LDFLAGS='-m64 -L/usr/local/ssl/lib -R/usr/local/ssl/lib:/usr/sfw/lib:/opt/sfw/lib' LD_LIBRARY_PATH=/usr/local/ssl/lib:/usr/sfw/lib:/opt/sfw/lib It was compiled with gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath). I modified the SSL_CFLAGS and SSL_LIBS variables in the Makefile to make it find my openssl install in /usr/local/ssl. The one that came with the OS was broken[1], so I compiled 0.9.8b from source. The configure script for some reason didn't pick this up, even when using CFLAGS, LDFLAGS, etc. At any rate, I successfully set it up to accept SSL/TLS connections and authenticate against an LDAP server. I use the auth_bind authentication method and only one LDAP host. I have confirmed that this works. In fact, dovecot runs just fine with the exception of the dovecot-auth proces, which is thrown into an endless loop. It starts off at maybe 1-2% CPU time, but after a few minutes it is back up to 100%. This is the output from truss(1) pointed at the dovecot-auth process: --cut-- pollsys(0xFFFFFD7FFFDFF6F0, 5, 0xFFFFFD7FFFDFF6D0, 0x00000000) = 0 pollsys(0xFFFFFD7FFFDFF6F0, 5, 0xFFFFFD7FFFDFF6D0, 0x00000000) = 0 pollsys(0x00467DC0, 10, 0xFFFFFD7FFFDFFB10, 0x00000000) = 1 [same sequence repeats ad nauseum] --cut-- This is a sample of top(1) output: --cut-- load averages: 1.00, 1.00, 1.00 08:51:38 60 processes: 55 sleeping, 3 running, 1 zombie, 1 on cpu CPU states: 0.0% idle, 68.3% user, 31.7% kernel, 0.0% iowait, 0.0% swap Memory: 1024M real, 680M free, 71M swap in use, 591M swap free PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND 22875 nobody 1 0 0 0K 0K run 726:22 99.68% dovecot-auth --cut-- Here is a condensed version of my ldap-config: (comments removed, names changed to protect the innocent) --cut-- hosts = ldap.example.com auth_bind = yes auth_bind_userdn = uid=%u,ou=People,o=example.com,o=example.com ldap_version = 3 base = uid=%,ou=People,o=example.com,o=example.com --cut-- This is my condensed dovecot.conf: --cut-- base_dir = /var/run/dovecot/ protocols = imap pop3 listen = * ssl_disable = no ssl_cert_file = /etc/ssl/certs/host.cert ssl_key_file = /etc/ssl/private/host.key ssl_cipher_list = ALL:!LOW disable_plaintext_auth = no shutdown_clients = yes log_path = /var/log/dovecot log_timestamp = "%b %d %H:%M:%S " login_user = dovecot login_process_per_connection = yes verbose_proctitle = yes verbose_ssl = yes first_valid_uid = 100 last_valid_uid = 60000 mail_debug = yes default_mail_env = maildir:/var/mail/%u/Maildir mail_full_filesystem_access = no protocol imap { login_executable = /usr/local/libexec/dovecot/imap-login mail_executable = /usr/local/libexec/dovecot/imap login_greeting_capability = yes imap_client_workarounds = outlook-idle tb-extra-mailbox-sep } protocol pop3 { login_executable = /usr/local/libexec/dovecot/pop3-login mail_executable = /usr/local/libexec/dovecot/pop3 pop3_uidl_format = %08Xu%08Xv pop3_logout_format = top=%t/%p, retr=%r/%b, del=%d/%m, size=%s mail_plugin_dir = /usr/local/lib/dovecot/pop3 pop3_client_workarounds = outlook-no-nuls oe-ns-eoh } auth_executable = /usr/local/libexec/dovecot/dovecot-auth auth_cache_size = 0 auth_username_chars abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890.-_@ auth_verbose = yes auth_debug = no auth default { mechanisms = plain passdb ldap { args = /usr/local/etc/dovecot-ldap.conf } userdb passwd { } user = nobody } --cut-- I may have commented out more than necessary in an attempt to be explicit while I was trying to narrow this down. This is the output of ldd dovecot-auth: --cut-- libpam.so.1 => /lib/64/libpam.so.1 libldap.so.5 => /usr/lib/64/libldap.so.5 libsocket.so.1 => /lib/64/libsocket.so.1 libnsl.so.1 => /lib/64/libnsl.so.1 librt.so.1 => /lib/64/librt.so.1 libsendfile.so.1 => /lib/64/libsendfile.so.1 libc.so.1 => /lib/64/libc.so.1 libcmd.so.1 => /lib/64/libcmd.so.1 libsasl.so.1 => /usr/lib/64/libsasl.so.1 libmd5.so.1 => /lib/64/libmd5.so.1 libnspr4.so => /usr/lib/mps/64/libnspr4.so libplc4.so => /usr/lib/mps/64/libplc4.so libnss3.so => /usr/lib/mps/64/libnss3.so libssl3.so => /usr/lib/mps/64/libssl3.so libmp.so.2 => /lib/64/libmp.so.2 libscf.so.1 => /lib/64/libscf.so.1 libaio.so.1 => /lib/64/libaio.so.1 libpthread.so.1 => /lib/64/libpthread.so.1 libthread.so.1 => /lib/64/libthread.so.1 libdl.so.1 => /lib/64/libdl.so.1 libsoftokn3.so => /usr/lib/mps/amd64/libsoftokn3.so libplds4.so => /usr/lib/mps/amd64/libplds4.so libdoor.so.1 => /lib/64/libdoor.so.1 libuutil.so.1 => /lib/64/libuutil.so.1 libm.so.2 => /lib/64/libm.so.2 --cut-- Notes: - The problem *seems* to occur only when I use LDAP authentication. - Disabling SSL in the config has no effect on the CPU usage. - I am not using NFS for anything. - I use Thunderbird 1.5.0.2 and another client runs Outlook. - There is no indication of any problem in the dovecot or system logs. - I suspected this had to do with the poll method dovecot is compiled with (poll), but it happens even when I compiled it with the select method. As far as I know, none of the other methods are available on Solaris. - I noted that the poll(2) call in Solaris will return immediately given a timeout of zero (-1 is 'forever'). I tried tacking down the use of that system call, but I had to give up on that. I only mention it in case it is a dovecot bug and of any help to Timo. Any help is greatly appreciated. Thank you. - Tore [1] If you use the factory-installed version, you get messages like "Warning: imap-login: SSL_accept() failed: error:1409D08A:SSL routines:SSL3_SETUP_KEY_BLOCK:cipher or hash unavailable" in your dovecot log.
Timo Sirainen
2006-May-31 14:10 UTC
[Dovecot] Dovecot 1.0beta8 dovecot-auth consumes 100% CPU time on Solaris 10 amd64
On Wed, 2006-05-31 at 09:33 -0400, Tore Andr? Klock wrote:> This is the output from truss(1) pointed at the dovecot-auth process: > --cut-- > pollsys(0xFFFFFD7FFFDFF6F0, 5, 0xFFFFFD7FFFDFF6D0, 0x00000000) = 0 > pollsys(0xFFFFFD7FFFDFF6F0, 5, 0xFFFFFD7FFFDFF6D0, 0x00000000) = 0 > pollsys(0x00467DC0, 10, 0xFFFFFD7FFFDFFB10, 0x00000000) = 1 > [same sequence repeats ad nauseum] > --cut--So if pollsys() does return 1 many times a second, it means that some some connection is talking to Dovecot constantly. But if there's nothing else than pollsys() calls there, Dovecot doesn't react on it at all.. A bit difficult to guess why that would happen.> auth_bind = yes > auth_bind_userdn = uid=%u,ou=People,o=example.com,o=example.com..> - The problem *seems* to occur only when I use LDAP authentication.Is it possible that you could try this without auth_bind to see if the bug is in it or elsewhere in LDAP authentication?> - I noted that the poll(2) call in Solaris will return immediately given a > timeout of zero (-1 is 'forever').The pollsys() parameters seem to be something completely different. I get similar values when trussing Dovecot processes which are waiting in pollsys() for a couple of seconds. If you can't solve this in Dovecot's side, you could still use pam_ldap instead. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: <http://dovecot.org/pipermail/dovecot/attachments/20060531/a2039b2a/attachment.bin>