Mike Brudenell
2009-Mar-09 17:41 UTC
[Dovecot] Dovecot 1.1.11 failing to accept SSL/TLS connections
Greetings - We are running Dovecot 1.1.11 on our servers and have been gradually migrating people off our old (UW-based) systems onto the new (Dovecot-based) ones. As a result the new systems are seeing more connections from more people. We have started seeing problems reported by users of our webmail service that they are getting an error indicating the webmail software (Prayer) has failed to establish an IMAP connection using TLS to Dovecot. Investigations show it is not just the webmail service that is affected but all mail clients: it's just other clients retry the connection, whereas Prayer fails the login and shows the error message: TLS/SSL failure for username.imap.york.ac.uk: SSL negotiation failed It seems to be related to one of Dovecot's imap-login processes getting a lot of file descriptors in use. We initially spawn off 10 imap-login processes and have each handling 256 connections. Full "dovecot -n" at the end of the message, but the relevant settings here are: login_process_size = 64 login_process_per_connection = no login_processes_count = 10 login_max_processes_count = 128 login_max_connections = 256 We have grown to suspect it is to do with one of the imap-login processes having a large number of files open. Killing the process seems to get rid of the problem. For example currently we have 11 imap-login processes running, one of which has 518 open files -- process 12436 in the list below. I suspect that, as in previous times we've encountered the problem, killing this process will alleviate the problem. (I'll be doing this later on this evening.) It is slightly odd that the imap-login processes have a very skewed distribution of open files, almost as if the algorithm for allocating connections to a process favours some over others. For example the current counts of open files are: Pid = Open files count 12430 = 42 26818 = 237 12431 = 90 12433 = 12 12438 = 304 12437 = 106 12435 = 190 12432 = 14 12436 = 518 12434 = 32 12429 = 12 Process 12436 was one of the 10 imap-login processes initially created back on March 3rd. (Process 26818 was the additional imap-login process spawned a little later on March 4th.) I don't believe the problem lies with the available file descriptors on the system: in the script which starts Dovecot I do date >/var/run/dovecot-limits ulimit -Sa >>/var/run/dovecot-limits echo "----------" >>/var/run/dovecot-limits ulimit -Ha >>/var/run/dovecot-limits ulimit -Sn 10128 echo "==========" >>/var/run/dovecot-limits ulimit -Sa >>/var/run/dovecot-limits echo "----------" >>/var/run/dovecot-limits ulimit -Ha >>/var/run/dovecot-limits $DOVECOT >>/var/run/dovecot-limits 2>&1 (The magic 10128 number came from Dovecot 1.1.11 itself, complaining that the number I had originally was too low.) Likewise the output of the pfiles command on process 12436 (which is the one I believe to be problematic) indicates its limit still has some available -- I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set in the startup script: Current rlimit: 533 file descriptors We originally saw this problem with Dovecot 1.0.3 which we were running up until a couple of weeks ago. From there I upgraded first to 1.0.15 and then to 1.1.11. I'd hoped that upgrading would fix the problem: I see it has been mentioned before on the list, but not for a year or two. Can anyone give any help, please? Cheers, Mike Brudenell Configuration info Platform: Solaris 10 dovecot -n output is : # 1.1.11: /usr/local/dovecot-1.1.11/etc/dovecot.conf # OS: SunOS 5.10 i86pc ufs base_dir: /var/run/dovecot/ log_path: /logfiles/mail/live/dovecot listen: *:143 ssl_listen: *:993 ssl_cert_file: /usr/local/dovecot/ssl/certs/imapservice-bundle-2007.pem ssl_key_file: /usr/local/dovecot/ssl/private/imapservice-key-2007.pem disable_plaintext_auth: no shutdown_clients: no login_dir: /var/run/dovecot/login login_executable: /usr/local/dovecot-1.1.11/libexec/dovecot/imap-login login_log_format_elements: user=<%Lu> method=%m rip=%r lip=%l %c login_process_per_connection: no login_greeting_capability: yes login_processes_count: 10 max_mail_processes: 10000 mail_max_userip_connections: 20 mail_location: maildir:/mailstore/messages/%1Ln/%Ln/Maildir:INDEX=/mailstore/index/%1Ln/%Ln:CONTROL=/mailstore/control/%1Ln/%Ln mail_plugins: quota imap_quota fts fts_squat mail_log_prefix: [%p]%Us(%Lu): imap_client_workarounds: delay-newmail namespace: type: private separator: / inbox: yes list: yes subscriptions: yes auth default: mechanisms: plain login cache_size: 1024 cache_ttl: 600 username_chars: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890 username_format: %Ln passdb: driver: shadow userdb: driver: passwd plugin: fts: squat fts_squat: partial=4 full=4 quota: fs:user
Timo Sirainen
2009-Mar-13 20:54 UTC
[Dovecot] Dovecot 1.1.11 failing to accept SSL/TLS connections
On Mon, 2009-03-09 at 17:41 +0000, Mike Brudenell wrote:> We have grown to suspect it is to do with one of the imap-login processes > having a large number of files open. Killing the process seems to get rid of > the problem.You didn't mention if you actually saw "Too many open files" errors in log file. If you didn't, the problem isn't with ulimits.> It is slightly odd that the imap-login processes have a very skewed > distribution of open files, almost as if the algorithm for allocating > connections to a process favours some over others.That's because of kernel. Dovecot doesn't currently even attempt to distribute the connections. Instead each process simply waits for new connections and the process that's the quickest gets it.> Likewise the output of the pfiles command on process 12436 (which is the one > I believe to be problematic) indicates its limit still has some available -- > I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set in > the startup script: > > Current rlimit: 533 file descriptorsYes, v1.1 drops the number of fds to the maximum number that it needs. Since you had login_max_connections=256, it doesn't need more than twice as much of them. The 12436 process probably was very close to the 256 connections, and after reaching that it would have stopped accepting more. But there do seem to be bugs related to reaching login_max_connections. I'm not entirely sure what bugs exactly though. It's just better not to reach it. Perhaps you should change the settings to something like: login_processes_count = 2 login_max_connections = 1024 login_processes_count should be about the same as the number of CPUs/cores on the system (maybe +1). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: <http://dovecot.org/pipermail/dovecot/attachments/20090313/86e0710a/attachment-0002.bin>
Mike Brudenell
2009-Mar-19 14:01 UTC
[Dovecot] Dovecot 1.1.11 failing to accept SSL/TLS connections
Hi! Sorry for the delay in replying: I was waiting for the problem to recur so I could double-check the logs and the states of the imap-login processes. 2009/3/13 Timo Sirainen <tss at iki.fi>> On Mon, 2009-03-09 at 17:41 +0000, Mike Brudenell wrote: > > We have grown to suspect it is to do with one of the imap-login processes > > having a large number of files open. Killing the process seems to get rid > of > > the problem. > > You didn't mention if you actually saw "Too many open files" errors in > log file. If you didn't, the problem isn't with ulimits.No, there's no sign of the "Too many open files" error message in the logfiles.> Likewise the output of the pfiles command on process 12436 (which is the > one > > I believe to be problematic) indicates its limit still has some available > -- > > I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set > in > > the startup script: > > > > Current rlimit: 533 file descriptors > > Yes, v1.1 drops the number of fds to the maximum number that it needs. > Since you had login_max_connections=256, it doesn't need more than twice > as much of them. The 12436 process probably was very close to the 256 > connections, and after reaching that it would have stopped accepting > more.Ah, I see. When I upgraded from 1.0.15 I had 1.1.11 telling me off for having the fd limit set too low at 2048 when I started Dovecot. Instead it told me to raise the limit to at least 10128, so I did. Hence I was a bit surprised to find the limit lowered down to 533 if it had told me it wanted the higher number.> But there do seem to be bugs related to reaching login_max_connections. > I'm not entirely sure what bugs exactly though. It's just better not to > reach it. Perhaps you should change the settings to something like: > > login_processes_count = 2 > login_max_connections = 1024 > > login_processes_count should be about the same as the number of > CPUs/cores on the system (maybe +1). >We're running a pair of servers, each with 8 CPUs. So I'm guessing my login_processes_count = 10 should be OK? The servers are handling a LOT of client machines. For example I've just checked the two machines and as I write there are 1881 "imap" processes on one, and 1808 on the other. I'm guessing that if I increase login_max_connections from its current 256 to 1024 this might delay the problem occurring? And perhaps if I were restart Dovecot in the small hours of the night every few days? Or is an alternative workaround to change login_process_per_connection from no to yes? ...If I were to do this am I right in thinking that imap-login then plays no part in SSL-connected IMAP sessions? As it's imap-login that seems to be having the problem, anything I can do ti reduce the number of connections its handling would presumably help? If it's any help in working out what the problem might be I have the output from the Solaris "pfiles" command, which lists all the open files each process has. The output for a "rogue" imap-login process shows lots of these as being S_IFSOCK and connected to clients as expected. There are also lots which are AF_UNIX as well -- I'm guessing the proxying of SSL connections through imap-login to the imap process? I can send you (Timo) this file privately if you think it might help any. Cheers, Mike B-)