Mike Brudenell
2009-Mar-09 17:41 UTC
[Dovecot] Dovecot 1.1.11 failing to accept SSL/TLS connections
Greetings -
We are running Dovecot 1.1.11 on our servers and have been gradually
migrating people off our old (UW-based) systems onto the new (Dovecot-based)
ones. As a result the new systems are seeing more connections from more
people.
We have started seeing problems reported by users of our webmail service
that they are getting an error indicating the webmail software (Prayer) has
failed to establish an IMAP connection using TLS to Dovecot. Investigations
show it is not just the webmail service that is affected but all mail
clients: it's just other clients retry the connection, whereas Prayer
fails the login and shows the error message:
TLS/SSL failure for username.imap.york.ac.uk: SSL negotiation failed
It seems to be related to one of Dovecot's imap-login processes getting a
lot of file descriptors in use. We initially spawn off 10 imap-login
processes and have each handling 256 connections. Full "dovecot -n" at
the
end of the message, but the relevant settings here are:
login_process_size = 64
login_process_per_connection = no
login_processes_count = 10
login_max_processes_count = 128
login_max_connections = 256
We have grown to suspect it is to do with one of the imap-login processes
having a large number of files open. Killing the process seems to get rid of
the problem.
For example currently we have 11 imap-login processes running, one of which
has 518 open files -- process 12436 in the list below. I suspect that, as in
previous times we've encountered the problem, killing this process will
alleviate the problem. (I'll be doing this later on this evening.)
It is slightly odd that the imap-login processes have a very skewed
distribution of open files, almost as if the algorithm for allocating
connections to a process favours some over others. For example the current
counts of open files are:
Pid = Open files count
12430 = 42
26818 = 237
12431 = 90
12433 = 12
12438 = 304
12437 = 106
12435 = 190
12432 = 14
12436 = 518
12434 = 32
12429 = 12
Process 12436 was one of the 10 imap-login processes initially created back
on March 3rd. (Process 26818 was the additional imap-login process spawned a
little later on March 4th.)
I don't believe the problem lies with the available file descriptors on the
system: in the script which starts Dovecot I do
date >/var/run/dovecot-limits
ulimit -Sa >>/var/run/dovecot-limits
echo "----------" >>/var/run/dovecot-limits
ulimit -Ha >>/var/run/dovecot-limits
ulimit -Sn 10128
echo "==========" >>/var/run/dovecot-limits
ulimit -Sa >>/var/run/dovecot-limits
echo "----------" >>/var/run/dovecot-limits
ulimit -Ha >>/var/run/dovecot-limits
$DOVECOT >>/var/run/dovecot-limits 2>&1
(The magic 10128 number came from Dovecot 1.1.11 itself, complaining that
the number I had originally was too low.)
Likewise the output of the pfiles command on process 12436 (which is the one
I believe to be problematic) indicates its limit still has some available --
I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set in
the startup script:
Current rlimit: 533 file descriptors
We originally saw this problem with Dovecot 1.0.3 which we were running up
until a couple of weeks ago. From there I upgraded first to 1.0.15 and then
to 1.1.11. I'd hoped that upgrading would fix the problem: I see it has been
mentioned before on the list, but not for a year or two.
Can anyone give any help, please?
Cheers,
Mike Brudenell
Configuration info
Platform: Solaris 10
dovecot -n output is
:
# 1.1.11: /usr/local/dovecot-1.1.11/etc/dovecot.conf
# OS: SunOS 5.10 i86pc ufs
base_dir: /var/run/dovecot/
log_path: /logfiles/mail/live/dovecot
listen: *:143
ssl_listen: *:993
ssl_cert_file: /usr/local/dovecot/ssl/certs/imapservice-bundle-2007.pem
ssl_key_file: /usr/local/dovecot/ssl/private/imapservice-key-2007.pem
disable_plaintext_auth: no
shutdown_clients: no
login_dir: /var/run/dovecot/login
login_executable: /usr/local/dovecot-1.1.11/libexec/dovecot/imap-login
login_log_format_elements: user=<%Lu> method=%m rip=%r lip=%l %c
login_process_per_connection: no
login_greeting_capability: yes
login_processes_count: 10
max_mail_processes: 10000
mail_max_userip_connections: 20
mail_location:
maildir:/mailstore/messages/%1Ln/%Ln/Maildir:INDEX=/mailstore/index/%1Ln/%Ln:CONTROL=/mailstore/control/%1Ln/%Ln
mail_plugins: quota imap_quota fts fts_squat
mail_log_prefix: [%p]%Us(%Lu):
imap_client_workarounds: delay-newmail
namespace:
type: private
separator: /
inbox: yes
list: yes
subscriptions: yes
auth default:
mechanisms: plain login
cache_size: 1024
cache_ttl: 600
username_chars:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890
username_format: %Ln
passdb:
driver: shadow
userdb:
driver: passwd
plugin:
fts: squat
fts_squat: partial=4 full=4
quota: fs:user
Timo Sirainen
2009-Mar-13 20:54 UTC
[Dovecot] Dovecot 1.1.11 failing to accept SSL/TLS connections
On Mon, 2009-03-09 at 17:41 +0000, Mike Brudenell wrote:> We have grown to suspect it is to do with one of the imap-login processes > having a large number of files open. Killing the process seems to get rid of > the problem.You didn't mention if you actually saw "Too many open files" errors in log file. If you didn't, the problem isn't with ulimits.> It is slightly odd that the imap-login processes have a very skewed > distribution of open files, almost as if the algorithm for allocating > connections to a process favours some over others.That's because of kernel. Dovecot doesn't currently even attempt to distribute the connections. Instead each process simply waits for new connections and the process that's the quickest gets it.> Likewise the output of the pfiles command on process 12436 (which is the one > I believe to be problematic) indicates its limit still has some available -- > I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set in > the startup script: > > Current rlimit: 533 file descriptorsYes, v1.1 drops the number of fds to the maximum number that it needs. Since you had login_max_connections=256, it doesn't need more than twice as much of them. The 12436 process probably was very close to the 256 connections, and after reaching that it would have stopped accepting more. But there do seem to be bugs related to reaching login_max_connections. I'm not entirely sure what bugs exactly though. It's just better not to reach it. Perhaps you should change the settings to something like: login_processes_count = 2 login_max_connections = 1024 login_processes_count should be about the same as the number of CPUs/cores on the system (maybe +1). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: <http://dovecot.org/pipermail/dovecot/attachments/20090313/86e0710a/attachment-0002.bin>
Mike Brudenell
2009-Mar-19 14:01 UTC
[Dovecot] Dovecot 1.1.11 failing to accept SSL/TLS connections
Hi! Sorry for the delay in replying: I was waiting for the problem to recur so I could double-check the logs and the states of the imap-login processes. 2009/3/13 Timo Sirainen <tss at iki.fi>> On Mon, 2009-03-09 at 17:41 +0000, Mike Brudenell wrote: > > We have grown to suspect it is to do with one of the imap-login processes > > having a large number of files open. Killing the process seems to get rid > of > > the problem. > > You didn't mention if you actually saw "Too many open files" errors in > log file. If you didn't, the problem isn't with ulimits.No, there's no sign of the "Too many open files" error message in the logfiles.> Likewise the output of the pfiles command on process 12436 (which is the > one > > I believe to be problematic) indicates its limit still has some available > -- > > I'm guessing Dovecot has reduced the limit down to 533 from the 10128 set > in > > the startup script: > > > > Current rlimit: 533 file descriptors > > Yes, v1.1 drops the number of fds to the maximum number that it needs. > Since you had login_max_connections=256, it doesn't need more than twice > as much of them. The 12436 process probably was very close to the 256 > connections, and after reaching that it would have stopped accepting > more.Ah, I see. When I upgraded from 1.0.15 I had 1.1.11 telling me off for having the fd limit set too low at 2048 when I started Dovecot. Instead it told me to raise the limit to at least 10128, so I did. Hence I was a bit surprised to find the limit lowered down to 533 if it had told me it wanted the higher number.> But there do seem to be bugs related to reaching login_max_connections. > I'm not entirely sure what bugs exactly though. It's just better not to > reach it. Perhaps you should change the settings to something like: > > login_processes_count = 2 > login_max_connections = 1024 > > login_processes_count should be about the same as the number of > CPUs/cores on the system (maybe +1). >We're running a pair of servers, each with 8 CPUs. So I'm guessing my login_processes_count = 10 should be OK? The servers are handling a LOT of client machines. For example I've just checked the two machines and as I write there are 1881 "imap" processes on one, and 1808 on the other. I'm guessing that if I increase login_max_connections from its current 256 to 1024 this might delay the problem occurring? And perhaps if I were restart Dovecot in the small hours of the night every few days? Or is an alternative workaround to change login_process_per_connection from no to yes? ...If I were to do this am I right in thinking that imap-login then plays no part in SSL-connected IMAP sessions? As it's imap-login that seems to be having the problem, anything I can do ti reduce the number of connections its handling would presumably help? If it's any help in working out what the problem might be I have the output from the Solaris "pfiles" command, which lists all the open files each process has. The output for a "rogue" imap-login process shows lots of these as being S_IFSOCK and connected to clients as expected. There are also lots which are AF_UNIX as well -- I'm guessing the proxying of SSL connections through imap-login to the imap process? I can send you (Timo) this file privately if you think it might help any. Cheers, Mike B-)