Bartosz Kwitniewski
2022-Sep-02 14:54 UTC
Thousands of SSL certificates stalls new logins during reload - problem with Dovecot config process
Out of other services on that machine that are able to handle such number of certificates during reloads:
- proftpd loads configs dynamically based on SNI domain
- exim loads certificates dynamically based on SNI domain
- LiteSpeed switches to a new process after loading whole configuration

Best regards,
--
Bartosz Kwitniewski

On 02/09/2022 14:52, Felipe Gasper wrote:
> For hosting environments--where TLS certs can change hundreds of times in a matter of minutes--it would be a boon for Dovecot to load those certificates dynamically rather than all at once.
>
> Pure-FTPd implements a nice solution to this: a standalone service that fetches TLS certificates & keys. Documented here:
>
> https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672c3f8428/README.TLS#L161
>
> -FG
>
>
>> On Sep 2, 2022, at 08:44, Bartosz Kwitniewski <zerg-dovecot at uid0.pl> wrote:
>>
>> Hello,
>>
>> I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing:
>> local_name "domain" {
>> ssl_cert = ...
>> ssl_key = ...
>> }
>> When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
>>
>> The main problem here seems to be that during reload, new config process is immediately designated as the one serving config requests and then it starts parsing config files, which takes around 20-30 seconds. If it would parse config files first, and only then would become a new process for serving config requests, then it would probably solve the problem. Or perhaps there is a better way to load new certificates or a way to optimize?
>>
>> There is another problem with config process and shutdown_clients=no. We do not want to disconnect users during reload, because e.g. Thunderbird displays a popup that server is shutting down. When there are long lasting IMAP connections from Google and other services that aggregate e-mail, old config process is not being killed. Because config process with ~6000 certificates is using ~1 GB of RAM, it can quickly rise to 20 GB of memory used. This is not a big issue however, because we have created a task that kills old processes, but there could be a built-in mechanism to solve that problem.
>>
>> I have created minimal configuration and scripts to recreate problem. Reproduction steps below.
>>
>> (...)
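
The reload ordering proposed in the original message (parse the config files first, and only then take over serving config requests) can be sketched roughly as follows. This is a generic illustration in Python, not Dovecot's code: the ConfigServer class, its parse callable and the dict-shaped snapshot are all hypothetical names chosen for the example.

```python
import threading


class ConfigServer:
    """Answers lookups from a snapshot that is replaced only after parsing ends."""

    def __init__(self, parse):
        # `parse` is a hypothetical slow parser: a callable returning a dict.
        self._parse = parse
        self._reload_lock = threading.Lock()  # serialises reloads, never blocks reads
        self._active = parse()                # initial snapshot

    def lookup(self, name):
        # Readers always use whichever snapshot is currently active,
        # so a reload in progress does not stall them.
        return self._active.get(name)

    def reload(self):
        with self._reload_lock:
            fresh = self._parse()   # the slow step runs off to the side
            self._active = fresh    # swap: readers see old or new, never a partial state
```

With this ordering, a lookup issued while reload() is still parsing is answered from the previous snapshot instead of waiting 20-30 seconds for the new one.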
John Stoffel
2022-Sep-02 20:45 UTC
Thousands of SSL certificates stalls new logins during reload - problem with Dovecot config process
>>>>> "Bartosz" == Bartosz Kwitniewski <zerg-dovecot at uid0.pl> writes:

> Out of other services on that machine that are able to handle such
> number of certificates during reloads:
> - proftpd loads configs dynamically based on SNI domain
> - exim loads certificates dynamically based on SNI domain
> - LiteSpeed switches to a new process after loading whole configuration

Are you running all these services on one machine? Maybe you could get an SSL termination device which terminates the SSL connections and then forwards them into the proper backend application? This way only one system needs to be managed for certs, and only one (or two since I assume you have an HA pair :-) needs to then reload when new certs are inserted.

If you could hack the proftpd cert code into dovecot, that might also be a way around it. I haven't a clue how this works since I haven't looked at either code base. It won't be simple, but I'm sure others would appreciate it. If it's critical, paying for the feature to be added is another option.

> Best regards,
> --
> Bartosz Kwitniewski

> On 02/09/2022 14:52, Felipe Gasper wrote:
>> For hosting environments--where TLS certs can change hundreds of times in a matter of minutes--it would be a boon for Dovecot to load those certificates dynamically rather than all at once.
>>
>> Pure-FTPd implements a nice solution to this: a standalone service that fetches TLS certificates & keys. Documented here:
>>
>> https://github.com/jedisct1/pure-ftpd/blob/9d25440e5b5283fbeca94dd0595aa6672c3f8428/README.TLS#L161
>>
>> -FG
>>
>>
>>> On Sep 2, 2022, at 08:44, Bartosz Kwitniewski <zerg-dovecot at uid0.pl> wrote:
>>>
>>> Hello,
>>>
>>> I'm running a dovecot 2.3.19.1 server that has around 6000 SSL certificates in separate config files, each containing:
>>> local_name "domain" {
>>> ssl_cert = ...
>>> ssl_key = ...
>>> }
>>> When new certificate is added, dovecot is reloaded (around 20 times a day). When dovecot is being reloaded, users are unable to log in for around 30 seconds.
>>>
>>> The main problem here seems to be that during reload, new config process is immediately designated as the one serving config requests and then it starts parsing config files, which takes around 20-30 seconds. If it would parse config files first, and only then would become a new process for serving config requests, then it would probably solve the problem. Or perhaps there is a better way to load new certificates or a way to optimize?
>>>
>>> There is another problem with config process and shutdown_clients=no. We do not want to disconnect users during reload, because e.g. Thunderbird displays a popup that server is shutting down. When there are long lasting IMAP connections from Google and other services that aggregate e-mail, old config process is not being killed. Because config process with ~6000 certificates is using ~1 GB of RAM, it can quickly rise to 20 GB of memory used. This is not a big issue however, because we have created a task that kills old processes, but there could be a built-in mechanism to solve that problem.
>>>
>>> I have created minimal configuration and scripts to recreate problem. Reproduction steps below.
>>>
>>> (...)
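
The per-domain loading that the thread attributes to proftpd and exim (load a certificate only when a client asks for that SNI name) can be sketched as a small cache keyed by domain. This is an illustrative Python sketch, not code from any of the servers mentioned: the LazyCertStore class, the <cert_dir>/<domain>.pem layout and the pluggable load callable are assumptions made for the example.

```python
import os


class LazyCertStore:
    """Loads a per-domain certificate on first use; reloads it when the file changes."""

    def __init__(self, cert_dir, load=None):
        self._dir = cert_dir
        # `load` turns a file path into whatever the server needs (hypothetical hook);
        # by default the raw file bytes are returned.
        self._load = load or self._read_bytes
        self._cache = {}  # domain -> (mtime_ns, loaded object)

    @staticmethod
    def _read_bytes(path):
        with open(path, "rb") as fh:
            return fh.read()

    def get(self, sni_name):
        # Assumed layout: <cert_dir>/<domain>.pem
        domain = sni_name.lower().rstrip(".")
        # Reject names that could escape the certificate directory.
        if not domain or "/" in domain or os.sep in domain or domain.startswith("."):
            return None
        path = os.path.join(self._dir, domain + ".pem")
        try:
            mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            self._cache.pop(domain, None)
            return None
        cached = self._cache.get(domain)
        if cached is None or cached[0] != mtime:
            # First request for this domain, or the file was replaced on disk.
            cached = (mtime, self._load(path))
            self._cache[domain] = cached
        return cached[1]
```

With a store like this, adding or renewing one certificate costs a single file read on the next handshake for that domain, and no reload of the other ~6000 entries is involved. In a Python TLS server, get() could be called from an ssl.SSLContext.sni_callback handler; how a C daemon would wire this in is outside the scope of the sketch.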