> On Feb 21, 2017, at 11:12 PM, Christian Balzer <chibi at gol.com> wrote:
>
> On Tue, 21 Feb 2017 09:49:39 -0500 KT Walrus wrote:
>
>> I just read this blog: https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/ about scaling to 12 Million Concurrent Connections on a single server and it got me thinking.
>>
>
> While that's a nice article, nothing in it was news to me or particularly
> complex when one does large scale stuff, like Ceph for example.
>
>> Would it be possible to scale Dovecot IMAP server to 10 Million IMAP sessions on a single server?
>>
> I'm sure Timo's answer will (or would, if he could be bothered) be along
> the lines of:
> "Sure, if you give me all your gold and then some for a complete rewrite
> of, well, everything".

It will be a long time before I need to scale to 10 Million users, and I will be happy to pay for the rewrite of the IMAP plugin when the time comes, if it has not been done before then by someone else.

I have seen proposals for a new client protocol called JMAP that seem to be all about running a mail server at scale, the way an NGINX HTTPS web server can scale. That got me thinking about whether there is anything fundamental about IMAP that makes it difficult to scale. After looking into Dovecot's current IMAP implementation, I think the approach taken (one backend process per IMAP session) fundamentally has scaling issues. I see that a couple of years ago work was done to "migrate" idling IMAP sessions to a single process that "remembers" the state of the IMAP session and can restore it to a backend process when the idling is done.

But the only estimate I have read about "migrate idling" is that you are likely to see only a 20% reduction in the number of concurrent processes you need if you are running 50,000 IMAP sessions per mail server. A 20% reduction is not nearly enough of a benefit for scale. I would need to see at least an order of magnitude improvement (and hopefully several orders of magnitude).

So, in my mind, since these IMAP sessions are long-lived with infrequent bursts of activity, a better approach would be to keep the session data in memory or in an external datastore and only run a process against that session data when there is activity. This is much like how WebSockets and even HTTPS requests are handled today by installations that need to support millions of active users.

As for Dovecot, I would think the work done to "migrate" idling IMAP sessions would be a good start toward managing a large number of sessions with a fixed pool of worker processes, as web servers do.

So, my question really is:

Is there anything about the IMAP protocol that would prevent an implementation from scaling to 10 Million users per server? Or do we need to push for a new protocol like JMAP that has been designed to scale better (by making server requests stateless)?

Kevin
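The idea in the message above, keeping a small state record per session and only involving a worker when a session becomes active, can be sketched as follows. This is a minimal illustration and not how Dovecot works: `SessionState`, `SessionTable` and `handle_command` are hypothetical names, and a thread pool stands in for a fixed pool of worker processes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SessionState:
    # Hypothetical minimal state kept while a session is idle.
    user: str
    selected_mailbox: str = "INBOX"


def handle_command(state, command):
    # Placeholder for real protocol work done by a worker.
    return f"{state.user}@{state.selected_mailbox}: {command} OK"


class SessionTable:
    """Many idle sessions held as plain records, served by a fixed worker pool."""

    def __init__(self, workers):
        self.sessions = {}
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def open(self, session_id, user):
        self.sessions[session_id] = SessionState(user=user)

    def on_activity(self, session_id, command):
        # Only an active session occupies a worker, and only for one command.
        state = self.sessions[session_id]
        return self.pool.submit(handle_command, state, command)

    def close(self):
        self.pool.shutdown(wait=True)


table = SessionTable(workers=4)
for sid in range(10_000):
    table.open(sid, user=f"user{sid}")

futures = [table.on_activity(sid, "NOOP") for sid in (7, 42, 9_999)]
results = [f.result() for f in futures]
table.close()
```

Here 10,000 sessions exist as records while only four workers are ever busy. The cost that matters in practice is how much state each record must carry to resume a session correctly, which the sketch does not attempt to model.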
Timo Sirainen
2017-Feb-22 19:44 UTC
Scaling to 10 Million IMAP sessions on a single server
On 22 Feb 2017, at 17.07, KT Walrus <kevin at my.walr.us> wrote:
>
> I have seen proposals for a new client protocol called JMAP that seem to be all about running a mail server at scale, the way an NGINX HTTPS web server can scale. That got me thinking about whether there is anything fundamental about IMAP that makes it difficult to scale. After looking into Dovecot's current IMAP implementation, I think the approach taken (one backend process per IMAP session) fundamentally has scaling issues. I see that a couple of years ago work was done to "migrate" idling IMAP sessions to a single process that "remembers" the state of the IMAP session and can restore it to a backend process when the idling is done.
>
> But the only estimate I have read about "migrate idling" is that you are likely to see only a 20% reduction in the number of concurrent processes you need if you are running 50,000 IMAP sessions per mail server. A 20% reduction is not nearly enough of a benefit for scale. I would need to see at least an order of magnitude improvement (and hopefully several orders of magnitude).

My long-term plans are something like this:

* The imap-hibernate process can be used more aggressively. Not necessarily just for IDLEing sessions, but for any session that isn't actively being used. And actually, if the server is too busy, even active sessions could be hibernated. That would be somewhat similar to cooperative multitasking. When this is done, you can think of the current imap processes as the worker processes.

* More state will be transferred to the imap-hibernate process, so it can perform simpler commands without recreating the IMAP process. For example, STATUS replies can be returned from cached state as long as it hasn't actually changed.

* imap-hibernate currently tracks changed state via inotify (etc.). This mostly works, but it also sometimes wakes up unnecessarily. For example, just because one IMAP session performed a FETCH that added something to dovecot.index.cache, it doesn't mean that there are any real changes. We'll need some mail plugin that notifies the imap-hibernate process when a real change has happened.

* Hibernated sessions can even be moved away entirely from backends into IMAP proxies. The IMAP proxy can then reconnect to a backend to re-establish the session. This even allows switching backends entirely, as long as the storage is shared. It requires that backends notify the proxy whenever something changes for the user, which is mostly a continuation of the previous item (just TCP notification instead of UNIX socket notification).

* IMAP proxies can also perform similar limited functionality as imap-hibernate processes, possibly by running the same imap-hibernate processes.

* And kind of a reverse of hibernation: imap processes can also preserve the user's IMAP session and opened folder indexes in memory even after the IMAP client has disconnected. If the same user connects back, the imap process can quickly be re-used with all the state already open. This is especially useful for clients that create many short-lived connections, such as webmails.

So after all these changes there would practically be something like 1000 imap processes constantly open and either doing work or waiting for a recently disconnected IMAP client to come back.

As Christian already mentioned, the Dovecot proxies are supposed to be able to handle quite a lot of connections. I wouldn't be surprised if you can already do millions of connections with them. Most of our customers haven't tried scaling them very hard because they don't really want to create multiple IP addresses for servers, which is required to avoid running out of TCP ports (or I guess there could be multiple destination ports, but that also complicates things and Dovecot doesn't currently support that in an easy way either).

> Is there anything about the IMAP protocol that would prevent an implementation from scaling to 10 Million users per server? Or do we need to push for a new protocol like JMAP that has been designed to scale better (by making server requests stateless)?

I guess mainly the message sequence numbers in the IMAP protocol make this more difficult, but it's not an impossible problem to solve.
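To illustrate why message sequence numbers complicate hibernating or moving a session: in IMAP, a sequence number is a message's position in the session's current view of the mailbox, so it changes when an earlier message is expunged, while a UID stays attached to the same message. The sketch below models one session's view; `MailboxView` and its methods are hypothetical names for illustration, not Dovecot code.

```python
class MailboxView:
    """One session's view of a mailbox: list position + 1 is the sequence number."""

    def __init__(self, uids):
        # UIDs are kept in ascending order; sequence number n is uids[n - 1].
        self.uids = list(uids)

    def uid_for_seq(self, seq):
        return self.uids[seq - 1]

    def seq_for_uid(self, uid):
        return self.uids.index(uid) + 1

    def expunge_seq(self, seq):
        # Removing a message shifts every later message down by one.
        return self.uids.pop(seq - 1)


view = MailboxView([101, 102, 105, 109])
before = view.uid_for_seq(3)   # UID 105
view.expunge_seq(2)            # removes UID 102
after = view.uid_for_seq(3)    # sequence number 3 now names UID 109
```

A process that takes over a session therefore needs that session's exact view, including which expunges the client has already been told about, or the same sequence number could refer to a different message than the client expects.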
> On Feb 22, 2017, at 2:44 PM, Timo Sirainen <tss at iki.fi> wrote:
>
> I guess mainly the message sequence numbers in the IMAP protocol make this more difficult, but it's not an impossible problem to solve.

Any thoughts on the wisdom of supporting an external database for session state or even mailbox state (like using Redis or even MySQL)?

Also, would it help reliability or scalability to store a copy of the index data in an external database? I want to use the mdbox format, but I have heard that its index files do get corrupted occasionally and have to be rebuilt (possibly using an older version of the index file to construct a new one). I worry that using mdbox might cause my users to see IMAP flags suddenly reset to a previous state (like previously read messages becoming unread in their mail clients).

If a copy of the index data were stored in an external database, problems such as duplicate messages occurring in a Dovecot cluster could be handled by having the cluster look up the index data in the external database instead of in the local copy stored on the server. An external database could easily implement unique serial numbers cluster-wide. In the site I'm working on building, I even use Redis to implement "message queues" between Postfix and Dovecot (via the Redis push/pop feature). Currently, I am only delivering new messages via IMAP instead of LMTP (no LMTP will be available to my backend mail servers, only IMAP).

If you stored the MD5 checksum of the index files (and even the message files) in the external database, you could also run a background process that periodically checks the local index files for corruption using the checksums from the database, making the mdbox format even more bulletproof.

And the best thing about using an external database is that making it highly available is not a problem (most sites already do that). The index data stored in the database would become the "source of truth", with the local index files and session data being an efficient cache for the mailstore. Re-caching could occur as needed to make the whole cluster more reliable.

Kevin
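The background checksum check proposed above can be sketched as follows. This is only an illustration of the idea under stated assumptions: a plain dictionary (`recorded`) stands in for the external database, `md5_of` and `find_corrupted` are hypothetical helper names, and the file is a temporary stand-in rather than a real mdbox index.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path):
    # Hash the file in chunks so large files are not read into memory at once.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(paths, recorded):
    """Return the paths whose current MD5 differs from the recorded one."""
    return [p for p in paths if md5_of(p) != recorded.get(str(p))]


with tempfile.TemporaryDirectory() as tmp:
    index = Path(tmp) / "example.index"
    index.write_bytes(b"index contents v1")

    # Assumption: the writer records the checksum after every legitimate write.
    recorded = {str(index): md5_of(index)}
    clean = find_corrupted([index], recorded)

    # Simulate on-disk damage that bypassed the writer.
    index.write_bytes(b"index contents v1 with damage")
    damaged = find_corrupted([index], recorded)
```

Because index files change constantly during normal operation, a scheme like this only works if the recorded checksum is updated together with each legitimate write. Otherwise every ordinary change would be reported as corruption.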