I just read this blog: https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/ about scaling to 12 Million Concurrent Connections on a single server and it got me thinking.

Would it be possible to scale the Dovecot IMAP server to 10 Million IMAP sessions on a single server?

I think the current implementation, with a separate process managing each active IMAP session (and the possibility of moving idling sessions to a single hibernate process), will never allow a single server to manage 10 Million IMAP sessions.

But would it be possible to implement a new IMAP server plugin that uses a fixed, configurable pool of "worker" processes, much like NGINX or PHP-FPM does? These servers can probably scale to 10 Million TCP connections, if the server is carefully tuned and has enough cores/memory to support that many active sessions.

I'm thinking that the new IMAP server could use some external database (e.g., Redis or Memcached) to save all session state and have the "worker" processes poll the TCP sockets for new IMAP commands to process (fetching the session state from the external database when it has a command waiting on a response). The Dovecot IMAP proxies could even queue incoming commands to proxy many incoming requests over a smaller number of backend connections (like ProxySQL does for MySQL requests). That might allow each Dovecot proxy to support 10 Million IMAP sessions, and a single backend could support multiple front-end Dovecot proxies (to scale to 100 Million concurrent IMAP connections using 10 proxies for 100 Million connections and 1 backend server for 10 Million connections).
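To make the proposed model concrete, here is a minimal Python sketch of one event-loop worker that multiplexes many client sockets and keeps no per-session state of its own between commands: it loads the state from a store when a command arrives and writes it back afterwards. Everything here is an illustrative assumption, not Dovecot code: `SessionStore` is a hypothetical dict-backed stand-in for Redis or Memcached, the command handling is a toy (not an IMAP parser), and a real pool would run several such workers, for example one per core.

```python
import json
import selectors
import socket


class SessionStore:
    """Hypothetical stand-in for an external store such as Redis or Memcached.

    A plain dict is used here; state is serialized to JSON, as it would be
    before sending it over the network to a real key-value store.
    """

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else {"selected": None}

    def save(self, session_id, state):
        self._data[session_id] = json.dumps(state)


class Worker:
    """One event-loop worker multiplexing many client sockets.

    It keeps no session state between commands: state is loaded from the
    store when a command arrives and written back once it is handled.
    """

    def __init__(self, store):
        self.store = store
        self.sel = selectors.DefaultSelector()

    def add_connection(self, sock, session_id):
        sock.setblocking(False)
        self.sel.register(sock, selectors.EVENT_READ, data=session_id)

    def handle(self, line, state):
        # Toy command set, NOT a real IMAP parser: "<tag> <COMMAND> [arg]".
        tag, _, rest = line.partition(" ")
        cmd, _, arg = rest.partition(" ")
        if cmd.upper() == "SELECT":
            state["selected"] = arg
            return f"{tag} OK SELECT completed\r\n"
        if cmd.upper() == "NOOP":
            return f"{tag} OK NOOP completed\r\n"
        return f"{tag} BAD unknown command\r\n"

    def run_once(self, timeout=0.1):
        for key, _events in self.sel.select(timeout):
            sock, session_id = key.fileobj, key.data
            # Simplification: assumes each recv() returns exactly one full line.
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                self.sel.unregister(sock)
                sock.close()
                continue
            state = self.store.load(session_id)  # fetch state on demand
            reply = self.handle(data.decode().strip(), state)
            self.store.save(session_id, state)   # persist before replying
            # Small replies only; a real server would buffer partial writes.
            sock.sendall(reply.encode())


if __name__ == "__main__":
    store = SessionStore()
    worker = Worker(store)
    client, server_side = socket.socketpair()
    worker.add_connection(server_side, "session-42")
    client.sendall(b"a1 SELECT INBOX\r\n")
    worker.run_once()
    print(client.recv(4096))         # b'a1 OK SELECT completed\r\n'
    print(store.load("session-42"))  # {'selected': 'INBOX'}
```

Each `load`/`save` here is a dict access; against a real external store it would be a network round trip per command, which is the price this design pays for not keeping a process per session.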
Of course, the backend server may need to be beefy and have very fast NVMe SSDs for local storage, but changing the IMAP server to manage a pool of workers instead of requiring a process per active session would allow bigger scale-up and could save large sites a lot of money.

Is this a good idea? Or am I missing something?

Kevin
Christian Balzer
2017-Feb-22 04:12 UTC
Scaling to 10 Million IMAP sessions on a single server
On Tue, 21 Feb 2017 09:49:39 -0500 KT Walrus wrote:

> I just read this blog: https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/ about scaling to 12 Million Concurrent Connections on a single server and it got me thinking.

While that's a nice article, nothing in it was news to me or particularly complex when one does large-scale stuff, like Ceph for example.

> Would it be possible to scale Dovecot IMAP server to 10 Million IMAP sessions on a single server?

I'm sure Timo's answer will (or would, if he could be bothered) be along the lines of:
"Sure, if you give me all your gold and then some for a complete rewrite of, well, everything".

What you're missing, and what makes this a bad idea, is that, as mentioned before, scale-up only goes so far. I was feeling that my goal of 500k users/sessions in a 2-node active/active cluster was quite ambitious, and currently I'm looking at 200k sessions as something achievable with the current Dovecot and other limitations.

But even if you were to implement something that can handle 1 million or more sessions per server, would you want to? As in, if that server goes down, the resulting packet, authentication storm will be huge and most likely result in a proverbial shit storm later. Having more than 10% or so of your customers on one machine, and thus involved in an outage that you KNOW will hit you eventually, strikes me as a bad idea.

I'm not sure how the design you propose meshes with Timo's lofty goals and standards when it comes to security as well.

And a push with the right people (clients) to support IMAP NOTIFY would of course reduce the number of sessions significantly.

Finally, Dovecot in proxy mode already scales quite well.
Christian

-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Rakuten Communications
http://www.gol.com/
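Christian's concern about concentrating sessions on one machine can be put into rough numbers. The Python sketch below computes (a) the minimum number of servers needed if no server may hold more than a given share of customers, and (b) the average authentication rate if every session on a failed server logs in again, spread evenly over a fixed window. The 60-second window is an arbitrary illustrative assumption; if clients retry immediately, the peak will be well above this average.

```python
import math


def min_servers(max_share_percent):
    """Fewest servers such that no single server holds more than
    max_share_percent of all customers (assuming an even spread)."""
    return math.ceil(100 / max_share_percent)


def avg_auth_rate(sessions_on_failed_server, reconnect_window_s):
    """Average authentications per second if every session on the failed
    server logs in again, spread evenly over reconnect_window_s seconds."""
    return sessions_on_failed_server / reconnect_window_s


if __name__ == "__main__":
    print(min_servers(10))                       # 10
    # 60 s is an assumed, illustrative reconnect window.
    print(round(avg_auth_rate(200_000, 60)))     # 3333 (200k-session server)
    print(round(avg_auth_rate(10_000_000, 60)))  # 166667 (10M-session server)
```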
A more efficient algorithm would reduce computational complexity and the need for expensive, power-hungry CPUs.

Sent from ProtonMail Mobile
On 22 Feb 2017, at 6.12, Christian Balzer <chibi at gol.com> wrote:

> On Tue, 21 Feb 2017 09:49:39 -0500 KT Walrus wrote:
>
>> I just read this blog: https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/ about scaling to 12 Million Concurrent Connections on a single server and it got me thinking.
>
> While that's a nice article, nothing in it was news to me or particularly
> complex when one does large-scale stuff, like Ceph for example.
>
>> Would it be possible to scale Dovecot IMAP server to 10 Million IMAP sessions on a single server?
>
> I'm sure Timo's answer will (or would, if he could be bothered) be along
> the lines of:
> "Sure, if you give me all your gold and then some for a complete rewrite
> of, well, everything".

Well. The current bottleneck in achieving that would probably be the amount of memory required. With 12M active (non-hibernated) sessions, the memory requirement for that single-instance server would be huge: approximately 10TB. If 12M active sessions is the target, then the architecture of one user per imap process needs to be abandoned.

Sami
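Sami's estimate implies a per-session footprint that is easy to back out. A small Python check, assuming decimal terabytes (the message does not say TB or TiB). The 10 kB figure in the last line is purely hypothetical, chosen only to show how the total scales with per-session state in an event-driven design; it is not a measured Dovecot number.

```python
TB = 10**12  # decimal terabyte (assumption: the estimate may have meant TiB)


def per_session_bytes(total, sessions):
    """Average memory per session implied by a total-memory estimate."""
    return total / sessions


def total_bytes(per_session, sessions):
    """Total memory for a given number of sessions at a fixed footprint."""
    return per_session * sessions


if __name__ == "__main__":
    implied = per_session_bytes(10 * TB, 12_000_000)
    print(f"{implied / 1e3:.0f} kB per session")                         # 833 kB
    print(f"{total_bytes(implied, 10_000_000) / TB:.2f} TB for 10M")      # 8.33 TB
    # Hypothetical 10 kB of state per session in an event-driven design:
    print(f"{total_bytes(10_000, 12_000_000) / 1e9:.0f} GB for 12M")      # 120 GB
```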
On Feb 21, 2017, at 11:12 PM, Christian Balzer <chibi at gol.com> wrote:

> On Tue, 21 Feb 2017 09:49:39 -0500 KT Walrus wrote:
>
>> I just read this blog: https://mrotaru.wordpress.com/2013/10/10/scaling-to-12-million-concurrent-connections-how-migratorydata-did-it/ about scaling to 12 Million Concurrent Connections on a single server and it got me thinking.
>
> While that's a nice article, nothing in it was news to me or particularly
> complex when one does large-scale stuff, like Ceph for example.
>
>> Would it be possible to scale Dovecot IMAP server to 10 Million IMAP sessions on a single server?
>
> I'm sure Timo's answer will (or would, if he could be bothered) be along
> the lines of:
> "Sure, if you give me all your gold and then some for a complete rewrite
> of, well, everything".

It will be a long time before I would need to scale to 10 Million users, and I will be happy to pay for the rewrite of the IMAP plugin when the time comes, if it has not been done before then by someone else.

I have seen proposals for a new client protocol called JMAP that seem to be all about running a mail server at scale, the way an NGINX HTTPS web server can scale. That got me thinking about whether there is anything fundamental about IMAP that makes it difficult to scale. After looking into Dovecot's current IMAP implementation, I think the approach taken fundamentally has scaling issues (as in, one backend process per IMAP session).

I see that a couple of years ago, work was done to "migrate" idling IMAP sessions to a single process that "remembers" the state of the IMAP session and can restore it to a backend process when the idling is done. But the only estimate I have read about the "migrate idling" feature is that you are likely to see only a 20% reduction in the number of concurrent processes you need if you are running at 50,000 IMAP sessions per mail server. A 20% reduction is not nearly enough of a benefit for scale. I would need to see at least an order of magnitude improvement to scale (and hopefully, several orders of magnitude).

So, in my mind, since these IMAP sessions are long-lived with infrequent bursts of activity, a better approach would be to manage the session data in memory or in an external datastore and only process a session when there is activity, much like WebSocket connections and even HTTPS requests are handled today by installations that need to support millions of active users.

As for Dovecot, I would think the work done to "migrate" idling IMAP sessions would be a good start toward managing a large number of sessions with a fixed pool of worker processes, like other web servers do.

So, my question really is: Is there anything about the IMAP protocol that would prevent an implementation from scaling to 10 Million users per server? Or do we need to push for a new protocol like JMAP that has been designed to scale better (by being stateless with the server requests)?

Kevin
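The gap Kevin describes is simple arithmetic on the figures he quotes (50,000 sessions per server, a 20% reduction from hibernation). A Python sketch, using integer percentages to avoid floating-point rounding; the function names are illustrative only:

```python
def processes_needed(sessions, reduction_percent):
    """Concurrent processes left after hibernation removes reduction_percent of them."""
    return sessions * (100 - reduction_percent) // 100


def processes_after_orders(sessions, orders):
    """Process count after an improvement of `orders` orders of magnitude."""
    return sessions // 10**orders


if __name__ == "__main__":
    print(processes_needed(50_000, 20))       # 40000
    print(processes_after_orders(50_000, 1))  # 5000
    print(processes_after_orders(50_000, 2))  # 500
```

In other words, a 20% reduction still leaves eight times as many processes as the one-order-of-magnitude target.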
On Feb 21, 2017, at 11:12 PM, Christian Balzer <chibi at gol.com> wrote:

> But even if you were to implement something that can handle 1 million or
> more sessions per server, would you want to?
> As in, if that server goes down, the resulting packet, authentication
> storm will be huge and most likely result in a proverbial shit storm later.
> Having more than 10% or so of your customers on one machine and thus
> involved in an outage that you KNOW will hit you eventually strikes me as
> a bad idea.

The idea would be to store session state in an external database like Redis. I use Redis for PHP session data on the web servers, and Redis is deployed as a high-availability cluster (using Redis Sentinels). If the IMAP session state is maintained externally in a high-availability datastore, then rebooting a mail server, or having it go down unexpectedly, should not mean that all existing sessions are "kicked" and the clients need to log in again. Rather, a backup mail server or servers could take the load and just use the high-availability datastore to manage the sessions that were on the old server.

One potential problem, if not using shared storage for the mailboxes, is that Dovecot replication is asynchronous, so a small number of IMAP sessions might be out of date with the data on the replacement server, and some of the data in Redis might need to be re-cached to reflect the state of the backup mailstore.

Other than that, I don't think there would be much of a "proverbial shit storm" caused by the failure of one mail server, even if that server were to handle 1 million or more sessions. The remaining mail servers in the cluster would need to be able to absorb the load (maybe 3-server clusters would be the norm, so each remaining server would only have to be able to take 50% of the sessions from the failed server while it is unavailable).

Kevin
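Kevin's 3-server example generalizes: if one server in an evenly loaded n-server cluster fails and its sessions are spread evenly over the n-1 survivors, each survivor's load grows by a factor of n/(n-1), so normal utilization must stay at or below (n-1)/n of capacity. A Python sketch of that arithmetic, assuming even load and perfectly even redistribution (real failover is rarely that tidy):

```python
from fractions import Fraction


def survivor_load(per_server_load, cluster_size):
    """Load on each survivor after one of cluster_size evenly loaded servers fails."""
    return per_server_load + Fraction(per_server_load, cluster_size - 1)


def max_safe_utilization(cluster_size):
    """Highest normal per-server utilization that still fits after one failure."""
    return Fraction(cluster_size - 1, cluster_size)


if __name__ == "__main__":
    print(survivor_load(1_000_000, 3))  # 1500000
    print(max_safe_utilization(3))      # 2/3
    print(max_safe_utilization(10))     # 9/10
```

So a 3-server cluster with 1 million sessions per server needs each server sized for 1.5 million, i.e. running at no more than two thirds of capacity in normal operation.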