Hi,

I often run into a situation where a dovecot server goes down for maintenance and all users get concentrated on the remaining dovecot server (I have only 2 dovecot servers).

When that dovecot server comes back online, the director server will send new users to it, but the dovecot server that was up the whole time will still have tons of clients mapped to it.

I suggest that the director servers always try to balance load between servers, in this way:

- if a server has several more connections than another, mark it for re-balancing
- when a user connected to this loaded server disconnects, immediately map that user to another server (which by definition is not the same server)

That way it would re-balance gracefully, without killing existing connections, just waiting for them to finish.

Thank you for your time.

Webert Lima
MAV Tecnologia
Belo Horizonte, Brasil.
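A minimal Python sketch of the proposed behaviour, purely as a toy model: the class and its names (RebalancingDirector, login, logout, threshold) are hypothetical and are not Dovecot's director code or API. It assumes a user is only remapped once all of that user's connections have closed, so one user's sessions never end up split across two backends.

```python
from collections import Counter


class RebalancingDirector:
    """Toy model of the proposed re-balancing; hypothetical, not Dovecot code."""

    def __init__(self, backends, threshold=10):
        self.backends = list(backends)
        # How many more connections than some other backend count as "overloaded".
        self.threshold = threshold
        self.user_backend = {}       # user -> backend the user is currently mapped to
        self.conns = Counter()       # backend -> number of open connections
        self.user_conns = Counter()  # user -> number of open connections

    def _least_loaded(self, exclude=None):
        candidates = [b for b in self.backends if b != exclude]
        return min(candidates, key=lambda b: self.conns[b])

    def _overloaded(self, backend):
        return any(self.conns[backend] - self.conns[b] >= self.threshold
                   for b in self.backends if b != backend)

    def login(self, user):
        backend = self.user_backend.get(user)
        if backend is None:
            # New user: send it to the least loaded backend.
            backend = self._least_loaded()
            self.user_backend[user] = backend
        self.conns[backend] += 1
        self.user_conns[user] += 1
        return backend

    def logout(self, user):
        backend = self.user_backend[user]
        self.conns[backend] -= 1
        self.user_conns[user] -= 1
        # Only remap once the user has no open connections left, and only
        # away from an overloaded backend; no existing connection is killed.
        if self.user_conns[user] == 0 and self._overloaded(backend):
            self.user_backend[user] = self._least_loaded(exclude=backend)
```

With threshold=3 and ten users piled up on backend "a" before "b" returns, each user that logs out and back in moves to "b" until the gap drops below the threshold, which in this model ends at 6 connections on "a" and 4 on "b".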
On 20 Apr 2017, at 17.35, Webert de Souza Lima <webert.boss at gmail.com> wrote:
> that way it would gracefully re-balance, not killing existing connections,
> just waiting for them to finish.

You could effectively do this by shrinking the director_user_expire time. But if it's too low, it makes director a bit less efficient when assigning users to backends. Also, if backends are doing any background work (e.g. full text search indexing), director might move the user away too early. But setting it to e.g. 5 minutes would likely help a lot.

There's of course also doveadm director flush, which can be used to move users between backends, but for now that requires killing the connections. I have some future plans to make it possible to move connections between backends without disconnecting the IMAP client.
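To illustrate why a shorter director_user_expire speeds up re-balancing, here is a toy Python model. Assumptions: the ExpiringDirector class is hypothetical, each user has at most one session, time is passed in explicitly as seconds, and a new or expired user is assigned by hashing the username over the backends that are up, which is a simplification rather than director's actual assignment algorithm. A user keeps the old backend until the mapping has been idle for user_expire seconds; only then can a returning backend win that user back.

```python
import hashlib


class ExpiringDirector:
    """Toy model of director_user_expire; hypothetical, not Dovecot code."""

    def __init__(self, backends, user_expire):
        self.backends = list(backends)
        self.user_expire = user_expire  # seconds a mapping survives after logout
        # user -> (backend, expires_at); expires_at is None while connected
        self.mapping = {}

    def _hash_backend(self, user):
        # Simplified assignment: hash the username over the backends that are up.
        digest = hashlib.sha256(user.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.backends)
        return self.backends[index]

    def login(self, user, now):
        entry = self.mapping.get(user)
        if entry is None or (entry[1] is not None and now >= entry[1]):
            backend = self._hash_backend(user)  # new or expired: choose again
        else:
            backend = entry[0]                  # still mapped: stay on old backend
        self.mapping[user] = (backend, None)
        return backend

    def logout(self, user, now):
        backend, _ = self.mapping[user]
        # Keep the mapping alive for user_expire seconds after the session ends.
        self.mapping[user] = (backend, now + self.user_expire)
```

In this model, a user whose hash still points at the busy backend lands there again after expiry, which is the limitation raised in the follow-up reply.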
Shrinking director_user_expire might work as a workaround, but it is not as good as a real solution, since the user can also end up mapped to the same server again. Director flush is both manual and aggressive, so it is not a good solution either.

The possibility to move users between backends without killing existing connections is a good solution, yes! It can be scripted. =]

What I suggested was more automated, but that can be left for the more distant future. If there were a command to issue manually, such as "doveadm director rebalance", it would be great.

Thanks for your feedback.

Regards,
Webert de Souza Lima
MAV Tecnologia.

On Fri, Apr 21, 2017 at 4:52 AM, Timo Sirainen <tss at iki.fi> wrote:
> You could effectively do this by shrinking the director_user_expire time.
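The "doveadm director rebalance" command suggested in this thread is a proposal, not an existing doveadm subcommand. A hedged Python sketch of how its planning step might work: plan_rebalance is a hypothetical helper that takes the current user-to-backend mappings and the set of users with open connections, and moves only idle users from the most loaded backends to the least loaded ones until no further move would improve the balance, so no connection is killed.

```python
from collections import defaultdict


def plan_rebalance(user_backend, connected, backends):
    """Plan moves for a hypothetical "rebalance" command (not a real doveadm command).

    user_backend: dict user -> backend the user is currently mapped to
    connected: set of users with open connections (never moved here)
    backends: list of backend names
    Returns a dict user -> new backend, covering idle users only.
    """
    by_backend = defaultdict(list)
    for user, backend in user_backend.items():
        by_backend[backend].append(user)
    counts = {b: len(by_backend[b]) for b in backends}
    target = len(user_backend) / len(backends)  # ideal users per backend

    moves = {}
    # Drain the most loaded backends first.
    for src in sorted(backends, key=lambda b: -counts[b]):
        for user in sorted(by_backend[src]):
            if counts[src] <= target:
                break
            if user in connected:
                continue  # moving it now would mean killing its connections
            dst = min(backends, key=lambda b: counts[b])
            if counts[dst] + 1 > counts[src] - 1:
                break  # the move would not improve the balance
            moves[user] = dst
            counts[src] -= 1
            counts[dst] += 1
    return moves
```

Connected users are simply skipped here; they would only become movable once their sessions end, or once moving live connections between backends is possible.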