Timo Sirainen
2010-May-19 08:51 UTC
[Dovecot] A new director service in v2.0 for NFS installations
As http://wiki.dovecot.org/NFS describes, the main problem with NFS has always been caching problems. One NFS client changes two files, but another NFS client sees only one of the changes, which Dovecot then assumes is caused by corruption. The recommended solution has always been to redirect the same user to only a single server at a time. The user doesn't have to be permanently assigned there, but as long as a server has some of the user's files cached, it should be the only server accessing the user's mailbox.

Recently I was thinking about a way to make this possible with an SQL database: http://dovecot.org/list/dovecot/2010-January/046112.html

The company here in Italy didn't really like such an idea, so I thought about making it more transparent and simpler to manage. The result is a new "director" service, which does basically the same thing, except without an SQL database. The idea is that your load balancer can redirect connections to one or more Dovecot proxies, which internally then figure out where the user should go. So the proxies act kind of like a secondary load balancer layer.

When a connection from a newly seen user arrives, it gets assigned to a mail server according to a function:

host = vhosts[ md5(username) mod vhosts_count ]

This way all of the proxies assign the same user to the same host without having to talk to each other. The vhosts[] is basically an array of hosts, except each host is initially listed there 100 times (vhost count=100). This vhost count can then be increased or decreased as necessary to change the host's load, probably automatically in the future.

The problem is then of course that if (v)hosts are added or removed, the above function will return a different host than was previously used for the same user. That's why there is also an in-memory database that keeps track of username -> (hostname, timestamp) mappings. Every new connection from the user refreshes the timestamp. Also, existing connections refresh the timestamp every n minutes. Once all connections are gone, the timestamp expires and the user is removed from the database.

The final problem then is how multiple proxies synchronize their state. The proxies connect to each other, forming a connection ring. For example, with 4 proxies A, B, C and D the connections would go like A -> B -> C -> D -> A. Each time a user is added/refreshed, a notification is sent in both directions in the ring (e.g. B sends to A and C), which in turn forward it until it reaches a server that has already seen it. This way, if a proxy dies (or just hangs for a few seconds), the other proxies still get the changes without waiting for it to time out. Host changes are replicated in the same way.

It's possible that two connections from a user arrive at different proxies while (v)hosts are being added/removed. It's also possible that only one of the proxies has seen the host change. So the proxies could redirect users to different servers during that time. This can be prevented by doing a ring-wide sync, during which all proxies delay assigning hosts to new users. This delay shouldn't be too bad because a) host changes should happen rarely, b) the sync should be over quickly, and c) users already in the database can still be redirected during the sync.

The main complexity here comes from how to handle proxy server failures in different situations. Those are less interesting to describe and I haven't yet implemented all of it, so let's just assume that in the future it all works perfectly. :)
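For illustration, the assignment function and the in-memory user database described above could be modeled roughly like the following toy Python sketch. The class and method names are made up, the way the md5 digest is reduced to an integer before the modulo is an assumption, and the real director is C code inside Dovecot, so treat this only as a reading aid:

import hashlib
import time


class DirectorState:
    """Toy model of the director's host assignment, not Dovecot's actual code."""

    def __init__(self, user_timeout=15 * 60):
        self.vhosts = []     # each backend IP repeated once per vhost
        self.users = {}      # username -> (host, last-seen timestamp)
        self.user_timeout = user_timeout

    def set_host(self, ip, vhost_count=100):
        # Add a host or change its weight by listing it vhost_count times.
        self.vhosts = [h for h in self.vhosts if h != ip] + [ip] * vhost_count

    def lookup(self, username):
        now = time.time()
        entry = self.users.get(username)
        if entry and now - entry[1] < self.user_timeout:
            host = entry[0]  # user still in the database: stay on the same host
        else:
            # host = vhosts[ md5(username) mod vhosts_count ]
            digest = hashlib.md5(username.encode()).digest()
            host = self.vhosts[int.from_bytes(digest, "big") % len(self.vhosts)]
        self.users[username] = (host, now)   # every lookup refreshes the timestamp
        return host


director = DirectorState()
director.set_host("10.0.0.1")        # vhost count 100
director.set_host("10.0.0.2", 50)    # half the weight of the first host
print(director.lookup("someuser@example.com"))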
I was also thinking about writing a test program to simulate director failures to make sure it all works.

Finally, there are the doveadm commands that can be used to:

1) List the director status:

# doveadm director status
mail server ip  vhosts  users
11.22.3.44         100   1312
12.33.4.55          50   1424

2) Add a new mail server (defaults are in dovecot.conf):

# doveadm director add 1.2.3.4

3) Change a mail server's vhost count to alter its connection count (also works during adding):

# doveadm director add 1.2.3.4 50

4) Remove a mail server completely (because it's down):

# doveadm director remove 1.2.3.4

If you want to slowly get users away from a specific server, you can assign its vhost count to 0 and wait for its user count to drop to zero. If the server is still working when "doveadm director remove" is called, new connections from the users on that server go to other servers while the old connections are still being handled.
Timo Sirainen
2010-May-19 14:56 UTC
[Dovecot] A new director service in v2.0 for NFS installations
On 19.5.2010, at 13.09, Cor Bosman wrote:

> I guess one of my first questions is, not just how to handle failure of proxies, but also failure of whatever server the proxy sends you to. We've talked about that before, that the proxy could for instance fall back to itself as the 'final server'.

The people here want to use their existing monitoring script that would do "ssh host doveadm director remove 1.2.3.4". But on top of that, yeah, maybe the proxy itself could tell the director that some host is down and retry the lookup. Although determining when it's down might not always be fully reliable, so I'm a bit worried that the proxy could sometimes fail to connect to a mail server even if it's up (e.g. because of high load, a temporary small network problem, etc.).
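For illustration, a monitoring script of the kind mentioned above might look roughly like this sketch. Only the "ssh host doveadm director remove" command comes from the thread; the backend list, the director host name, and the port-143 check are hypothetical:

import socket
import subprocess

BACKENDS = ["10.0.0.1", "10.0.0.2"]        # hypothetical backend mail servers
DIRECTOR_HOST = "proxy1.example.com"       # hypothetical host running the director


def imap_port_open(ip, port=143, timeout=5):
    # Naive health check: can we open a TCP connection to the IMAP port?
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


for backend in BACKENDS:
    if not imap_port_open(backend):
        # Same idea as the "ssh host doveadm director remove 1.2.3.4" mentioned above.
        subprocess.run(["ssh", DIRECTOR_HOST, "doveadm", "director", "remove", backend],
                       check=False)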
luben karavelov
2010-May-19 15:16 UTC
[Dovecot] A new director service in v2.0 for NFS installations
On Wed, 19 May 2010 10:51:06 +0200, Timo Sirainen <tss at iki.fi> wrote:

> The company here in Italy didn't really like such an idea, so I thought about making it more transparent and simpler to manage. The result is a new "director" service, which does basically the same thing, except without an SQL database. The idea is that your load balancer can redirect connections to one or more Dovecot proxies, which internally then figure out where the user should go. So the proxies act kind of like a secondary load balancer layer.

As I understand it, the first load balancer is just an IP balancer, not a POP3/IMAP balancer, isn't it?

> When a connection from a newly seen user arrives, it gets assigned to a mail server according to a function:
>
> host = vhosts[ md5(username) mod vhosts_count ]
>
> This way all of the proxies assign the same user to the same host without having to talk to each other. The vhosts[] is basically an array of hosts, except each host is initially listed there 100 times (vhost count=100). This vhost count can then be increased or decreased as necessary to change the host's load, probably automatically in the future.
>
> The problem is then of course that if (v)hosts are added or removed, the above function will return a different host than was previously used for the same user. That's why there is also an in-memory database that keeps track of username -> (hostname, timestamp) mappings. Every new connection from the user refreshes the timestamp. Also, existing connections refresh the timestamp every n minutes. Once all connections are gone, the timestamp expires and the user is removed from the database.

I have implemented a similar scheme here with an imap/pop3 proxy (nginx) in front of the dovecot servers. What I have found to work best (for my conditions) as a hashing scheme is a sort of weighted consistent hash. Here is the algorithm I use (a sketch follows below):

On init, server add or server remove, you initialize a ring:
1. For every server:
   - seed the random number generator with crc32(IP of the server)
   - get N random numbers (where N = server weight) and put them in an array. Put random_number => IP in another map/hash structure.
2. Sort the array. This is the ring.

For every redirect request:
1. get the crc32 number of the mailbox
2. traverse the ring until you find a number that is bigger than the crc32 number and was not yet visited.
3. mark that number as visited.
4. look up whether it is already marked dead. If it is, go to 2.
5. look up the number in the map/hash and you find the IP of the server.
6. redirect the client to that server.
7. If that server is not responding, mark it as dead and go to 2.

In this way you do not need to synchronize state between balancers and proxies. If you add or remove servers, very few clients get reallocated - roughly num active clients / num servers. If one server is not responding, the clients that should be directed to it are redirected to one and the same other server, without a need to sync state between servers.

This scheme also has some disadvantages - under certain circumstances, different sessions to one mailbox could be handled by different servers in parallel. My tests showed that this causes some performance degradation, but no index corruptions here (using OCFS2, not NFS).
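For illustration, the weighted ring described above might look roughly like this in Python. The original presumably lives in an nginx-side handler; the function names, the random number range, and the wrap-around handling here are assumptions:

import random
import zlib


def build_ring(servers):
    # servers: dict of IP -> weight; returns the sorted ring and a point -> IP map.
    point_to_ip = {}
    for ip, weight in servers.items():
        rng = random.Random(zlib.crc32(ip.encode()))   # seed the RNG with crc32(IP)
        for _ in range(weight):                        # one ring point per unit of weight
            point_to_ip[rng.randrange(2 ** 32)] = ip
    return sorted(point_to_ip), point_to_ip


def pick_server(mailbox, ring, point_to_ip, dead):
    # Walk the ring upwards from crc32(mailbox), wrapping around, skipping dead servers.
    key = zlib.crc32(mailbox.encode())
    candidates = [p for p in ring if p >= key] + [p for p in ring if p < key]
    for point in candidates:
        ip = point_to_ip[point]
        if ip not in dead:
            return ip
    return None                                        # every server is marked dead


servers = {"10.0.0.1": 100, "10.0.0.2": 100, "10.0.0.3": 50}
ring, point_to_ip = build_ring(servers)
print(pick_server("someuser@example.com", ring, point_to_ip, dead=set()))
print(pick_server("someuser@example.com", ring, point_to_ip, dead={"10.0.0.1"}))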
So my choice was to trade correctness (no parallel sessions to different servers) for simplicity (no state synchronization between servers).

> Finally, there are the doveadm commands that can be used to:
>
> 1) List the director status:
> # doveadm director status
> mail server ip  vhosts  users
> 11.22.3.44         100   1312
> 12.33.4.55          50   1424
>
> 2) Add a new mail server (defaults are in dovecot.conf):
> # doveadm director add 1.2.3.4
>
> 3) Change a mail server's vhost count to alter its connection count (also works during adding):
> # doveadm director add 1.2.3.4 50
>
> 4) Remove a mail server completely (because it's down):
> # doveadm director remove 1.2.3.4
>
> If you want to slowly get users away from a specific server, you can assign its vhost count to 0 and wait for its user count to drop to zero. If the server is still working when "doveadm director remove" is called, new connections from the users on that server go to other servers while the old connections are still being handled.

This is a nice admin interface.

Also, I have a question: what kind of sessions does your implementation balance? I suppose imap/pop3. Is there a plan for similar redirecting of LMTP connections based on the delivery address?

Best regards and thanks for the great work,
luben
Brad Davidson
2010-May-25 19:32 UTC
[Dovecot] A new director service in v2.0 for NFS installations
Timo,

> -----Original Message-----
> From: dovecot-bounces+brandond=uoregon.edu at dovecot.org [mailto:dovecot-
>
> The company here in Italy didn't really like such an idea, so I thought about making it more transparent and simpler to manage. The result is a new "director" service, which does basically the same thing, except without an SQL database. The idea is that your load balancer can redirect connections to one or more Dovecot proxies, which internally then figure out where the user should go. So the proxies act kind of like a secondary load balancer layer.

This looks very cool! We run a basic two-site active-active configuration with 6 Dovecot hosts in either location, with an Active/Standby load balancer cluster in front, and a cluster of geographically distributed NFS servers in the back. I'm sure I've described it before. We'd like to keep failover as simple as possible while also avoiding single points of failure.

I have some questions about the suggested configuration, as well as the current implementation:

* Does this work for POP3 as well as IMAP?
* Is there any reason not to use all 12 of our servers as proxies as well as mailbox servers, and let the director communication route connections to the appropriate endpoint?
* Does putting a host into 'directed proxy' mode prevent it from servicing local mailbox requests?
* How is initial synchronization handled? If a new host is added, is it sent a full copy of the user->host mapping database?
* What would you think about using multicast for the notifications instead of a ring structure? If we did set up all 12 hosts in a ring, it would be conceivable that a site failure plus failure of a single host at the surviving site would segment the ring. Multicast would prevent this, as well as (conceivably) simplifying dynamic resizing of the pool.

Thanks!
-Brad
Oliver Eales
2010-Jun-16 13:19 UTC
[Dovecot] A new director service in v2.0 for NFS installations
On 19.05.2010 10:51, Timo Sirainen wrote:

> As http://wiki.dovecot.org/NFS describes, the main problem with NFS has always been caching problems. One NFS client changes two files, but another NFS client sees only one of the changes, which Dovecot then assumes is caused by corruption.
>
> host = vhosts[ md5(username) mod vhosts_count ]

Hello Timo,

I am currently playing around with the new director service and I am really looking forward to it in 2.0.

Wouldn't it be better to use a consistent hash function instead of the md5? That way you would only get a new assignment for the users belonging to the failed server, and not a "complete remapping". With such a setup it might also be possible to store local indexes in an NFS backend setting, as the users stay kind of sticky to their server. And there would then be no need to distribute the currently active mappings within the ring - maybe only the state of the servers.

Regards,
Oliver