thr3ads.net - dovecot - [Dovecot] dsync redesign [Mar 2012]


Timo Sirainen

2012-Mar-23 21:25 UTC

[Dovecot] dsync redesign

In case anyone is interested in reading (and maybe helping!) with a dsync
redesign that's intended to fix all of its current problems, here are some
possibly incoherent ramblings about it:

dovecot.org/tmp/dsync-redesign.txt

and even if you don't understand that, here's another document
disguising as an algorithm class problem :) If anyone has thoughts on how to
solve it, would be great:

dovecot.org/tmp/dsync-redesign-problem.txt

It only deals with saving new messages, not expunges/flag changes/etc, but those
should be much simpler.

Attila Nagy

2012-Mar-24 07:19 UTC


On 03/23/12 22:25, Timo Sirainen wrote:
> In case anyone is interested in reading (and maybe helping!) with a dsync
> redesign that's intended to fix all of its current problems, here are some
> possibly incoherent ramblings about it:
>
> dovecot.org/tmp/dsync-redesign.txt
>
> and even if you don't understand that, here's another document
> disguising as an algorithm class problem :) If anyone has thoughts on how to
> solve it, would be great:
>
> dovecot.org/tmp/dsync-redesign-problem.txt
>
> It only deals with saving new messages, not expunges/flag changes/etc, but
> those should be much simpler.

Well, dsync is a very useful tool, but with continuous replication it
tries to solve a problem which should be handled, at least partially,
elsewhere. Storing stuff in plain file systems and duplicating them to
another one just doesn't scale.

I personally think that Dovecot could gain much more if the amount of
work going into fixing or improving dsync went into making Dovecot
able to use a highly scalable, distributed storage backend.
I know it's much harder, because there are several major differences
compared to "low latency", consistency-problem-free local file
systems, but its fruits are also sweeter in the long term. :)

It would bring Dovecot into the class of open source mail servers where 
there are currently no contenders.

BTW, for the previous question in this topic (are there any NoSQL DBs
supporting application-level conflict resolution?), there are similar
solutions (like CouchDB, but having some experience with it, I wouldn't
recommend it for massive mail storage, at least not the plain CouchDB
product), but I guess you would be better off designing a schema
which doesn't need it in the first place.
For example, messages are immutable, so you won't face this issue in
this area.
And for metadata, maybe the solution is not to store "digested"
snapshots of the current metadata (folders, flags, message links for
folders etc.), but to store the changes happening on the user's mailbox
and occasionally aggregate them into a last known good and consistent state.
Also, there are other interesting ideas, maybe with a real single instance
store (splitting MIME parts? Storing attachments in plain binary form?
This always brings up the question of whether the mail server should
modify the mails, which can be pretty bad for encrypted/signed stuff).
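
To make the "store the changes, aggregate occasionally" idea a bit more
concrete, here is a minimal sketch. Everything in it is an assumption made
for illustration: the record layout (seq, op, folder, guid, flag), the op
names and the MailboxState class are hypothetical, and nothing here is an
existing Dovecot format. It only shows that replaying an ordered change log
into a snapshot is simple and safe to repeat.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxState:
    """Aggregated snapshot built from the change log (hypothetical layout)."""
    folders: dict = field(default_factory=dict)  # folder -> set of message GUIDs
    flags: dict = field(default_factory=dict)    # (folder, guid) -> set of flags
    last_seq: int = 0                            # last change folded into the snapshot


def apply_change(state, change):
    """Apply one (seq, op, folder, guid, flag) record to the snapshot."""
    seq, op, folder, guid, flag = change
    if seq <= state.last_seq:
        return  # already aggregated, so replaying old records is harmless
    if op == "link":
        state.folders.setdefault(folder, set()).add(guid)
    elif op == "unlink":
        state.folders.get(folder, set()).discard(guid)
        state.flags.pop((folder, guid), None)
    elif op == "add_flag":
        state.flags.setdefault((folder, guid), set()).add(flag)
    elif op == "remove_flag":
        state.flags.get((folder, guid), set()).discard(flag)
    else:
        raise ValueError(f"unknown op: {op}")
    state.last_seq = seq


def aggregate(state, log):
    """Fold all records into the snapshot in sequence order."""
    for change in sorted(log, key=lambda c: c[0]):
        apply_change(state, change)
    return state
```

Since the messages themselves are immutable, only small records like these
would ever need merging between replicas. How to assign sequence numbers
across several writers is the hard part, and this sketch does not address it.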

And of course there is always the problem of designing a good, 
consistent method which is also efficient.

Michescu Andrei

2012-Mar-26 22:14 UTC


Hello Timo,

Thank you very much for planning a redesign of dsync and for opening
this discussion.

As far as I can see from the replies so far, everybody misses the
main point of IMAP: IMAP was designed to work as a disconnected,
high-latency data store.

To make this more clear: once an IMAP client finishes the synchronization
with the server, both client and server have a consistent state of
the mailbox. After this, both the "client" and the "server" act as master
for their own local copy (on the "server" new emails get created etc.;
on the "client" existing emails get changed (flags) and moved, and new
emails appear (sent items)).

So the protocol was designed, originally, to handle master-master
replication. As such, a worldwide deployment makes sense, where
servers work independently and from time to time "merge" their
changes.

This being said and acknowledged here are my 2 cents:

I think that the current '1 brain / 2 workers' design seems to be the correct
model. "The client" connects to "the server", pushes the local
changes, and afterwards retrieves the updated/new items from "the server".
"The brain" treats the first server as the "local storage" and the
second server as the "server storage".

For the split design, "come to the same conclusion of the state" is
very race-condition prone.

As long as the algorithm is kept as you described it in the original
document, the backups should really be incremental (because you only
transfer the changes since the last sync).

As most changes are "metadata-only", the sync can be pretty fast by
merging indexes.
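
A rough sketch of what I mean by "one brain, two workers" with incremental,
metadata-only syncing. This is not dsync's actual algorithm. The Worker
class, brain_sync() and the per-message change counter are hypothetical
names, and the "local side wins on conflict" rule is just an assumption to
keep the example short.

```python
class Worker:
    """One side of the sync: flags per message GUID plus a change counter."""

    def __init__(self):
        self.flags = {}    # guid -> frozenset of flags
        self.modseq = {}   # guid -> counter value of the last change
        self.counter = 0

    def set_flags(self, guid, flags):
        self.counter += 1
        self.flags[guid] = frozenset(flags)
        self.modseq[guid] = self.counter

    def changes_since(self, seq):
        """Return {guid: flags} for everything changed after `seq`."""
        return {g: self.flags[g] for g, m in self.modseq.items() if m > seq}


def brain_sync(local, remote, last_local, last_remote):
    """Push local changes, then pull remote ones; return the new sync points."""
    # Snapshot both change sets first so pushed changes are not pulled back.
    local_changes = local.changes_since(last_local)
    remote_changes = remote.changes_since(last_remote)
    for guid, flags in local_changes.items():
        remote.set_flags(guid, flags)
    for guid, flags in remote_changes.items():
        if guid not in local_changes:  # assumption: local side wins on conflict
            local.set_flags(guid, flags)
    return local.counter, remote.counter
```

Only entries changed after the previous sync point are exchanged, so a
second run right after the first one transfers nothing.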

Thank you,
Andrei

> In case anyone is interested in reading (and maybe helping!) with a dsync
> redesign that's intended to fix all of its current problems, here are some
> possibly incoherent ramblings about it:
>
> dovecot.org/tmp/dsync-redesign.txt
>
> and even if you don't understand that, here's another document disguising
> as an algorithm class problem :) If anyone has thoughts on how to solve
> it, would be great:
>
> dovecot.org/tmp/dsync-redesign-problem.txt
>
> It only deals with saving new messages, not expunges/flag changes/etc, but
> those should be much simpler.

Micah Anderson

2012-Mar-27 15:47 UTC


Timo Sirainen <tss at iki.fi> writes:
> In case anyone is interested in reading (and maybe helping!) with a dsync
> redesign that's intended to fix all of its current problems, here are some
> possibly incoherent ramblings about it:

thank you for opening this discussion about dsync!

besides the problems I've encountered with dsync, there are a couple
things that I think would be great to build into the new vision of the
protocol. 

One would be the ability to perform *intelligent* incremental/rotated
backups. I can do this now by running a dsync backup operation and then
doing manual hardlinking or moving of the backup directories (daily.1,
daily.2, weekly.1, monthly.1, etc.), but it would be more intelligent if
this were baked into the backup process.
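
For reference, the manual rotation I describe looks roughly like the sketch
below. The directory layout (a "current" directory that the backup run
writes into, plus daily.N snapshots) and the rotate() helper are only
assumptions for illustration; this is not something dsync provides.

```python
import os
import shutil


def rotate(backup_root, prefix="daily", keep=7):
    """Shift prefix.1 -> prefix.2 ... and hardlink 'current' into prefix.1."""
    oldest = os.path.join(backup_root, f"{prefix}.{keep}")
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)  # drop the snapshot that falls off the end
    for n in range(keep - 1, 0, -1):
        src = os.path.join(backup_root, f"{prefix}.{n}")
        if os.path.isdir(src):
            os.rename(src, os.path.join(backup_root, f"{prefix}.{n + 1}"))
    current = os.path.join(backup_root, "current")
    # copy_function=os.link creates hardlinks instead of copying file data
    shutil.copytree(current, os.path.join(backup_root, f"{prefix}.1"),
                    copy_function=os.link)
```

The hardlinks keep the snapshots cheap, but they only preserve history for
files that get replaced rather than rewritten in place: a file modified in
place changes in every snapshot that links to it.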

Secondly, being able to filter out mailboxes could result in much more
efficient syncing. Now there is the capability to operate on only
specific mailboxes, but this doesn't scale well when I am trying to
back up thousands of users and I want to omit the Spam and Trash folders
from the sync. I would have to get a mailbox list of each user, and then
iterate over each mailbox for each user, skipping the Spam and Trash
folders, forking a new 'dsync backup' for each of their mailboxes, for
each user.
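
What I have in mind is an exclusion list that is evaluated once, inside the
sync itself, something like the sketch below. The function name, the default
patterns and the "/" hierarchy separator are all assumptions for
illustration, not an existing dsync option.

```python
from fnmatch import fnmatchcase

# Hypothetical defaults: skip the folders themselves and anything below them.
DEFAULT_EXCLUDES = ("Spam", "Spam/*", "Trash", "Trash/*")


def mailboxes_to_sync(mailboxes, excludes=DEFAULT_EXCLUDES):
    """Return the mailbox names that match none of the exclusion patterns."""
    return [
        name for name in mailboxes
        if not any(fnmatchcase(name, pattern) for pattern in excludes)
    ]
```

With a filter like this applied per user inside a single run, there would be
no need to fork one 'dsync backup' per mailbox.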

Lastly, there isn't a good method for restoring backups. I can reverse
the backup process, onto the user's "live" mailbox, but that brings the
user into an undesirable state (e.g. their mailbox state one day
ago). Better would be if their backup could be restored in such a way
that the user can resolve the missing pieces manually, as they know
best.

thanks again for your work on this, from my position dovecot is an
amazing piece of software, the only part that seems to have some issues
is dsync and I applaud the effort to redesign to fix things!

micah

Timo Sirainen

2012-Mar-28 22:43 UTC


On 23.3.2012, at 23.25, Timo Sirainen wrote:
> and even if you don't understand that, here's another document
> disguising as an algorithm class problem :) If anyone has thoughts on how to
> solve it, would be great:
>
> dovecot.org/tmp/dsync-redesign-problem.txt
>
> It only deals with saving new messages, not expunges/flag changes/etc, but
> those should be much simpler.

Step #3 was more difficult than I first realized. I spent the last two days figuring
out a way to make it work, and it looks like I finally did. I haven't updated the
document yet, but I wrote a test program: dovecot.org/tmp/test-dsync.c

Step #2 should be easy enough.

Step #4 I think I'll forget about and just implement a per-mailbox dsync
lock. The main reason I wanted to get rid of locks was because a per-user lock
can't work with shared mailboxes. But a per-mailbox lock is okay enough.
Note that #3 allows the two dsyncs to run in parallel and send duplicate
changes, just not modifying the same mailbox at the same time (which would
duplicate mails due to two transactions adding the same mails).
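
A minimal sketch of what a per-mailbox lock could look like, only to show
the idea. It is not the actual implementation: the lock directory, the use
of a mailbox GUID as the key and the mailbox_lock() name are assumptions,
and it relies on POSIX flock() via Python's fcntl module.

```python
import fcntl
import hashlib
import os
from contextlib import contextmanager


@contextmanager
def mailbox_lock(lock_dir, mailbox_guid, blocking=True):
    """Hold an exclusive lock for one mailbox while the with-block runs."""
    # Key the lock file on the mailbox, not the user, so that a shared
    # mailbox gets the same lock no matter which user's dsync touches it.
    name = hashlib.sha1(mailbox_guid.encode()).hexdigest() + ".lock"
    fd = os.open(os.path.join(lock_dir, name), os.O_RDWR | os.O_CREAT, 0o600)
    try:
        mode = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, mode)  # raises BlockingIOError if non-blocking and busy
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

Two dsyncs can then work on different mailboxes in parallel, and only the
one that finds a mailbox already locked has to wait or skip it.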
