Peter Dolding
2011-Jul-03 06:20 UTC
[Dovecot] Replication problem I have. And how I think I can get around problem.
I have two servers in two different locations. Neither is what you would call 100 percent safe from being turned off. Most staff use web-based email, which is backed by an IMAP server. I do know I will have to deal with contact lists and other items in that client. The worst part is that the link between the servers may break, so both servers may be receiving email and be active at the same time.
I can see one very clear way around this problem: accept that the IMAP IDs on each server are valid on that server only. Since users are mostly on web-based mail, they are not going to notice. If they do notice because they have connected to a different domain address, tough luck.
Now, what I need in order to deal with the problem:
1) A unique server ID on each message, identifying the server the message was received on.
2) A unique server receive ID for each message: an IMAP-like ID for messages received directly by that server (not synced from elsewhere).
3) Logs of deletions and changes to messages whose server ID is not the current server's. This should be pretty simple to do: basically one log per server, flushed once it has been synced with the server that owns those messages.
4) A sync function that deletes and changes messages that have been deleted or changed at other locations, and compares other servers' current messages against the copies retained at the mirroring location.
With this, each new message in each store gets the next IMAP ID along with a system-wide unique combination of server ID and server receive ID. The IDs of already-received messages are never modified. Since I don't care whether the IMAP IDs match between servers, I can pretty much use a custom form of IMAP syncing that works off the server IDs and server receive IDs. Of course this solution should be fairly fault tolerant, since each server can directly store any message it receives. It should also be possible to trace back which server a message came from, in case of spam problems or similar.
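A minimal Python sketch of the ID scheme in the four points above, assuming a simple in-memory store. `Store`, `deliver`, `import_synced`, `delete` and `flush_log` are hypothetical names, not Dovecot APIs, and only deletions are logged to keep it short. Each message carries a system-wide global ID (origin server ID, server receive ID) plus a UID that is only meaningful on the server holding it:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StoredMessage:
    gid: tuple[str, int]  # (origin server ID, server receive ID): unique system-wide
    uid: int              # IMAP-style UID: meaningful on this server only
    body: str


@dataclass
class Store:
    server_id: str
    next_uid: int = 1
    next_receive_id: int = 1
    messages: dict = field(default_factory=dict)     # gid -> StoredMessage
    change_logs: dict = field(default_factory=dict)  # owner server ID -> [(action, gid)]

    def deliver(self, body: str) -> StoredMessage:
        """Store a message received directly by this server (not synced in)."""
        gid = (self.server_id, self.next_receive_id)
        self.next_receive_id += 1
        return self._store(gid, body)

    def import_synced(self, gid: tuple[str, int], body: str) -> StoredMessage:
        """Store a copy received from another server; its global ID is kept as-is."""
        if gid in self.messages:
            return self.messages[gid]
        return self._store(gid, body)

    def delete(self, gid: tuple[str, int]) -> None:
        del self.messages[gid]
        owner = gid[0]
        if owner != self.server_id:
            # Only changes to other servers' messages are logged; the owner
            # applies them on the next sync.
            self.change_logs.setdefault(owner, []).append(("delete", gid))

    def flush_log(self, owner: str) -> list:
        """Hand over (and clear) the log once it has been synced with its owner."""
        return self.change_logs.pop(owner, [])

    def _store(self, gid: tuple[str, int], body: str) -> StoredMessage:
        msg = StoredMessage(gid=gid, uid=self.next_uid, body=body)
        self.next_uid += 1  # local UIDs only ever increase, as IMAP expects
        self.messages[gid] = msg
        return msg
```

Because the global ID travels with every copy, the same message can have UID 3 on one server and some other UID on its mirror without any conflict.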
Now my biggest problem: can I attach my own custom attributes to incoming mail in the store and access that information efficiently? With that information I will be able to do a form of live synced storage. Personally I see the IMAP ID design as a defect in the protocol: it never allowed a server ID alongside the message ID, which is why IMAP message stores that are 100 percent synced cannot be run well on independent servers with unstable network connections.
Nothing comes without a price. This solution does not require clustered file systems or a constant active connection between servers, so it can operate in areas of disruption. It does not block the servers from receiving email at any time either. The problem, of course, is when someone connects a client first to one server and then to another: the mail will have to be re-downloaded. I also have to check whether z-push/ActiveSync depends on the IMAP IDs being consistent across the network. If it doesn't, most handheld devices will not be a problem and only IMAP has a problem, which I can live with. I.e. if the local email server is down, use webmail until the local mail server is fixed. Old rule of networking: with a 40 to 50 percent functional network, staff can normally still get stuff done; at 0 percent functional, you have downtime.
If I can get the email storage fault tolerant and able to remain operational in case of a fault, I can then focus on getting the web-based applications to use an equivalent system for contacts and other things, so each location can remain fairly operational no matter what. This way, if a server disappeared for good, the only thing that would have to change is the syncing. A wise move is to allow a server to be assigned more than one server ID: if a server is gone for good, a remaining server is told to take over responsibility for the old server's ID messages, the new replacement server is given a new ID, and everything keeps on going nicely.
Does anyone else have a better idea, or advice on how to add my own custom IDs in a way that they cannot be disrupted and are simple to search?
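One possible way to attach such an ID, sketched in Python under the assumption that stamping a header at delivery time is acceptable. The header name `X-Origin-Id` and the functions below are hypothetical, and this stores the ID inside the message itself rather than in Dovecot's own index. A header like this can be found with an ordinary IMAP `SEARCH HEADER` query. `take_over` illustrates the "more than one server ID per server" idea as a plain ID-to-host map:

```python
import email
from typing import Optional, Tuple

# Hypothetical header name; any X- header reserved for this purpose would do.
ORIGIN_HEADER = "X-Origin-Id"


def stamp_origin(raw: str, server_id: str, receive_id: int) -> str:
    """Tag an incoming message with 'server_id:receive_id' at delivery time."""
    msg = email.message_from_string(raw)
    # Remove any copy of the header that arrived from outside, so a sender
    # cannot forge or disrupt the ID. No error is raised if it is absent.
    del msg[ORIGIN_HEADER]
    msg[ORIGIN_HEADER] = f"{server_id}:{receive_id}"
    return msg.as_string()


def origin_of(raw: str) -> Optional[Tuple[str, int]]:
    """Recover (server_id, receive_id) from a stamped message, if present."""
    value = email.message_from_string(raw).get(ORIGIN_HEADER)
    if value is None:
        return None
    server_id, _, receive_id = value.strip().rpartition(":")
    return server_id, int(receive_id)


def take_over(owners: dict, dead_host: str, new_host: str) -> None:
    """Reassign every server ID owned by a dead host to a surviving host."""
    for server_id, host in owners.items():
        if host == dead_host:
            owners[server_id] = new_host
```

For example, after `take_over(owners, "host-a", "host-b")`, messages stamped with server A's ID are synced and served by host B, while a replacement server would simply start stamping with a fresh ID.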
If it works, push for server IDs in IMAP5? I.e. IMAP5 clients being able to cope with an email store that is spread across multiple servers that may not be connected all the time.
Peter Dolding
Ed W
2011-Jul-04 09:46 UTC
On 03/07/2011 07:20, Peter Dolding wrote:
> Now what I need to be able to deal with the problem.
Have you considered a new Dovecot storage backend? I have plotted some designs on a napkin a few times. Consider some kind of storage server with "eventually consistent" replication capabilities. This could be used as the metadata storage for all the emails (i.e. FROM, TO, DATE, SUBJECT and all the other non-body parts you might search on).
Your replication engine can then work in conjunction with Dovecot to sync changes between servers as quickly as possible, e.g. if desired, implement a two-phase commit when the LDA delivers new emails, so that all storage servers confirm they have received the new email. You could, if you wish, implement quorum support (i.e. a server which was offline for some mail deliveries proxies requests to another storage block until it has caught up syncing).
You may or may not store the message bodies and attachments with the mail metadata. I can see performance arguments either way, depending on which operations you do most commonly. A theoretical (but probably horrible in practice) idea might be to consider "DB query latency" vs transfer speed. The metadata needs to cover 99% of common searches and deliver results quickly; as such it needs to be "near" Dovecot and cover the main message headers, and perhaps also the message body structure (list of parts, etc). The next tier down is probably satisfying full-text searches of message bodies and supplying body parts (where the most common queries might be either just the text/html parts for smart clients, or the entire body for common clients). Some kind of compressed block format for message bodies is probably optimal, and storing larger attachments separately would be an interesting way to increase cache hit ratios.
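A minimal Python sketch of the two-phase commit on delivery suggested above. `Replica` and `deliver` are hypothetical stand-ins for the storage servers and the LDA; a real implementation would also need to persist the coordinator's decision and handle timeouts and crash recovery, all omitted here:

```python
from __future__ import annotations


class Replica:
    """Hypothetical storage server taking part in a delivery."""

    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.staged: dict[str, bytes] = {}  # prepared but not yet visible
        self.stored: dict[str, bytes] = {}  # committed messages

    def prepare(self, msg_id: str, data: bytes) -> bool:
        """Phase 1: stage the message and vote yes, or vote no if unreachable."""
        if not self.online:
            return False
        self.staged[msg_id] = data
        return True

    def commit(self, msg_id: str) -> None:
        """Phase 2 (success): make the staged message visible."""
        self.stored[msg_id] = self.staged.pop(msg_id)

    def abort(self, msg_id: str) -> None:
        """Phase 2 (failure): discard anything staged for this message."""
        self.staged.pop(msg_id, None)


def deliver(replicas: list[Replica], msg_id: str, data: bytes) -> bool:
    """Commit the message on every replica, or on none of them."""
    prepared = []
    for replica in replicas:
        if replica.prepare(msg_id, data):
            prepared.append(replica)
        else:
            # One "no" vote aborts the whole delivery.
            for p in prepared:
                p.abort(msg_id)
            return False
    for replica in replicas:
        replica.commit(msg_id)
    return True
```

Note the trade-off for the split-network case discussed in this thread: with one replica offline, delivery fails rather than being stored locally.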
I don't know whether Timo is interested in working on such a project, but I for one would be interested in sponsoring some work on robust async replication; perhaps there is some crossover with the synchronous replication that you desire? I think this is an interesting area to develop. Cyrus has done some work on this stuff. I haven't followed it, but it would be interesting to see what they have done.
Good luck
Ed W
Peter Dolding
2011-Jul-04 12:52 UTC
> Mon, 04 Jul 2011 10:46:31 Ed W
> On 03/07/2011 07:20, Peter Dolding wrote:
> > Now what I need to be able to deal with the problem.
>
> Have you considered a new Dovecot storage backend? I have plotted some
> designs on a napkin a few times.. Consider some kind of storage server
> with "eventually consistent" replication capabilities. This could be
> used for the metadata storage for all the emails (ie FROM, TO, DATE,
> SUBJECT and all the other non body parts you might search on)
>
Remember, I am new to the Dovecot source code. Learning how to code a Dovecot storage backend is where I might have to start. Really I don't care whether the servers are ever fully consistent, other than that they both contain the same emails. Of course shared read status would be nice, but if that status information is out of date, so be it. "Eventually consistent" is a deadly thing to try to aim for. It's simply a matter of what needs to be backed up. If a user is at one physical location and connects to the same server all the time, they will never see that the two servers are not 100 percent synced. For my use cases, 100 percent sync will often be wasted effort; roughly synced will be more than suitable. Of course, what I am talking about can also be used as a foundation for getting data from point A to point B.
>
> Your replication engine can now work in conjunction with Dovecot to sync
> changes between servers as quickly as possible, eg if desired implement
> a two phase commit when the LDA delivers new emails, so that all storage
> servers confirm they have received the new email. You could if you wish
> implement quorum support (ie some server which was offline for some mail
> deliveries proxies requests to another storage block until it's caught
> up syncing)
>
You are missing something here. Because of network issues, my syncs may be chaotic. Both servers may have been running split from each other, each receiving emails the other has not had, with users accessing both.
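A minimal Python sketch of how the server-ID scheme lets two servers that ran split be merged without conflicts, assuming each side can send the other the global IDs it holds plus its log of deletions. `reconcile` and its parameters are hypothetical. Existing messages are never renumbered; only messages new to this server get fresh local UIDs:

```python
from __future__ import annotations


def reconcile(server_id: str, local_uids: dict, next_uid: int,
              peer_gids: set, peer_deleted: set,
              local_deleted: frozenset = frozenset()):
    """Merge a peer's view into this server after the link comes back.

    server_id:     this server's ID
    local_uids:    global ID (server_id, receive_id) -> local UID here
    next_uid:      next free local UID on this server
    peer_gids:     global IDs of every message the peer currently holds
    peer_deleted:  global IDs the peer logged as deleted while split
    local_deleted: deletes this server logged that the peer has not applied yet
    Returns the merged gid -> UID map and the new next_uid.
    """
    # Apply the peer's deletions; surviving messages keep their existing UIDs.
    merged = {gid: uid for gid, uid in local_uids.items() if gid not in peer_deleted}
    missing = {
        gid for gid in peer_gids
        if gid[0] != server_id        # our own messages: our copy set is authoritative
        and gid not in merged
        and gid not in peer_deleted
        and gid not in local_deleted  # don't resurrect our pending deletes
    }
    # Anything left is new here: hand out fresh, strictly increasing UIDs.
    for gid in sorted(missing):
        merged[gid] = next_uid
        next_uid += 1
    return merged, next_uid
```

Since every global ID is unique to its origin server, two servers that both received mail while split never produce colliding IDs; the merge is essentially a set union minus logged deletions.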
Both servers have users on them using webmail and local email clients, and while all that is going on the servers have to sync. This is very much the worst case. I am being realistic here: to make it work, I am prepared to give up the users' ability to switch client programs between servers. So IMAP unique identifiers will never be replicated. Basically, any identifier that cannot be built from a server ID plus a number unique to that server will be unique only to the server it is on. I cannot ensure fast syncing, and most of the time I don't really need it. For normal operations it is fine as long as an email, barring breakage, does not sit in the server for more than 15 minutes before reaching the person who wants to read it. That is still faster than Exchange working with POP email accounts.
Basically, Ed, I am saying this is the worst I can get away with in a working office. I have gone through the protocols I would most likely ever need to use from a business point of view.
POP3: as far as I know it does not really give a stuff whether what is behind it is multi-master or single-master. A user might receive a few extra emails if they change between servers. Nothing system-killing. As long as the user gets the emails in the end, that is the important bit; if they get a copy from each server, bad luck, at least they got the email. From a business point of view, too many copies is not an issue; not getting the copy in the first place is.
SMTP: sending outgoing messages directly from multiple locations does not give a stuff.
IMAP4: for what I want to do, I feel like strangling whoever invented this protocol. The ID system sends you straight to hell. From a business point of view this might be an issue if users change between servers, because they have to download everything again.
ActiveSync/Z-push: OK, nice. IDs are 64 characters in size with no defined contents and no ascending order, so basically no trouble. Nothing stops me using server_id: followed by a unique number. That covers most mobile devices.
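A sketch of the `server_id:`-plus-number ID idea for ActiveSync, in Python. The functions are hypothetical; the 64-character limit is the figure quoted in this post (and `max_len` can be raised for MAPI's larger IDs), so check it against the protocol documentation before relying on it:

```python
from __future__ import annotations


def make_sync_id(server_id: str, receive_id: int, max_len: int = 64) -> str:
    """Build an opaque 'server_id:number' ID that is unique system-wide."""
    if ":" in server_id:
        raise ValueError("server ID must not contain ':'")
    sync_id = f"{server_id}:{receive_id}"
    if len(sync_id) > max_len:
        raise ValueError(f"ID longer than {max_len} characters")
    return sync_id


def parse_sync_id(sync_id: str) -> tuple[str, int]:
    """Split an ID built by make_sync_id back into (server_id, receive_id)."""
    server_id, sep, number = sync_id.partition(":")
    if not sep or not number.isdigit():
        raise ValueError(f"not a server_id:number ID: {sync_id!r}")
    return server_id, int(number)
```

Because the protocol treats the ID as opaque, the device never needs to know that the part before the colon names the server the message was first received on.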
A message might disappear when moving between servers that are not fully synced, but it will catch up as the syncs do. From a business point of view that is annoying, but the issue is not long-lasting. I.e. with a custom backend for Z-push, mobile devices will work mostly fine with chaos between the storage servers.
MAPI: IDs are 64 to 512 characters. Again, nothing in the protocol stops server_id: followed by a unique number.
Web applications: much of a muchness. Either they connect to the mail server at the same location as the HTTP server, and so are protected, or they can be hooked onto a new protocol so they know they are connecting to a multi-server back end and can detect when issues are going on. At worst they are as bad as ActiveSync: if you change between hosting locations, email syncs might not have caught up with you, so a message might temporarily disappear.
So only IMAP4 requires syncing that keeps IDs in order. Question: does everyone need IMAP4 server locations to be interchangeable? I know I don't, since most people out of the office use either webmail or ActiveSync. Only people in the office will be using IMAP4.
I cannot see any fast, sane approach for IMAP4 other than making its IDs unique only to each server, and maybe creating something like an IMAP5 for multi-server setups, with the option to run it locally as an IMAP5-to-IMAP4 converter. The reason: you ask the server when it last synced with server X, and if that is older than the copies you already have, you don't replace them. So the client hides server-to-server sync issues.
Sometimes the best solution is to look at everything, see what is limiting you and deprecate it. If we don't need aligned IDs, we don't need a real-time connection, and we don't need as reliable a network, so there are fewer failures. I guess everyone who has tried to sync has ended up battling the IMAP protocol. I am probably the first strange person to say "stuff syncing IMAP", go straight past its defect and just make the storage work. A simple fault in IMAP's design makes the complete back-end replication a pain in the butt when it should be so simple.
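A rough Python sketch of the client-side rule behind the IMAP5-to-IMAP4 converter idea: for each origin server, keep the locally cached copies unless the server has synced with that origin more recently than the cache was taken. The data shapes (origin ID mapped to a timestamp and a set of global IDs) and `merge_view` are assumptions for illustration:

```python
from __future__ import annotations


def merge_view(cache: dict, server_view: dict) -> dict:
    """Decide, per origin server, whose message list the client should show.

    cache:       origin server ID -> (time our copy was taken, set of global IDs)
    server_view: origin server ID -> (time this server last synced with that
                 origin, set of global IDs it holds from that origin)
    """
    result = {}
    for origin in cache.keys() | server_view.keys():
        cached = cache.get(origin)
        fresh = server_view.get(origin)
        if fresh is None or (cached is not None and fresh[0] < cached[0]):
            # The server's knowledge of this origin is staler than ours:
            # keep our copies instead of letting messages "disappear".
            result[origin] = cached[1]
        else:
            result[origin] = fresh[1]
    return result
```

The effect is that a client moving to a lagging server keeps seeing the messages it already had from that origin until the server catches up.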
I have decided that if someone else wants IMAP to be synced, let that be their problem. Simple: each location stores what it receives and forwards it to all other locations as those locations come online, and the other locations do the reverse.
I like the idea of natural event syncing: the items purely and naturally sync with each other, with no reprocessing to force syncing, because the least amount of conflicting data is created. Natural syncing is possible for POP, MAPI and ActiveSync/Z-push. IMAP, the evil thing, just does not want to play ball with natural syncing and sets up events of conflicting data. IMAP might force a real-time synced server in some cases. One possibility is a master server handing out IMAP ID numbers for new messages, which removes the need to change IDs on mail after it is stored.
If my web interfaces use something other than IMAP, I could possibly tell people to use webmail until it is fixed. The worst issues I can see are message read status and other items like that, where a user might have marked a message unread at time X on one server and marked it read again at time Y on a different server. But these are more niggles. Better to have users annoyed by minor niggles and working than unable to work because things are broken.
With dependable replication, the more parts you use, the more areas you have for failure. So far my research on what is possible seems to be coming out reasonable. Hopefully once I have this idea fully ironed out I can start getting into code. I rarely code, but this is a true case of having a problem, so I have motivation.
Peter Dolding
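For the read-status niggle, one common approach is last-writer-wins: keep whichever flag change has the later timestamp, breaking ties by server ID so both servers pick the same answer. A Python sketch with hypothetical names, assuming the servers' clocks are roughly in sync (clock skew makes "later" approximate):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FlagState:
    seen: bool         # read/unread status of one message
    changed_at: float  # timestamp of the last change to the flag
    server_id: str     # server where the change was made (tie-breaker)


def merge_flag(a: FlagState, b: FlagState) -> FlagState:
    """Last-writer-wins: the later change survives, whichever server made it.

    Comparing (changed_at, server_id) makes the result independent of
    argument order, so both servers converge on the same state.
    """
    return max(a, b, key=lambda s: (s.changed_at, s.server_id))
```

With this rule, the "unread at X, read again at Y" case resolves to read on both servers once they sync, as long as Y is later than X.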