Greetings,

I have some architectural questions regarding dovecot, and though I've half answered them by looking at the source, I'm also interested in hearing whether my (our) wishes and suggestions are already being considered (or can be considered, once built) for inclusion in dovecot itself. Let me first explain why I'm doing this.

I work for XS4ALL, a fairly large ISP in the Netherlands. We provide a wide variety of services, including shell access, pop3, webmail, et cetera. We use Sendmail on several clusters of FreeBSD machines (load-balanced using layer-4 ethernet switches) and several NetApp Filers (dedicated NFS servers with fail-safe disk arrays and such) for the backend.

Several years ago (when we were a lot smaller) we noticed that typical use of the mailboxes included leaving a lot of old email on the server, at least for a while, and that this is bothersome when using mbox mailboxes. (An mbox basically has to be copied over whenever the status of an email changes, leading to a lot of I/O.) We briefly played with modifying sendmail and the pop server to avoid the full copy in the common case (only status changes) by doing in-place edits of a pre-generated Status line, as well as avoid full scanning of the mbox file by creating special headers to mark the 'real' length of an email. It worked, for a while, but it wasn't going to scale very well.

So we switched to maildir mailboxes for the mail spool. A modified mail.local (which we need for other reasons as well) delivers in /var/spool/mail/u/s/username, and mutt, uqwk, a modified pine and qmail's pop3 daemon read it from there. Until last week our clients could choose to have mbox inboxes, to use with 'elm' or 'mail', but we decided to discontinue that support. Our new shell servers, which are in test, don't have elm installed anymore anyway.

We still have support for mbox mailboxes in a user's home directory though, by using procmail and such.
So when we needed an IMAP server for use with our webmail (based on SquirrelMail), we were forced to go with the UW-IMAP server, with the maildir patch that's been scattered around the 'net. This worked, for a while; we also use the maildir patch with pine after all. However, the maildir patch is not very good. Not at all, even, and it only seems to work by pure chance. Pine works for the average user who does not get a lot of new mail while his pine is open or does not use procmail, and fortunately a lot of the people that do get a lot of email use mutt, which does work properly. The UW-IMAP server worked fine because SquirrelMail only uses (used) a small subset of the available functionality. But that's changing, as SquirrelMail gets actively developed, and we're also considering other IMAP-based services.

But we can't switch to Courier or Cyrus because we need mbox support. And while looking for mbox patches for either of those two, I ran across dovecot. Yay! :) Dovecot is not everything we'd want, but it comes very close, and contrary to UW-IMAP both the design and the actual source code are clean, readable and logical, which means we can add the features we need and support them.

What we need and want to add is fairly simple, but I've only been looking at dovecot since yesterday so I'd be happy to hear if any of it is possible, feasible, unwise or unacceptable.

- First off, we need the maildir support to be 'correct' in that it does not rely on the naming of the files in the mailbox, other than the very loose specification DJB gives (doesn't contain a colon or slash and doesn't start with a dot.) The pine/UW-imap patch breaks here because it depends on the first part of the filename being time() or something else that, when sorted alphanumerically, puts new mail at the end. Our LDA does this, but procmail does not, and it shouldn't have to.

- Second, we need the maildir support to be 'correct' in that it does not rely on the directory order being persistent.
The NetApp Filers use btree-indexed directories, so the order of readdir() can change completely whenever a file is added or removed. The pine/uw-imap patch relies on the '.uidvalidity' file being modified whenever the maildir sort order changed, and this isn't happening.

I *think*, from reading the sources, both of those are correct already. If they aren't, I'd strongly urge you to fix it, as #1 is a problem for anyone using procmail and #2 is a problem for anyone with 'indexed' directories (including such new filesystems as reiserfs, and I assume FreeBSD's new hashed directories.)

- We need to avoid using fcntl(). The Netapps support it, but file-locking over NFS is very, very poorly designed and we've had too many problems of various kinds before, with fcntl. We also don't like the idea of having thousands of fcntl locks at the same time ;P Instead, we've switched to the locking method described in the Linux open(2) manpage under O_EXCL. (We call it 'dot-locking', I'm not sure where the name came from.) The actual implementation of that method is pretty simple, and I have a C version and a Python version hanging around here somewhere (the Python version is being used by GNU Mailman, last I looked.) If we're going to use dovecot, we will replace most, if not all, fcntl()s with dot-locking, the question is whether you want it contributed to dovecot :)

- Every user's incoming mailbox is /var/spool/u/s/username. Other mailboxes are in /home/u/username/mail or /home/u/username/Mail (the second if the first does not exist.) We are not yet certain whether we want the inbox to be able to have subdir-mailboxes, as /var/spool and /home have different quotas and we urge people not to store their mail on /var/spool. (for one thing, it doesn't get backed up.)
We want these things to work without magical symlinks or empty files, because people _will_ delete them and cause unnecessary helpdesk calls :) Again, the question is mostly whether this is desirable in dovecot (or something enough like it to reduce local changes.)

- We have over 300k mailboxes at the moment. We expect that number to keep growing. The indexer process (as described by design.txt) does not sound like a good idea in our case :) How necessary is it, really ? Especially since we do not expect more than 10% of those mailboxes to be actually used by IMAP, not even once. If disabling the indexer completely just means longer startup times for IMAP sessions, we can live with that.

- The UW-IMAP maildir patch stores UIDs in the individual filenames, using a 'U' flag. Will this interfere with dovecot ? We don't really need dovecot and UW-IMAP to share UIDs, but we would like to have as painless a transition as possible, without having to rename millions of files to remove the U flag and other flags :P It would also be nice to keep pine using the existing maildir patch, even though very few IMAP-users would use pine.

- Would dovecot scale, architecturally speaking, to 500k+ active mailboxes ? The amount of hardware is not really an issue, we can add a lot of machines (off-the-shelf Intel hardware) to each cluster, but if each dovecot process has to load in an index of all possible mailboxes... that would be a problem. Doing an inordinate number of file-accesses over NFS would also be a problem, but I haven't seen any indication of that in the source, yet.

In case it wasn't clear yet, I'm very happy to have found dovecot. The lack of a decent mbox IMAP server has always dismayed me, let alone an mbox+maildir one :) I should also point out that even though XS4ALL is a commercial company, we would contribute our changes even if the licence didn't require it, and we want to contribute them back the way you want them, not necessarily the way it's easiest for us.
We have a lot of experience with opensource software, as a simple google on my name should indicate ;P

Regards,
-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
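
For readers unfamiliar with the 'dot-locking' method the mail refers to, here is a minimal Python sketch of the link()-based recipe from the Linux open(2) manpage (the note under O_EXCL). It is not the C or Python version mentioned in the mail, nor Mailman's or dovecot's code. The function names acquire_dotlock and release_dotlock and the ".lock" suffix are assumptions made for illustration. The sketch tries once and does no stale-lock cleanup or retrying, which a real implementation would need.

```python
import os
import socket


def acquire_dotlock(path):
    """Try once to take the lock file `path + ".lock"`; return True on success."""
    lockfile = path + ".lock"
    # A unique file on the same filesystem, named after hostname and PID.
    unique = "%s.%s.%d" % (lockfile, socket.gethostname(), os.getpid())
    fd = os.open(unique, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        try:
            # link() either creates the lock file or fails because it exists.
            os.link(unique, lockfile)
            return True
        except OSError:
            # Over NFS link() can report failure although it succeeded,
            # so the link count of the unique file is the final word.
            return os.stat(unique).st_nlink == 2
    finally:
        # The unique name is only scaffolding; the lock file itself stays.
        os.unlink(unique)


def release_dotlock(path):
    """Drop the lock taken by acquire_dotlock()."""
    os.unlink(path + ".lock")
```

A caller would typically loop on acquire_dotlock() with a short sleep and a timeout, and always call release_dotlock() in a finally block.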
On Sat, 2002-10-19 at 15:38, Thomas Wouters wrote:
> We briefly played with modifying sendmail and the pop server to avoid the full copy in the common case (only status changes) by doing in-place edits of a pre-generated Status line,

UW-imapd does this as well, creating an "X-Keywords: " line for each mail. I had thought about this first with dovecot too, but since mutt always rewrote the whole mailbox I figured I might as well do the same. But with larger mailboxes this is really slow, so I think I'll support the X-Keywords trick myself too.

> as well as avoid full scanning of the mbox file by creating special headers to mark the 'real' length of an email.

For each mail? Content-Length? With my tests that didn't seem to help much, rather made it just slower.. Could be that I just did something badly, have to look into it more when I begin optimizing mbox handling more. Have to get it at least as fast as UW-imapd :)

> We still have support for mbox mailboxes in a user's home directory though, by using procmail and such. So when we needed an IMAP server for use with our webmail (based on SquirrelMail), we were forced to go with the UW-IMAP server, with the maildir patch that's been scattered around the 'net. This

Hm. Squirrelmail requires the SORT extension which Dovecot doesn't support yet. Notes about SORT from CVS's TODO:

 - sort (draft-ietf-imapext-sort)
    - basically sorted SEARCH, requiring CHARSET support for UTF-8 and ASCII
    - we could create alternative binary tree file(s) for different sort conditions, ".tree-sort" or something. or if we decide to just keep it in memory, btree could still be best choice.
    - required by squirrelmail (webmail)

> - First off, we need the maildir support to be 'correct' in that it does not rely on the naming of the files in the mailbox, other than the very loose specification DJB gives (doesn't contain a colon or slash and doesn't start with a dot.)
> The pine/UW-imap patch breaks here because it depends on the first part of the filename being time() or something else that, when sorted alphanumerically, puts new mail at the end. Our LDA does this, but procmail does not, and it shouldn't have to.

Dovecot doesn't care as long as the file name stays the same before the ':' character.

> - Second, we need the maildir support to be 'correct' in that it does not rely on the directory order being persistent. The NetApp Filers use btree-indexed directories, so the order of readdir() can change completely whenever a file is added or removed. The pine/uw-imap patch relies on the '.uidvalidity' file being modified whenever the maildir sort order changed, and this isn't happening.

Dovecot reads them into a hash so it doesn't depend on readdir() behaviour.

> - We need to avoid using fcntl(). The Netapps support it, but file-locking over NFS is very, very poorly designed and we've had too many problems of various kinds before, with fcntl. We also don't like the idea of having thousands of fcntl locks at the same time ;P Instead, we've switched to the locking method described in the Linux open(2) manpage under O_EXCL. (We call it 'dot-locking', I'm not sure where the name came from.)

Hmm. The dot-lock means the "mbox.lock" file which gets created when someone wants it exclusively locked. Dovecot supports it, and maildir itself doesn't need locking at all. Dovecot's index files currently use fcntl() locking, but it would be possible to replace them with lock files.

Then there's the modify log file. Dovecot uses fcntl() locking for it as a way to figure out if it's the only one using the log file. Everyone read-locks the file; then if someone wants to know whether it's the only one using it, it tries to set a write lock, and if that fails it knows someone else is using it as well.
I'm not sure if there's any good way to replace that by using files, I had pretty complicated (desperate) plans before figuring out fcntl() could be used to do it easily. It would be possible to just assume that there's always someone else using the modify log, but then each flag change or expunge would always write a few bytes to it, and when the log file is switched (there's .log and .log.2) it wouldn't be truncated after the last process is finished with it, which is not too bad since it will be truncated after the next switch.

Also it would be possible not to use index files at all but just keep them in memory. I've been fixing code to make this possible and somewhat fast.

> If we're going to use dovecot, we will replace most, if not all, fcntl()s with dot-locking, the question is whether you want it contributed to dovecot :)

All locking goes through file_*_lock() or mbox_lock_*() functions. mbox locking supports it already, and file_*_lock() could be made to support it. It doesn't currently get the file name but that could be done.

> - Every user's incoming mailbox is /var/spool/u/s/username. Other mailboxes are in /home/u/username/mail or /home/u/username/Mail (the second if the first does not exist.) We are not yet certain whether we want the inbox to be able to have subdir-mailboxes, as /var/spool and /home have different quotas and we urge people not to store their mail on /var/spool. (for one thing, it doesn't get backed up.) We want these things to work without magical symlinks or empty files, because people _will_ delete them and cause unnecessary helpdesk calls :) Again, the question is mostly whether this is desirable in dovecot (or something enough like it to reduce local changes.)

Are maildir inboxes also in /var/spool? With mbox, sub-inboxes wouldn't even be possible because dir structure == mailbox structure, and since the inbox file exists there can't be an inbox-dir (except maybe with different case but that's kludgy).
I've also thought I might as well make it possible to read the mbox inbox from /var/mail or wherever it is. Pretty easy to do, but the .lock file is problematic if new files can't be added to the /var/mail directory.

> - We have over 300k mailboxes at the moment. We expect that number to keep growing. The indexer process (as described by design.txt) does not sound like a good idea in our case :) How necessary is it, really ? Especially since we do not expect more than 10% of those mailboxes to be actually used by IMAP, not even once. If disabling the indexer completely just means longer startup times for IMAP sessions, we can live with that.

Indexer doesn't exist yet, and wouldn't really even be needed. I still think it could be a somewhat nice idea: the system load is probably lower during the night, so we could use the extra time to make mailboxes perform faster the next day. It'd be difficult to know when exactly there is "extra time", which is why I haven't yet done the indexer. It probably needs some external program (script) which tells it, maybe by looking at some I/O statistics from /proc or by doing a few file operations and checking the latency. Am I right in that CPU usage still isn't any problem but rather the I/O?

> - The UW-IMAP maildir patch stores UIDs in the individual filenames, using a 'U' flag. Will this interfere with dovecot ? We don't really need dovecot and UW-IMAP to share UIDs, but we would like to have as painless a transition as possible, without having to rename millions of files to remove the U flag and other flags :P It would also be nice to keep pine using the existing maildir patch, even though very few IMAP-users would use pine.

How exactly does the U flag work? I hope it's before the ':' character like Courier's S=filesize? Otherwise U=1234 would be thought of as 6 different flags which isn't very good since Dovecot reorders them as 1234=U.

> - Would dovecot scale, architecturally speaking, to 500k+ active mailboxes ?
> The amount of hardware is not really an issue, we can add a lot of machines (off-the-shelf Intel hardware) to each cluster, but if each dovecot process has to load in an index of all possible mailboxes... that would be a problem. Doing an inordinate number of file-accesses over NFS would also be a problem, but I haven't seen any indication of that in the source, yet.

Dovecot opens the index when opening a mailbox. It doesn't open other mailboxes' indexes. Also the indexes should mean fewer file accesses than otherwise, especially with mbox since it wouldn't need to read and parse the whole mbox file. In general I've tried to keep the file I/O as little as possible.

If your clusters access the files through NFS, there should be no problem. Except I've never tried Dovecot through NFS, and I'm not sure how well mmap()ing works through NFS. I know there have been problems before but hopefully they've been fixed already.
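
The fcntl() trick described in the reply for the modify log (every user holds a read lock, and a non-blocking write-lock attempt tells a process whether anyone else is around) can be illustrated roughly as below. This is a Python sketch of the idea only, not Dovecot's C code. The names register_user and is_only_user are hypothetical, and the file descriptor is assumed to be open for both reading and writing.

```python
import fcntl


def register_user(fd):
    """Announce that this process uses the file by holding a shared lock."""
    fcntl.lockf(fd, fcntl.LOCK_SH)


def is_only_user(fd):
    """Return True if no other process currently holds a lock on the file."""
    try:
        # Upgrading to an exclusive lock only works when nobody else
        # holds a shared lock; LOCK_NB makes the attempt fail at once.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return False
    # Go back to a shared lock so other processes can register again.
    fcntl.lockf(fd, fcntl.LOCK_SH)
    return True
```

Because these are POSIX record locks, they belong to the process rather than the descriptor and are dropped when the process exits, which is what makes the probe work without any cleanup files. Whether this behaves reliably over NFS is exactly the concern raised in the thread.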
On Sat, Oct 19, 2002 at 12:42:25PM -0400, Charlie Brady wrote:
> > So when we needed an IMAP server for use with our webmail (based on SquirrelMail), we were forced to go with the UW-IMAP server, with the maildir patch that's been scattered around the 'net. This worked, for a while; we also use the maildir patch with pine after all. However, the maildir patch is not very good. Not at all, even, and it only seems to work by pure chance.

> The best one I've found is the patch last modified AFAICT by Miquel van Smoorenburg, which has maildir filenames like:
>    time.pid.host,U=xxx,W=yyy:2,flags

We use two different patches, both of which have a bit of Mike in them. One is for the IMAP server, which is an old version ('uw-imap-2000' or something like that comes to mind, but it came from the pine 4.10 source) from before the mailbox->append prototype changed (but with Mike's bilennium-patch.) A big problem with later patches was that the append method changed in the pine source, but not in the maildir patch. We considered using a newer uw-imap but decided that the current one works well enough ;P

For pine we currently use the patch that comes with Debian's 'pine-tracker' package (which is an installer for pine with patches, since you aren't allowed to distribute modified binaries.) This patch also has some Mike in it, but I'm not sure how much, as at least parts of it seem to have been backed out later. This patch works okay except for the two problems I noted in my original mail: it depends on directory order not changing except when '.uidvalidity' gets touched, and it depends on the alphanumerical sort order of files matching the chronological (or at least uid-based, which should be the same) sort order. The latter breaks with (standard) procmail, the former occasionally with btree and (presumably) hashed directory indices.

> The storage of RFC822.SIZE, aka the on-the-wire size, in the filename makes a very big difference to performance.

That's interesting.
I'll keep that in mind for when we begin to see performance issues with UW-IMAP. (I'm hoping I never have to look at the pine source again, though.)

> But even this patch (from, e.g. http://www.star.le.ac.uk/~tjg/misc/uw_imap-2001a_maildir-02.patch) has a number of bugs. I've fixed a few of them. You can find a source RPM at ftp://ftp.e-smith.org/pub/e-smith/dev/5.6dev/SRPMS/. The fixes are:

<snip fixes>

I couldn't find the source RPM for pine or (uw-)imap in that directory, but it doesn't sound like your changes solve our fundamental problems. It's not that big a deal, I think we've decided internally (I know _I_ have :) that we can't offer real IMAP services based on UW-IMAP; we'd sooner go for Cyrus or Courier, even if it does mean disallowing mboxes with IMAP. But dovecot is an even better alternative, once it has all the features we need :)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
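
To make the filename discussion in this thread concrete, here is a small Python sketch that takes apart a name of the form time.pid.host,U=xxx,W=yyy:2,flags. It is only an illustration: the reading of U as the IMAP UID and W as the RFC822.SIZE follows what is said in the thread, the helper names parse_maildir_name and order_by_uid are hypothetical, and treating flags as a sorted run of single characters is an assumption based on the "1234=U" remark in the earlier reply, not a description of any server's code.

```python
def parse_maildir_name(filename):
    """Split a maildir file name into its stable key, extra fields and flags."""
    # Everything before ':' is the stable part of the name.
    key, _, info = filename.partition(":")
    parts = key.split(",")
    fields = {}
    for part in parts[1:]:
        name, eq, value = part.partition("=")
        if eq:
            fields[name] = value  # e.g. U=12 or W=3456
    # After ":2," each character is one flag; sort them for a canonical form.
    flags = "".join(sorted(info[2:])) if info.startswith("2,") else ""
    return {"key": key, "unique": parts[0], "fields": fields, "flags": flags}


def order_by_uid(filenames):
    """Order messages by the U= field instead of by file name or readdir()."""
    def sort_key(name):
        uid = parse_maildir_name(name)["fields"].get("U", "")
        # Names without a numeric U= field sort after all numbered ones.
        return (not uid.isdigit(), int(uid) if uid.isdigit() else 0)
    return sorted(filenames, key=sort_key)
```

With this, a procmail-delivered message whose name does not start with a timestamp still lands in the right place, because ordering comes from the U= field rather than from the name or the directory order. It also shows the problem raised in the reply: if U=1234 appeared after the ':' it would be read as six one-character flags and come out as "1234=U".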