thr3ads.net - dovecot - [Dovecot] New index / mailbox API [Nov 2003]


Timo Sirainen

2003-Nov-22 20:59 UTC

[Dovecot] New index / mailbox API

This morning I had an idea for making indexes more scalable with
multiple concurrent writers. As a side effect it also makes the locking
issues much simpler. So I thought I'd go and rewrite all the
indexing code; it had gotten way too ugly and difficult to understand
and maintain.

I also thought to separate the index handling completely from mailbox
handling. So there would be lib-index and lib-mailbox. lib-storage would
then tie them together.

The mailbox API still feels somewhat dirty though.. I didn't want it to
depend on lib-index and I didn't want to duplicate any lib-index
functionality in it, to avoid uselessly allocating memory, so I can't
refer to mails directly by their sequence number or UID. So I'm
passing (void *id, size_t id_len) around - with maildir that would be
the filename and with mbox it would be the current offset in the file.
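
To make the (void *id, size_t id_len) idea concrete, here is a minimal sketch. The struct and function names are made up for illustration and are not taken from the attached headers; the only thing it assumes from the text above is that maildir uses the filename and mbox uses a file offset as the ID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical opaque mail identifier: the mailbox driver decides what the
   bytes mean, the caller only stores them and passes them back. */
struct mail_id {
	const void *id;
	size_t id_len;
};

/* maildir: the ID is the message's filename (without the trailing NUL). */
static struct mail_id maildir_mail_id(const char *filename)
{
	struct mail_id mid;

	assert(filename != NULL);
	mid.id = filename;
	mid.id_len = strlen(filename);
	return mid;
}

/* mbox: the ID is the message's byte offset within the mbox file. */
static struct mail_id mbox_mail_id(const off_t *offset)
{
	struct mail_id mid;

	assert(offset != NULL);
	mid.id = offset;
	mid.id_len = sizeof(*offset);
	return mid;
}

/* IDs can be compared bytewise without knowing which driver made them. */
static int mail_id_equals(const struct mail_id *a, const struct mail_id *b)
{
	return a->id_len == b->id_len &&
		memcmp(a->id, b->id, a->id_len) == 0;
}
```

The caller never interprets the bytes, so the same code path can carry either kind of ID. The sketch does not copy the data, so the ID is only valid as long as the filename or offset it points to stays alive.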

Suggestions welcome about both APIs.

Some of the changes in the indexes and their API:

- "modify log" -> "transaction log", which was the big change. All flag
updates and expunges are first written to the transaction log, and only
when committing it will they be moved into the main index.

The transaction writes require the file to be locked for only a very
short time, so with lots of concurrent writers several sessions may have
written their changes to the log file while they're all blocking
on getting them moved into the main index. When one session finally gets a
lock in the main lock file, it writes everyone's changes into it. If the
lock can't be acquired in a second or so, the main index will be
rewritten into a temp file and rename()d over.
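
The "one session commits everyone's changes" step could look roughly like the sketch below. It is an in-memory model only: the record layouts, the fixed-size array and the log_applied counter are assumptions made for the example, not the planned file format, and all locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum log_record_type { LOG_FLAG_UPDATE, LOG_EXPUNGE };

/* One entry in the (hypothetical) transaction log. */
struct log_record {
	enum log_record_type type;
	uint32_t uid;
	uint8_t flags;	/* new flags, used by LOG_FLAG_UPDATE only */
};

struct index_record {
	uint32_t uid;
	uint8_t flags;
};

struct index_sketch {
	struct index_record recs[64];
	size_t count;
	size_t log_applied;	/* log records already moved into the index */
};

/* Called by whichever session got the main index lock: apply every log
   record that hasn't been committed yet, no matter which session wrote
   it. Returns the number of log records consumed. */
static size_t index_commit_log(struct index_sketch *index,
			       const struct log_record *log, size_t log_count)
{
	size_t pending, i, j;

	assert(index->log_applied <= log_count);
	pending = log_count - index->log_applied;

	for (i = index->log_applied; i < log_count; i++) {
		for (j = 0; j < index->count; j++) {
			if (index->recs[j].uid == log[i].uid)
				break;
		}
		if (j == index->count)
			continue;	/* mail was already expunged */

		if (log[i].type == LOG_FLAG_UPDATE) {
			index->recs[j].flags = log[i].flags;
		} else {
			/* expunge: close the gap, keeping UID order */
			memmove(&index->recs[j], &index->recs[j + 1],
				(index->count - j - 1) * sizeof(index->recs[0]));
			index->count--;
		}
	}
	index->log_applied = log_count;
	return pending;
}
```

A second session that gets the lock afterwards finds nothing pending and returns immediately, which is why the blocked sessions don't need to redo each other's work.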

This means that we can have tons of sessions reading and writing to the
same shared index file and no session can block others from reading or
writing. Currently you could easily block changes for a very long time
with e.g. FETCH 1:* BODY.PEEK[] and reading it at 1 byte/sec.
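
The "rewrite into a temp file and rename() over" fallback mentioned in this item is the usual POSIX pattern. A minimal sketch, assuming the whole new index is already in memory; the function name and the fixed ".tmp" suffix are made up for the example, and a real implementation would need a temp name that concurrent sessions can't collide on:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace `path` with `data` by writing a temp file next to it and
   rename()ing it over. Readers that still have the old file open keep
   seeing the old contents. Returns 0 on success, -1 on error. */
static int index_rewrite_atomic(const char *path, const void *data, size_t size)
{
	char tmp_path[4096];
	const char *p = data;
	int fd, n;

	assert(path != NULL);
	n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp_path))
		return -1;

	fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return -1;

	while (size > 0) {
		ssize_t ret = write(fd, p, size);
		if (ret <= 0) {
			close(fd);
			unlink(tmp_path);
			return -1;
		}
		p += ret;
		size -= (size_t)ret;
	}
	/* make sure the data is on disk before it becomes visible */
	if (fsync(fd) < 0) {
		close(fd);
		unlink(tmp_path);
		return -1;
	}
	if (close(fd) < 0 || rename(tmp_path, path) < 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;
}
```

Since rename() replaces the directory entry in one step, no reader ever sees a half-written index, which is what lets the writer skip the lock in this case.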

- mail_index will be a class interface, so its implementation can be
changed at runtime. In-memory indexing is going to be a totally
different implementation this time, with better memory usage. Could be
useful for optimizing some special cases as well.
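
In C a "class interface" usually means a struct of function pointers. The sketch below shows that pattern with two toy backends and a switch from the disk one to the memory one when the disk backend fails. All names here (mail_index_vfuncs, disk_full and so on) are assumptions for the example and not the contents of mail-index.h; both backends share one array only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MAILS 32

struct mail_index;

/* Hypothetical method table; every backend fills in its own functions. */
struct mail_index_vfuncs {
	const char *name;
	int (*append)(struct mail_index *index, uint32_t uid);
};

struct mail_index {
	const struct mail_index_vfuncs *v;
	uint32_t uids[SKETCH_MAX_MAILS];
	uint32_t count;
	int disk_full;	/* test hook simulating "out of disk space" */
};

static int store_uid(struct mail_index *index, uint32_t uid)
{
	if (index->count == SKETCH_MAX_MAILS)
		return -1;
	index->uids[index->count++] = uid;
	return 0;
}

static int disk_append(struct mail_index *index, uint32_t uid)
{
	if (index->disk_full)
		return -1;	/* a real backend would see a write error here */
	return store_uid(index, uid);
}

static int memory_append(struct mail_index *index, uint32_t uid)
{
	return store_uid(index, uid);
}

static const struct mail_index_vfuncs disk_index_vfuncs =
	{ "disk", disk_append };
static const struct mail_index_vfuncs memory_index_vfuncs =
	{ "memory", memory_append };

/* Callers only use this wrapper. If the disk backend fails, the index
   swaps its method table at runtime and retries in memory. */
static int mail_index_append(struct mail_index *index, uint32_t uid)
{
	assert(index->v != NULL);
	if (index->v->append(index, uid) == 0)
		return 0;
	if (index->v != &disk_index_vfuncs)
		return -1;
	index->v = &memory_index_vfuncs;
	return index->v->append(index, uid);
}
```

Because callers go through index->v, swapping the implementation is a single pointer assignment and the caller's code doesn't change.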

- You'll get "views" into the mailbox. A view will take care of keeping
sequences synchronized according to what each client session thinks they
are. Currently lib-storage does that in a kind of ugly way. Hopefully this
means that we can get rid of sequence parsing in lib-storage completely
and move it into imap-only code.
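
A rough sketch of what a view has to do: keep its own sequence -> UID mapping until the session is ready to hear about changes. The structure and function names are hypothetical, the UID lists are assumed to be sorted ascending, and a real view would share data with the index instead of copying it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIEW_MAX_MAILS 32

/* Hypothetical view: the UID list as this client session last saw it. */
struct index_view {
	uint32_t uids[VIEW_MAX_MAILS];
	uint32_t count;
};

/* Sequences are 1-based as in IMAP. Returns 0 if seq is out of range. */
static uint32_t view_lookup_uid(const struct index_view *view, uint32_t seq)
{
	if (seq == 0 || seq > view->count)
		return 0;
	return view->uids[seq - 1];
}

/* Bring the view up to date with the index's current UID list. Returns
   how many mails the session saw before but which are now expunged. */
static uint32_t view_sync(struct index_view *view,
			  const uint32_t *index_uids, uint32_t index_count)
{
	uint32_t expunged = 0, i, j = 0;

	assert(index_count <= VIEW_MAX_MAILS);
	for (i = 0; i < view->count; i++) {
		/* skip index UIDs smaller than the one we're looking for */
		while (j < index_count && index_uids[j] < view->uids[i])
			j++;
		if (j == index_count || index_uids[j] != view->uids[i])
			expunged++;
	}
	memcpy(view->uids, index_uids, index_count * sizeof(index_uids[0]));
	view->count = index_count;
	return expunged;
}
```

Until view_sync() is called, sequence 2 keeps meaning the same mail for this session even if another session has expunged something in the meantime.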

- You don't manipulate locks directly and there are no rules about the
order in which you have to do things. You get views and transactions and
use them.

- You can have multiple views and transactions, so one opened index
could be used to handle multiple client sessions. I'm not sure if that
will ever be useful though :)

- Error correction will be done automatically this time. Whenever some
error is noticed, it will be fixed immediately and unless it changes
syncing state, it doesn't force client to disconnect. Also it hopefully
will be possible to change from disk-indexing to memory-indexing on the
fly in case you run out of disk space.

- I'm trying to get it NFS-safe..

- mail-cache, which I recently rewrote, doesn't change. The bad thing is
that it seems to be broken in some way and probably needs rewriting
anyway.. And I'm not sure if the problem is just in the code, or if the
design itself is somehow broken. I fear that memory somehow doesn't
keep up with changes made by other sessions, since we're not locking
the cache file for reading..

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mail-index.h
Type: text/x-c-header
Size: 8103 bytes
Desc: not available
URL:
<http://dovecot.org/pipermail/dovecot/attachments/20031122/3091859e/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mailbox-driver.h
Type: text/x-c-header
Size: 3102 bytes
Desc: not available
URL:
<http://dovecot.org/pipermail/dovecot/attachments/20031122/3091859e/attachment-0004.bin>

DINH Viet Hoa

2003-Nov-22 23:22 UTC


[Dovecot] New index / mailbox API

Timo Sirainen wrote :
> with maildir that would be
> the filename and with mbox it would be current offset in file.
Is it a good idea to use the current offset of a message in the file,
since there are (several) programs that rewrite mailboxes entirely?
I find it rather dirty to do this.

In UW-IMAP, a UID is added to the message headers.

-- 
DINH V. Hoa,

"Tu as lu combien de bandes dessinées ce mois-ci ? 13 Go"

Farkas Levente

2003-Nov-24 12:00 UTC


speed Re: [Dovecot] New index / mailbox API

hi,
it's an interesting read, but let's go back to a higher level for a moment.
I like dovecot since it uses simple files as mail storage (unlike
cyrus). with dovecot, if something happens I can easily switch to
another imap server, which is one of the most important features in a
production environment (there has to be a way of escape). the other
important feature is speed (when we have a few hundred
mailboxes). some kind of indexing seems to be a good way to get this.
but I always feel that dovecot reinvents the wheel, since there are a
dozen database systems which have nothing else to do but indexing (ok,
that's not true, but..). they probably do it the right way (or at least
we can find some that do) and they have a few years of experience. they
get the indexing, the locking, the transactions, etc. right.
so why don't we use one really fast and good database engine to index our
mail storage?
the only reason I could accept here is that this is a very
special type of database and dovecot can use an algorithm which is suited
to this problem better than general indexing algorithms. is this true?

another thing which always comes to my mind when I think about speed:
why do we do the indexing when we look into the folders? IMHO it'd be
more efficient to do it at mail delivery time. mail arrival
is spread more evenly over time, so the system load would be more balanced.
so there could be
- an optional delivery helper application which does the indexing
at delivery time,
- indexing during remove, move, copy etc. imap operations,
- and the current approach (e.g. the new indexing engine) for those who
do not use the delivery helper app.

just my 2c.

-- 
   Levente                               "Si vis pacem para bellum!"

