Hi ceph-ers,

The email below was posted on the ceph-users mailing list yesterday by Wido den Hollander. I guess this could be interesting for users here as well.

MJ

-------- Forwarded Message --------
Subject: [ceph-users] librmb: Mail storage on RADOS with Dovecot
Date: Thu, 21 Sep 2017 10:40:03 +0200 (CEST)
From: Wido den Hollander <wido at 42on.com>
To: ceph-users at ceph.com

Hi,

A tracker issue has been out there for a while: http://tracker.ceph.com/issues/12430

Storing e-mail in RADOS with Dovecot, the IMAP/POP3/LDA server with a huge market share.

It took a while, but last year Deutsche Telekom took on the heavy work and started a project to develop librmb: LibRadosMailBox. Together with Deutsche Telekom and Tallence GmbH (DE) this project came to life.

First, the Github link: https://github.com/ceph-dovecot/dovecot-ceph-plugin

I am not going to repeat everything which is on Github, but here is a short summary:

- CephFS is used for storing Mailbox Indexes
- E-Mails are stored directly as RADOS objects
- It's a Dovecot plugin

We would like everybody to test librmb and report back issues on Github so that further development can be done.

It's not finalized yet, but all the help is welcome to make librmb the best solution for storing your e-mails on Ceph with Dovecot.

Danny Al-Gaaf has written a small blog post about it and a presentation:

- https://dalgaaf.github.io/CephMeetUpBerlin20170918-librmb/
- http://blog.bisect.de/2017/09/ceph-meetup-berlin-followup-librmb.html

To get an idea of the scale: 4.7 PB of raw storage over 1,200 OSDs is the final goal (last slide in the presentation). That will provide roughly 1.2 PB of usable capacity for storing e-mail. A lot of e-mail.

To see this project finally go into the Open Source world excites me a lot :-)

A very, very big thanks to Deutsche Telekom for funding this awesome project! A big thanks as well to Tallence, as they did an awesome job developing librmb in such a short time.

Wido
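To make the "E-Mails are stored directly as RADOS objects" bullet concrete, here is a minimal librados C++ sketch of that idea. It is not librmb's actual API or object layout; the pool name, namespace and object ID are invented for illustration, and librmb's real naming and metadata scheme is described in the GitHub repository above. Building it needs the librados development headers and linking against librados (-lrados).

// Minimal librados sketch: store one RFC 822 message as a RADOS object.
// Pool name, namespace and object ID are illustrative only, not librmb's scheme.
#include <rados/librados.hpp>
#include <iostream>

int main() {
  librados::Rados cluster;
  cluster.init("admin");                     // connect as client.admin
  cluster.conf_read_file(nullptr);           // read the default ceph.conf
  if (cluster.connect() < 0) {
    std::cerr << "could not connect to the cluster" << std::endl;
    return 1;
  }

  librados::IoCtx io;
  cluster.ioctx_create("mail_storage", io);  // hypothetical mail pool
  io.set_namespace("user-12345");            // one RADOS namespace per user

  librados::bufferlist mail;
  mail.append("From: a@example.com\r\nSubject: hello\r\n\r\nbody\r\n");
  io.write_full("mail-oid-0001", mail);      // the e-mail becomes one object

  cluster.shutdown();
  return 0;
}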
On 22 Sep 2017, at 14.18, mj <lists at merit.unu.edu> wrote:
> First, the Github link: https://github.com/ceph-dovecot/dovecot-ceph-plugin
>
> I am not going to repeat everything which is on Github, but here is a
> short summary:
>
> - CephFS is used for storing Mailbox Indexes
> - E-Mails are stored directly as RADOS objects
> - It's a Dovecot plugin
>
> We would like everybody to test librmb and report back issues on Github
> so that further development can be done.
>
> It's not finalized yet, but all the help is welcome to make librmb the
> best solution for storing your e-mails on Ceph with Dovecot.

It would have been nicer if RADOS support was implemented as a lib-fs driver and the fs API had been used everywhere else. Then 1) LibRadosMailBox wouldn't be relying so much on RADOS specifically and 2) fs-rados could have been used for other purposes. There are already fs-dict and dict-fs drivers, so the RADOS dict driver may not have been necessary to implement if fs-rados had been implemented instead (although I didn't check it closely enough to verify). (We've had fs-rados on our TODO list for a while as well.)

BTW, we've also been planning on open sourcing some of the obox pieces, mainly fs drivers (e.g. fs-s3). Maybe the obox format too, but without the "metacache" piece. The current obox code is a bit too tightly married to the metacache to make open sourcing it easy. (The metacache is about storing the Dovecot index files in object storage and efficiently caching them on the local filesystem; it isn't planned to be open sourced in the near future. That's pretty much the only difficult piece of the obox plugin, with the Cassandra integration coming a good second. I wish there had been a better/easier geo-distributed key-value database to use - tombstones are annoyingly troublesome.)

And using the rmb mailbox format, my main worries would be:

* It doesn't store index files (= message flags) - not necessarily a problem, as long as you don't want geo-replication.
* Index corruption means rebuilding the indexes, which means rescanning the list of mail files, which means rescanning the whole RADOS namespace, which practically means rescanning the RADOS pool. That is most likely a very, very slow operation, which you want to avoid unless it's absolutely necessary. You need to be very careful to avoid that happening, and in general to avoid losing mails in case of crashes or other bugs. (A sketch of what such a rescan involves follows below.)
* I think copying/moving mails physically copies the full data on disk.
* Each IMAP/POP3/LMTP/etc. process connects to RADOS separately from the others - some connection pooling would likely help here.
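To illustrate why the rescan Timo describes is so costly, here is a rough librados sketch of the worst case: enumerating every mail object in the pool so the indexes can be rebuilt. It is not code from the plugin; the pool name is invented, and a real rebuild would additionally have to read each object's metadata, which makes it even slower than the bare listing shown here.

// Sketch of the "rescan everything" worst case: list every mail object so
// the indexes can be rebuilt from scratch. On a pool with millions of
// objects this enumeration alone is expensive.
#include <rados/librados.hpp>
#include <cstdint>
#include <iostream>

int main() {
  librados::Rados cluster;
  cluster.init("admin");
  cluster.conf_read_file(nullptr);
  cluster.connect();

  librados::IoCtx io;
  cluster.ioctx_create("mail_storage", io);   // hypothetical mail pool
  io.set_namespace(librados::all_nspaces);    // iterate across all user namespaces

  uint64_t count = 0;
  for (auto it = io.nobjects_begin(); it != io.nobjects_end(); ++it) {
    // Each hit would still need an xattr/omap read to recover flags etc.
    std::cout << it->get_nspace() << "/" << it->get_oid() << "\n";
    ++count;
  }
  std::cout << "objects scanned: " << count << std::endl;

  cluster.shutdown();
  return 0;
}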
On 24.09.2017 at 02:43, Timo Sirainen wrote:
> On 22 Sep 2017, at 14.18, mj <lists at merit.unu.edu> wrote:
>> First, the Github link: https://github.com/ceph-dovecot/dovecot-ceph-plugin
>>
>> I am not going to repeat everything which is on Github, but here is a
>> short summary:
>>
>> - CephFS is used for storing Mailbox Indexes
>> - E-Mails are stored directly as RADOS objects
>> - It's a Dovecot plugin
>>
>> We would like everybody to test librmb and report back issues on
>> Github so that further development can be done.
>>
>> It's not finalized yet, but all the help is welcome to make librmb the
>> best solution for storing your e-mails on Ceph with Dovecot.
>
> It would have been nicer if RADOS support was implemented as a lib-fs
> driver and the fs API had been used everywhere else. Then 1)
> LibRadosMailBox wouldn't be relying so much on RADOS specifically and
> 2) fs-rados could have been used for other purposes. There are already
> fs-dict and dict-fs drivers, so the RADOS dict driver may not have been
> necessary to implement if fs-rados had been implemented instead
> (although I didn't check it closely enough to verify). (We've had
> fs-rados on our TODO list for a while as well.)

Please note: librmb is not Dovecot-specific. The goal of this library is to abstract email storage on Ceph independently of Dovecot, so that other mail systems can also store emails in RADOS via one library. This is also the reason why it relies on RADOS.

[...]

> And using the rmb mailbox format, my main worries would be:
> * It doesn't store index files (= message flags) - not necessarily a
> problem, as long as you don't want geo-replication.

The index files are stored via Dovecot's lib-index on CephFS. This is only an intermediate step: the goal is to also store index data directly in the RADOS/Ceph omap key-value store. Currently geo-replication isn't an important topic for our PoC setup at Deutsche Telekom.

> * Index corruption means rebuilding the indexes, which means rescanning
> the list of mail files, which means rescanning the whole RADOS
> namespace, which practically means rescanning the RADOS pool. That is
> most likely a very, very slow operation, which you want to avoid unless
> it's absolutely necessary. You need to be very careful to avoid that
> happening, and in general to avoid losing mails in case of crashes or
> other bugs.

This could maybe be avoided with CephFS snapshots for now, at least partially. We will take a look at it during the PoC phase.

> * I think copying/moving mails physically copies the full data on disk.
> * Each IMAP/POP3/LMTP/etc. process connects to RADOS separately from
> the others - some connection pooling would likely help here.

I'm not so deep into what Dovecot is currently doing here. librmb is still under heavy development, and any comment and feedback is really welcome, as Wido already pointed out.

Danny
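For readers unfamiliar with omap, here is a small sketch of what Danny's stated goal (index data directly in the RADOS omap key-value store) could look like at the librados level. The object name and key layout are invented for illustration and are not librmb's actual format.

// Sketch of the stated goal: keep index-like data (message flags, UIDs) in
// the omap key-value store of a per-mailbox RADOS object instead of in
// CephFS index files. Object name and key layout are invented here.
#include <rados/librados.hpp>
#include <map>
#include <string>
#include <iostream>

// 'io' is assumed to be an IoCtx already opened on the mail pool with the
// user's namespace set (see the earlier connection sketch).
void store_flags(librados::IoCtx &io) {
  std::map<std::string, librados::bufferlist> kv;
  kv["uid/1001"].append("\\Seen");
  kv["uid/1002"].append("\\Seen \\Answered");
  io.omap_set("INBOX.index", kv);   // flags become omap entries, no file I/O
}

void dump_flags(librados::IoCtx &io) {
  std::map<std::string, librados::bufferlist> vals;
  // Read back up to 1024 entries, e.g. for a status check or index rebuild.
  io.omap_get_vals("INBOX.index", "", 1024, &vals);
  for (const auto &entry : vals)
    std::cout << entry.first << " => " << entry.second.to_str() << "\n";
}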
Hi Timo,

I am one of the authors of the software Wido announced in his mail. First, I'd like to say that Dovecot is a wonderful piece of software - thank you for it. I would like to give some explanations regarding the design we chose.

From: Timo Sirainen <tss at iki.fi>
Reply-To: Dovecot Mailing List <dovecot at dovecot.org>
Date: 24 September 2017 at 02:43:44
To: Dovecot Mailing List <dovecot at dovecot.org>
Subject: Re: librmb: Mail storage on RADOS with Dovecot

> It would have been nicer if RADOS support was implemented as a lib-fs
> driver and the fs API had been used everywhere else. Then 1)
> LibRadosMailBox wouldn't be relying so much on RADOS specifically and
> 2) fs-rados could have been used for other purposes. There are already
> fs-dict and dict-fs drivers, so the RADOS dict driver may not have been
> necessary to implement if fs-rados had been implemented instead
> (although I didn't check it closely enough to verify). (We've had
> fs-rados on our TODO list for a while as well.)

Actually, I considered using the fs-api to build a RADOS driver, but I did not follow that path. The dict-fs mapping is quite simplistic: for example, I would not be able to use RADOS read/write operations to batch requests or to model the dictionary transactions. There is also no async support if you hide the RADOS dictionary behind an fs-api module, which would make the use of dict-rados in the dict-proxy harder. Using dict-rados in the dict-proxy would help to lower the price you have to pay for the process model Dovecot uses so heavily.

Using an fs-rados module behind a storage module, let's say sdbox, would IMO not fit our goals. We planned to store mails as RADOS objects and their (immutable) metadata in the RADOS omap K/V store, and we want to be able to access the objects without Dovecot. This is not possible if RADOS is hidden behind an fs-rados module: the format of the stored objects would be different and would depend on the storage module sitting in front of fs-rados.

Another reason is that at the fs level the operations are too decomposed. We would not have any transactional contexts etc., as we do with the dictionaries. This context information allows us to use the RADOS operations in an optimized way. The storage API is IMO the right level of abstraction, especially if we follow our long-term goal of eliminating the filesystem requirement for index data too.

I like the internal abstraction of sdbox/mdbox a lot, but for our purpose it would have to be at the mail level and not the file level. Building an fs-rados should not be very hard, though.

> BTW, we've also been planning on open sourcing some of the obox pieces,
> mainly fs drivers (e.g. fs-s3). Maybe the obox format too, but without
> the "metacache" piece. The current obox code is a bit too tightly
> married to the metacache to make open sourcing it easy. (The metacache
> is about storing the Dovecot index files in object storage and
> efficiently caching them on the local filesystem; it isn't planned to
> be open sourced in the near future. That's pretty much the only
> difficult piece of the obox plugin, with the Cassandra integration
> coming a good second. I wish there had been a better/easier
> geo-distributed key-value database to use - tombstones are annoyingly
> troublesome.)

That would be great.

> And using the rmb mailbox format, my main worries would be:
> * It doesn't store index files (= message flags) - not necessarily a
> problem, as long as you don't want geo-replication.

Your index management is awesome, highly optimized and not easily reimplemented.
Very nice work. Unfortunately it does not use the fs-api and is therefore not capable of being located on non-fs storage. We believe that CephFS will be a good and stable solution for the time being. Of course it would be nicer to have a lib-index that allows us to plug in different backends.

> * Index corruption means rebuilding the indexes, which means rescanning
> the list of mail files, which means rescanning the whole RADOS
> namespace, which practically means rescanning the RADOS pool. That is
> most likely a very, very slow operation, which you want to avoid unless
> it's absolutely necessary. You need to be very careful to avoid that
> happening, and in general to avoid losing mails in case of crashes or
> other bugs.

Yes, such a disaster is a problem. We are trying to build as many rescue tools as possible, but in the end scanning mails is involved. All mails are stored within separate RADOS namespaces, each representing a different user, which helps us avoid scanning the whole pool. But this should not be a regular operation - you are right.

> * I think copying/moving mails physically copies the full data on disk.

We tried to optimize this. Moves within a user's mailboxes are done without copying the mails, by just changing the index data. Copies, when really necessary, are done with native RADOS commands (OSD to OSD) without transferring the data to the client and back. There is potential for even more optimization: we could build a mechanism similar to the mdbox reference counters to reduce copying. I am sure we will give it a try in a later version.

> * Each IMAP/POP3/LMTP/etc. process connects to RADOS separately from
> the others - some connection pooling would likely help here.

Dovecot uses separate processes a lot. You are right that this is a problem for protocols/libraries that have a high setup cost; you built mechanisms such as login process reuse and the dict-proxy to overcome that problem. Ceph is a low-latency object store, and one reason for its speed is that the cluster structure is known to the clients: a client has a direct connection to the OSD that hosts the object it is looking for. If we place any intermediaries between the client process and the OSD (as with the dict-proxy), performance will suffer. IMO the processes you mentioned should be reused to reduce the setup cost per session (or be implemented multithreaded or async). I am aware that this might be a potential security risk. Right now we do not know the price of the connection setup in a real cluster in a Dovecot context. We are curious about the results of the tests on Danny's cluster and will change the design of the software, if necessary, to get the best out of it.

Best regards
Peter
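Peter's points - batching RADOS operations, keeping the immutable per-mail metadata next to the mail object, and avoiding per-mail setup cost - can be illustrated with a single compound librados operation submitted asynchronously over one shared cluster connection. This is a sketch only, not librmb code: the attribute and omap key names are invented, and error handling is minimal.

// Sketch (not librmb itself): one compound, asynchronous RADOS write that
// stores a mail body together with its immutable metadata. A single
// librados::Rados connection is meant to be shared by the whole process.
#include <rados/librados.hpp>
#include <map>
#include <string>

int store_mail(librados::Rados &cluster, librados::IoCtx &io,
               const std::string &oid, const std::string &rfc822) {
  librados::bufferlist body;
  body.append(rfc822);

  librados::bufferlist recv_time;
  recv_time.append("2017-09-24T02:43:44Z");        // illustrative value

  std::map<std::string, librados::bufferlist> meta;
  meta["from"].append("a@example.com");            // invented key names
  meta["subject"].append("hello");

  // All three writes are applied atomically as one object operation.
  librados::ObjectWriteOperation op;
  op.write_full(body);                             // the mail itself
  op.setxattr("mail.received", recv_time);         // small immutable attribute
  op.omap_set(meta);                               // remaining metadata as omap

  // Submit asynchronously so the calling process isn't blocked per mail.
  librados::AioCompletion *c = cluster.aio_create_completion();
  int r = io.aio_operate(oid, c, &op);
  if (r < 0) { c->release(); return r; }
  c->wait_for_complete();                          // or queue/poll completions
  r = c->get_return_value();
  c->release();
  return r;
}

In a real deployment the completion would typically be queued rather than waited on immediately, so one long-lived process can keep many writes in flight over the same cluster connection instead of paying the connection setup cost for every mail.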