In the future, it would be cool if there were a mailbox format (dbox2?) where mail headers and each MIME part were stored in separate files. This would let the zfs dedup feature be used to maximum benefit.

The zfs filesystem has a dedup feature which stores only one copy of duplicate blocks. In a normal mail file, the headers are different for each recipient, so the chances of the message content being dedup'd are close to zero: the differences in header length change the block boundaries for the rest of the message. But if each MIME part is stored in a separate file, you get massive compression "for free".

-frank
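The block-boundary argument above can be illustrated with a rough sketch. It is not how zfs is implemented; it only assumes that dedup compares fixed-size blocks aligned to the start of each file. The 4096-byte block size, the example headers, and the helper names `block_hashes` and `shared_blocks` are all made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for this illustration


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of a file, as a block-level dedup might."""
    return {
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    }


def shared_blocks(a, b):
    """Count the blocks two files have in common (candidates for dedup)."""
    return len(block_hashes(a) & block_hashes(b))


# A 128 KiB body made of non-repeating bytes, so every block is distinct.
body = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(4096))

# Two hypothetical recipients whose headers differ in length.
headers_a = b"To: alice@example.com\r\n\r\n"
headers_b = b"To: bob.longer@example.com\r\n\r\n"

# Whole message in one file: the body is shifted by the header length,
# so no block of one copy lines up with a block of the other.
mail_a = headers_a + body
mail_b = headers_b + body
print("one file per mail, shared blocks:", shared_blocks(mail_a, mail_b))

# Body stored in its own file: both copies are byte-identical,
# so every body block is a duplicate.
print("body in separate file, shared blocks:", shared_blocks(body, body))
```

With these inputs the first count is 0 and the second is 32 (128 KiB / 4 KiB). If the two headers happened to have exactly the same length, only the blocks containing header bytes would differ.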
On Fri, 2010-01-22 at 15:53 -0500, Frank Cusack wrote:
> In the future, it would be cool if there were a mailbox format (dbox2?)
> where mail headers and each mime part were stored in separate files.
> This would enable the zfs dedup feature to be used to maximum benefit.

This is more or less what dbox's single instance storage is going to do. Maybe in half a year or so. And you don't even need a filesystem deduplication feature. :)

It would also already be possible to write such a Maildir feature. Someone on this list already wrote header/body separation code, which was pretty easy to do with a plugin.

> In the zfs filesystem, there is a dedup feature which stores only 1 copy
> of duplicate blocks. In a normal mail file, the headers will be
> different for each recipient and the chances of the content of the message
> being able to be dedup'd are close to zero, because the differences in
> header length changes the block boundaries for the rest of the message.
> But if each mime part is stored in a separate file, you get massive
> compression "for free".

Dunno about zfs, but I've heard that in at least one NetApp installation deduplication was way too heavyweight.
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen <tss at iki.fi> wrote:
> Dunno about zfs, but I've heard that at least in one NetApp installation
> deduplication was way too heavyweight.

zfs dedup is pretty resource intensive -- for writes. For mail I suspect reads overwhelm writes?

-frank
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen <tss at iki.fi> wrote:
> On Fri, 2010-01-22 at 15:53 -0500, Frank Cusack wrote:
>> In the future, it would be cool if there were a mailbox format (dbox2?)
>> where mail headers and each mime part were stored in separate files.
>> This would enable the zfs dedup feature to be used to maximum benefit.
>
> This is more or less what dbox's single instance storage is going to do.
> Maybe in half a year or so.. And you don't even need filesystem
> deduplication feature. :)

But if the mail system has to handle it, it only knows about mails written at the same time. For example, if postfix delivers mail with a single recipient per mail (the recommended config somewhere, not sure if recommended by postfix or by dovecot), dbox won't get the opportunity to dedup. And for mails which are re-forwarded (a pretty common occurrence), again dbox won't get the chance to dedup.

Or will there be a global index?

-frank
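To make the "global index" question concrete, here is a minimal sketch of one way such an index could work: parts are keyed by a hash of their content, so a duplicate is detected no matter when or how it arrives. This is purely an illustration and says nothing about how dbox's single instance storage is or will be designed; the class `SingleInstanceStore` and its methods are hypothetical, and a real store would keep the index on disk rather than in memory.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical content-addressed store: one copy per distinct part."""

    def __init__(self):
        self._blobs = {}  # content hash -> stored bytes
        self._refs = {}   # content hash -> number of mails referencing it

    def put(self, part):
        """Store a part and return its key; duplicates only add a reference."""
        key = hashlib.sha256(part).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = part
        self._refs[key] = self._refs.get(key, 0) + 1
        return key

    def get(self, key):
        """Return the stored bytes for a key."""
        return self._blobs[key]

    def refcount(self, key):
        """Return how many mails reference this part."""
        return self._refs.get(key, 0)

    def stored_bytes(self):
        """Total bytes actually kept, counting each distinct part once."""
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
attachment = b"%PDF-1.4 example attachment bytes" * 1000

# Two separate deliveries (one recipient per mail, or a later re-forward)
# still map to the same key, because the key depends only on the content.
key_first = store.put(attachment)
key_later = store.put(attachment)

print(key_first == key_later)
print(store.refcount(key_first), store.stored_bytes() == len(attachment))
```

The trade-off is that every delivery has to hash each part and consult the index, which is extra work on the write path.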