Christoph Anton Mitterer
2012-Oct-29 20:54 UTC
[Dovecot] mbox vs. maildir storage block waste
Hi.

I recently mentioned in several posts that I tended to use mbox rather than maildir, because you don't lose so much space (due to always allocating full blocks per maildir file and thus per mail).

I made some tests with my archive, which consists of some 3.4 million mails totalling 42GB. Most of these mails are probably normal-sized, but there are also some with bigger attachments. For those who are interested, here are the results.

I used a 53687091200 B image file (via a loop device) and tested only ext4. btrfs is IMHO not yet ready, I have often had issues (corruptions) with XFS, reiser4 is more or less dead, and reiser3 is said to have issues (see e.g. its Wikipedia article), even though it has that mode for small files which would fit nicely.

As you can see, the number of mails increased a bit because I tested over several days, but the increase is very small, so it shouldn't change the numbers much.

1) Original mbox archives (right now in Evolution):
mbox exact space: 38122676224 (does not include meta-data)
mbox guess space: 44625670144 (includes Evolution meta-data, which is several GBs)
mbox num mails: 3412999 (occurrences of From_ lines)

In the following:
- image file, 1B-blocks, Used_begin, Used_end, Available_begin, Available_end are the output of df -B 1
- mdir exact used space is the sum of du -B 1 over each regular file (i.e. each maildir file)
- mdir guess used space is du -B 1 on the root dir of the filesystem
- mdir num mails is find . -type f | wc -l on the root dir of the filesystem
- delta is mdir exact used space minus mbox exact space (G = 10^9 bytes), and delta / mail is delta divided by mdir num mails

2) EXT4 with 4096 B blocks:
image file: 53687091200
1B-blocks: 52844687360
Used_begin: 188555264
Used_end: 45198778368
Available_begin: 49971777536
Available_end: 2444972032
mdir exact used space: 44810866688
mdir guess used space: 45010243584
mdir num mails: 3423296
delta: 6.688190464 G
delta / mail: 1953 B

3) EXT4 with 2048 B blocks:
image file: 53687091200
1B-blocks: 50324295680
Used_begin: 82857984
Used_end: 41598846976
Available_begin: 47557083136
Available_end: 6041094144
mdir exact used space: 41323991040
mdir guess used space: 41516007424
mdir num mails: 3425033
delta: 3.201314816 G
delta / mail: 934 B

4) EXT4 with 1024 B blocks:
image file: 53687091200
1B-blocks: 50314834944
Used_begin: 38287360
Used_end: 39909360640
Available_begin: 47592193024
Available_end: 7721119744
mdir exact used space: 39683908608
mdir guess used space: 39871086592
mdir num mails: 3425033
delta: 1.561232384 G
delta / mail: 455 B

As you can see, the delta per mail is rather close to the statistically expected values of 2048 B, 1024 B and 512 B, i.e. half a block wasted per file on average.

In the end I probably changed my opinion.
~7GB of wasted block space for all my mails is actually quite a lot, but in these days of cheap disk space it's acceptable.
And with mbox one has IMHO the major disadvantage that mailservers (including Dovecot) store some meta-data _in_ it (i.e. in the mails themselves), which I don't like a lot.
I'm still thinking about reports that mbox is much faster with full-text search (which sounds reasonable)... but for that one probably needs a database backend anyway.

HTH,
Chris.
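To make the arithmetic behind Chris's results easy to re-check, here is a minimal Python sketch. `maildir_usage` approximates his measurement by summing allocated space (st_blocks × 512, which is how Linux reports allocation) and apparent sizes over every regular file, roughly what summing `du -B 1` per file and `find . -type f | wc -l` produce. `block_waste` computes delta as mdir exact used space minus mbox exact space and divides by the maildir mail count. Both function names are hypothetical, not part of any tool.

```python
import os
import stat


def maildir_usage(root):
    """Return (allocated, apparent, count) for all regular files under root.

    allocated ~ summing `du -B 1` over every file (st_blocks is counted in
    512-byte units on Linux); count ~ `find . -type f | wc -l`.
    """
    allocated = apparent = count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets and the like, as -type f does
            allocated += st.st_blocks * 512
            apparent += st.st_size
            count += 1
    return allocated, apparent, count


def block_waste(mdir_exact, mbox_exact, mdir_mails):
    """delta = maildir allocated space - mbox size; also delta per mail (floored)."""
    delta = mdir_exact - mbox_exact
    return delta, delta // mdir_mails


MBOX_EXACT = 38_122_676_224  # "mbox exact space" from section 1

# block size -> (mdir exact used space, mdir num mails), from sections 2-4
RESULTS = {
    4096: (44_810_866_688, 3_423_296),
    2048: (41_323_991_040, 3_425_033),
    1024: (39_683_908_608, 3_425_033),
}

if __name__ == "__main__":
    for block, (exact, mails) in RESULTS.items():
        delta, per_mail = block_waste(exact, MBOX_EXACT, mails)
        print(f"{block} B blocks: delta {delta / 1e9:.9f} G, "
              f"{per_mail} B/mail (half a block: {block // 2} B)")
```

With the posted numbers this yields 1953, 934 and 455 B per mail, matching the results; the floor division is an assumption about how those were rounded (the unrounded values are about 1953.7, 934.7 and 455.8). On a maildir alone, `allocated - apparent` from `maildir_usage` gives a rough figure for the block slack without needing an mbox copy for comparison.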
On 29.10.2012, at 22.54, Christoph Anton Mitterer wrote:
> I recently mentioned in several posts that I tended to use mbox
> rather than maildir, because you don't lose so much space (due to
> always allocating full blocks per maildir file and thus per mail)
...
> In the end I probably changed my opinion.
> ~7GB of wasted block space for all my mails is actually quite a lot, but
> in these days of cheap disk space it's acceptable.
> And with mbox one has IMHO the major disadvantage that mailservers
> (including Dovecot) store some meta-data _in_ it (i.e. in the mails
> themselves), which I don't like a lot.
> I'm still thinking about reports that mbox is much faster with full-text
> search (which sounds reasonable)... but for that one probably needs a
> database backend anyway.

There is of course also mdbox, which gives the best of both mbox and maildir (and some new annoyances of its own).
On 2012-10-29 5:42 PM, Timo Sirainen <tss at iki.fi> wrote:
> On 29.10.2012, at 23.15, Christoph Anton Mitterer wrote:
>
>> btw: What are the actual advantages of sdbox over maildir?
>
> * Not moving files from new/ to cur/ directory
> * Not renaming files when changing message flags
> * Not readdir()ing directories (although maildir_very_dirty_syncs=yes helps a lot with this)
>
> Basically less disk I/O and making it possible to have mailboxes with a huge number of messages without everything slowing down horribly.

I had been wanting to ask about this too...

So... what are the disadvantages?

--
Best regards,

Charles
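Timo's first two points come from how Maildir records message state in file names. As a minimal illustration of the Maildir naming convention (not Dovecot's code): flags live in an info suffix `:2,` followed by the flag letters in ASCII order, and a message is moved from new/ to cur/ once a client has noticed it, so each flag change costs a rename(2), i.e. a directory metadata update. `flagged_name` and `set_flags` are hypothetical helpers, and only the six standard uppercase flags from the Maildir spec are accepted.

```python
import os

# Standard Maildir info flags, written in ASCII order after ":2,":
# D=draft, F=flagged, P=passed, R=replied, S=seen, T=trashed.
VALID_FLAGS = "DFPRST"


def flagged_name(filename, flags):
    """Return the Maildir filename carrying `flags` in its ":2," info suffix."""
    unique = filename.split(":", 1)[0]  # drop any existing info part
    bad = set(flags) - set(VALID_FLAGS)
    if bad:
        raise ValueError(f"unknown flags: {''.join(sorted(bad))}")
    return f"{unique}:2,{''.join(sorted(set(flags)))}"


def set_flags(maildir, subdir, filename, flags):
    """Hypothetical helper: apply flags the Maildir way, i.e. by renaming.

    A message still in new/ is moved to cur/ at the same time. Either way
    the change is a rename(2) of the message file.
    """
    old_path = os.path.join(maildir, subdir, filename)
    new_name = flagged_name(filename, flags)
    new_path = os.path.join(maildir, "cur", new_name)
    if new_path != old_path:
        os.rename(old_path, new_path)
    return new_name
```

Because the file name changes, anything that remembered the old name has to find the message again, which is presumably where the readdir() cost in Timo's third point comes in; sdbox, per his list, avoids both the renames and the new/ to cur/ moves.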
On 2012-10-29 4:54 PM, Christoph Anton Mitterer <calestyo at scientia.net> wrote:
> In the end I probably changed my opinion.
> ~7GB of wasted block space for all my mails is actually quite a lot, but
> in these days of cheap disk space it's acceptable.
> And with mbox one has IMHO the major disadvantage that mailservers
> (including Dovecot) store some meta-data _in_ it (i.e. in the mails
> themselves), which I don't like a lot.
> I'm still thinking about reports that mbox is much faster with full-text
> search (which sounds reasonable)... but for that one probably needs a
> database backend anyway.

What makes the most sense to me is to use mbox (or mdbox) for longer-term storage that you may be offloading to slower storage systems, and maildir (or sdbox) for new mail...

This would work great as long as you have a reliable method for archiving older mails out to your slower storage.

This is what I plan on doing someday...

--
Best regards,

Charles
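Charles's plan hinges on "a reliable method for archiving older mails". One possible sketch, using only Python's standard-library `mailbox` module, moves messages older than a cutoff from a Maildir into an mbox file. Assumptions: `archive_old_mail` is a hypothetical helper, "age" means the Maildir delivery time (the file's mtime, as `MaildirMessage.get_date()` reports it), and nothing (including Dovecot) is using either folder while it runs, since it works on the files directly, behind Dovecot's back. It does not handle mdbox or sdbox, which the `mailbox` module does not support.

```python
import mailbox
import time


def archive_old_mail(maildir_path, mbox_path, max_age_days):
    """Move messages older than max_age_days from a Maildir into an mbox file.

    Hypothetical helper, not part of Dovecot. Returns the number of
    messages moved.
    """
    cutoff = time.time() - max_age_days * 86400
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path, create=True)
    to_remove = []
    dst.lock()  # exclusive lock on the mbox while appending
    try:
        for key in list(src.keys()):
            msg = src[key]  # a MaildirMessage; get_date() is the delivery time
            if msg.get_date() < cutoff:
                # Converting to mboxMessage keeps the flags (as Status/X-Status
                # headers) and uses the delivery time for the From_ line.
                dst.add(mailbox.mboxMessage(msg))
                to_remove.append(key)
        dst.flush()  # write the archive out before deleting anything
    finally:
        dst.close()  # releases the lock
    for key in to_remove:
        src.remove(key)
    return len(to_remove)
```

Messages are removed from the Maildir only after the mbox has been flushed, so a failure midway leaves duplicates rather than losing mail. Note that the converted flags end up as Status/X-Status headers inside the messages, which is exactly the in-mail metadata Christoph mentions disliking.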