Hi Guys,

I've been desperately trying to find some comparative performance information about the different mailbox formats supported by Dovecot in order to assess which format is right for our environment. This is a brand new build, with customer mailboxes to be migrated in over the course of 3-4 months.

Some details on our new environment:

* Approximately 1.6M+ mailboxes once all legacy systems are combined
* NetApp FAS6280 storage w/ 120TB usable for mail storage, 1TB of FlashCache in each controller
* All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
* Postfix will feed new email to Dovecot via LMTP
* Dovecot servers have been split based on their role
  - Dovecot LDA Servers (running LMTP protocol)
  - Dovecot POP/IMAP servers (running POP/IMAP protocols)
  - LDA & POP/IMAP servers are segmented into geographically split groups (so no server sees every single mailbox)
  - Nginx proxy used to terminate customer connections, connections are redirected to the appropriate geographic servers
* Apache Lucene indexes will be used to accelerate IMAP search for users

Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir) has 600K mailboxes and pushes ~35,000 NFS operations per second at peak.

Some of the things I would like to know:

* Are we likely to see a reduction in IOPS/User by using Maildir alone under Dovecot?
* What kind of IOPS/User reduction could we expect to see under mdbox?
* Can someone give some technical reasoning behind why mdbox does fewer IOPS than Maildir?

I understand some of the reasons for the mdbox IOPS question, but I need some more information so we can discuss internally and make a decision as to whether we're comfortable going with mdbox from day one. We're very familiar with Maildir, and there's just some uneasiness internally around going to a new mail storage format.

Thanks!
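As a rough sanity check on the figures in the post above, the per-mailbox load of the existing system can be scaled up to the planned mailbox count. This is only a back-of-the-envelope sketch: it assumes load grows linearly with the number of mailboxes, and the reduction percentages passed to `projected_ops` are hypothetical what-if values, not measurements.

```python
# Figures quoted in the post above.
current_mailboxes = 600_000
current_peak_nfs_ops = 35_000      # NFS operations per second at peak
target_mailboxes = 1_600_000

# Per-mailbox load on the existing Courier/Maildir system.
ops_per_mailbox = current_peak_nfs_ops / current_mailboxes

# Assumption: load scales linearly with the number of mailboxes.
projected_peak_nfs_ops = ops_per_mailbox * target_mailboxes


def projected_ops(reduction):
    """Projected peak NFS ops/s if per-mailbox load drops by `reduction` (0..1)."""
    return projected_peak_nfs_ops * (1.0 - reduction)


if __name__ == "__main__":
    print(f"{ops_per_mailbox:.4f} NFS ops/s per mailbox today")
    print(f"{projected_peak_nfs_ops:,.0f} NFS ops/s projected with no change")
    for reduction in (0.10, 0.40):  # hypothetical what-if values
        print(f"{projected_ops(reduction):,.0f} NFS ops/s at {reduction:.0%} reduction")
```

Numbers like these only frame the discussion; the real per-mailbox figure depends on the client mix (POP3, IMAP, webmail) discussed later in the thread.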
On 18.01.2012 13:44, Lee Standen wrote:
> Hi Guys,
>
> I've been desperately trying to find some comparative performance
> information about the different mailbox formats supported by Dovecot in
> order to make an assessment on which format is right for our environment.
>
> This is a brand new build, with customer mailboxes to be migrated in over
> the course of 3-4 months.
>
> Some details on our new environment:
>
> * Approximately 1.6M+ mailboxes once all legacy systems are combined
> * NetApp FAS6280 storage w/ 120TB usable for mail storage, 1TB of FlashCache
>   in each controller
> * All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)

NFS may not be optimal; a cluster filesystem might be better, but that is a big, separate discussion.

> * Postfix will feed new email to Dovecot via LMTP

Perfect.

> * Dovecot servers have been split based on their role
>   - Dovecot LDA Servers (running LMTP protocol)
>   - Dovecot POP/IMAP servers (running POP/IMAP protocols)
>   - LDA & POP/IMAP servers are segmented into geographically split groups
>     (so no server sees every single mailbox)
>   - Nginx proxy used to terminate customer connections, connections are
>     redirected to the appropriate geographic servers
> * Apache Lucene indexes will be used to accelerate IMAP search for users

Sounds OK.

> Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir)
> has 600K mailboxes and pushes ~ 35,000 NFS operations per second at peak

Wow, that's big.

> Some of the things I would like to know:
>
> * Are we likely to see a reduction in IOPS/User by using Maildir alone under
>   Dovecot?
> * What kind of IOPS/User reduction could we expect to see under mdbox?

There should be people on the list who know this from migrations they have done.

> * If someone can give some technical reasoning behind why mdbox does less
>   IOPS than Maildir?

As far as I remember, mdbox takes 8 mails per file (I am not using it currently, so I didn't investigate it); better to wait for a more qualified answer. Anyway, mdbox seems recommended in your case. In our last plans for about 25k mailboxes we decided to use mdbox, as far as I remember...

> I understand some of the reasons for the mdbox IOPS question, but I need
> some more information so we can discuss internally and make a decision as to
> whether we're comfortable going with mdbox from day one. We're very
> familiar with Maildir, and there's just some uneasiness internally around
> going to a new mail storage format.
>
> Thanks!

From my personal experience, storage I/O has the most influence on performance, once all other parts of the setup have been solved optimally. Wait a little; I guess more fitting answers will come up. After all, you can hire someone, perhaps Timo, if you get stuck on something.

--
Best Regards
Robert Schetterer
Germany/Munich/Bavaria
Javier Miguel Rodríguez
2012-Jan-18 13:27 UTC
[Dovecot] Performance of Maildir vs sdbox/mdbox
Spanish edu site here: 80k users, 4.5 TB of email, 6,000 IOPS (indexes) + 9,000 IOPS (mdboxes) in working hours.

We evaluated mdbox against Maildir and found that with these settings Dovecot 2 with mdbox performs better than Maildir:

    mdbox_rotate_interval = 1d
    mdbox_rotate_size = 60m
    zlib_save_level = 9 # 1..9
    zlib_save = gz # or bz2

We detected 40% fewer IOPS with this setup *in working hours (more info below)*. Zlib saved some writes (15-30%). With mdbox, deletion of a message is written to the indexes (use SSD for this), and a nightly cronjob deletes the real message from the mdbox; this saves us some IOPS in working hours. Also, backup software is MUCH happier handling hundreds of thousands of files (mdbox) versus tens of millions (Maildir).

mdbox also has drawbacks:

* You have to be VERY careful with your indexes; they contain data that cannot be rebuilt from the mdboxes.
* The nightly cronjob "purging" the mdboxes hammers the SAN.
* Full backup time is reduced, but incremental backup space & time increase: if you delete a message, after "purging" it from the mdbox the mdbox file changes (size and date), so the incremental backup has to copy it again.

Regards

Javier
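The incremental-backup drawback described above can be illustrated with a toy model. This is a sketch of the effect as Javier describes it (a purged container file changes, so a file-level incremental backup copies it again), not of Dovecot's actual purge implementation. The message count, message size and messages-per-file figures are hypothetical.

```python
def recopied_bytes(containers, deleted_ids):
    """Bytes a file-level incremental backup copies again after a purge.

    containers:  list of files, each a list of (message_id, size_in_bytes).
    deleted_ids: set of message ids that were deleted and purged.
    """
    total = 0
    for messages in containers:
        touched = any(msg_id in deleted_ids for msg_id, _ in messages)
        if touched:
            # The file changed (size and date), so everything that survives
            # in it is copied again by the next incremental backup.
            total += sum(size for msg_id, size in messages
                         if msg_id not in deleted_ids)
    return total


# Hypothetical mailbox: 1000 messages of 50 kB each.
messages = [(msg_id, 50_000) for msg_id in range(1000)]

# Maildir: one file per message.
maildir_layout = [[message] for message in messages]

# mdbox-like layout: hypothetical 100 messages per container file.
mdbox_layout = [messages[i:i + 100] for i in range(0, len(messages), 100)]

deleted = {5, 250, 999}

if __name__ == "__main__":
    print("Maildir re-copied bytes:", recopied_bytes(maildir_layout, deleted))
    print("mdbox re-copied bytes:  ", recopied_bytes(mdbox_layout, deleted))
```

In this model a Maildir deletion just removes a file, so nothing is copied again, while each touched container costs the size of its surviving messages. Larger container files (a bigger `mdbox_rotate_size`) make full backups cheaper per file but each purge more expensive for incrementals.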
On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
> I've been desperately trying to find some comparative performance
> information about the different mailbox formats supported by Dovecot in
> order to make an assessment on which format is right for our environment.

Unfortunately there aren't really any. Everyone who seems to switch to sdbox/mdbox usually also changes their hardware at the same time, so there aren't really any before/after metrics. I of course have some unrealistic synthetic benchmarks, but I don't think they are very useful.

So, I would also be very interested in seeing some before/after graphs of disk IO, CPU and memory usage of a Maildir -> dbox switch on the same hardware.

Maildir anyway definitely performs worse than sdbox or mdbox. mdbox also uses fewer NFS operations, but I don't know how much faster (if at all) it is with NetApps.

> * All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
> * Postfix will feed new email to Dovecot via LMTP
> * Dovecot servers have been split based on their role
>   - Dovecot LDA Servers (running LMTP protocol)
>   - Dovecot POP/IMAP servers (running POP/IMAP protocols)

You're going to run into NFS caching troubles with the above split setup. I don't recommend it. You will see error messages about index corruption with it, and with dbox it can cause metadata loss.

http://wiki2.dovecot.org/NFS
http://wiki2.dovecot.org/Director

>   - LDA & POP/IMAP servers are segmented into geographically split groups
>     (so no server sees every single mailbox)
>   - Nginx proxy used to terminate customer connections, connections are
>     redirected to the appropriate geographic servers

Can the same mailbox still be accessed via multiple geographic servers? I've had some plans for doing this kind of access/replication using dsync..

> * Apache Lucene indexes will be used to accelerate IMAP search for users

Dovecot's fts-solr or fts-lucene?

> Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir)
> has 600K mailboxes and pushes ~ 35,000 NFS operations per second at peak
>
> Some of the things I would like to know:
>
> * Are we likely to see a reduction in IOPS/User by using Maildir alone under
>   Dovecot?

If you have webmail type of clients, definitely. For Outlook/Thunderbird you should still see improvement, but not necessarily as much.

You didn't mention POP3. That isn't Dovecot's strong point. Its performance should be about the same as Courier-POP3, but could be less than QMail-POP3. Although if many of your POP3 users keep a lot of mails on server it

> * If someone can give some technical reasoning behind why mdbox does less
>   IOPS than Maildir?

Maildir renames files a lot: from new/ to cur/, and then again every time a message flag changes. That's why sdbox is faster. mdbox should in turn be faster than sdbox because mdbox puts (or should put) more mail data physically closer together on disk, which makes reading it faster.

> I understand some of the reasons for the mdbox IOPS question, but I need
> some more information so we can discuss internally and make a decision as to
> whether we're comfortable going with mdbox from day one. We're very
> familiar with Maildir, and there's just some uneasiness internally around
> going to a new mail storage format.

It's at least safer to first switch to Dovecot+Maildir to make sure that any problems you might find aren't related to the mailbox format..
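The "Maildir renames files a lot" point in the reply above can be made concrete with a small sketch. It follows the usual Maildir naming convention, where flags live in the file name after a `:2,` suffix, so each flag change is a rename of the message file. The unique name below is hypothetical, delivery into new/ is skipped, and the colon in file names means this is meant for Unix-like filesystems.

```python
import os
import tempfile

INFO_SEPARATOR = ":2,"  # Maildir "info" suffix that carries the flags


def move_to_cur(maildir, name):
    """First rename: new/<name> -> cur/<name>:2, (no flags set yet)."""
    src = os.path.join(maildir, "new", name)
    dst = os.path.join(maildir, "cur", name + INFO_SEPARATOR)
    os.rename(src, dst)
    return dst


def set_flags(path, flags):
    """Each flag change is another rename of the same message file."""
    directory, filename = os.path.split(path)
    base = filename.split(INFO_SEPARATOR, 1)[0]
    # Flags are stored as single letters in ASCII order.
    new_name = base + INFO_SEPARATOR + "".join(sorted(set(flags)))
    dst = os.path.join(directory, new_name)
    os.rename(path, dst)
    return dst


def demo():
    """Walk one message through three renames and return its final name."""
    with tempfile.TemporaryDirectory() as maildir:
        for sub in ("new", "cur"):
            os.mkdir(os.path.join(maildir, sub))
        name = "1326890000.M1P1.mailhost"  # hypothetical unique name
        with open(os.path.join(maildir, "new", name), "w") as handle:
            handle.write("Subject: test\n\nhello\n")
        path = move_to_cur(maildir, name)   # rename 1: client sees the message
        path = set_flags(path, "S")         # rename 2: marked as seen
        path = set_flags(path, "SR")        # rename 3: marked as replied
        return os.path.basename(path)


if __name__ == "__main__":
    print(demo())
```

On NFS each of these renames is a metadata operation against the filer, which is the overhead a format that tracks flags in index files avoids.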
On 18.01.2012 21:54, Timo Sirainen wrote:
> On Wed, 2012-01-18 at 20:44 +0800, Lee Standen wrote:
>> I've been desperately trying to find some comparative performance
>> information about the different mailbox formats supported by Dovecot in
>> order to make an assessment on which format is right for our environment.
>
> Unfortunately there aren't really any. Everyone who seems to switch to
> sdbox/mdbox usually also changes their hardware at the same time, so
> there aren't really any before/after metrics. I of course have some
> unrealistic synthetic benchmarks, but I don't think they are very useful.
>
> So, I would also be very interested in seeing some before/after graphs
> of disk IO, CPU and memory usage of a Maildir -> dbox switch on the same
> hardware.
>
> Maildir anyway definitely performs worse than sdbox or mdbox.
> mdbox also uses fewer NFS operations, but I don't know how much faster
> (if at all) it is with NetApps.

We have bought new hardware for this project too, so we might not be able to help out massively on that front... We do have NFS operations monitored though, so we should at least be able to compare that metric, since the underlying storage operating system is the same. All NetApp hardware runs their Data ONTAP operating system, so the metrics are assured to be the same :)

How about this... are there any tools available (that you know of) to capture real live customer POP3/IMAP traffic and replay it against a separate system? That might be a feasible option for doing a like-for-like comparison in our environment?
We could probably get something in place to simulate the load if we can do something like that...

>> * All mail storage presented via NFS over 10Gbps Ethernet (Jumbo Frames)
>> * Postfix will feed new email to Dovecot via LMTP
>> * Dovecot servers have been split based on their role
>>   - Dovecot LDA Servers (running LMTP protocol)
>>   - Dovecot POP/IMAP servers (running POP/IMAP protocols)
>
> You're going to run into NFS caching troubles with the above split
> setup. I don't recommend it. You will see error messages about index
> corruption with it, and with dbox it can cause metadata loss.
> http://wiki2.dovecot.org/NFS http://wiki2.dovecot.org/Director

That might be the one thing (unfortunately) which prevents us from going with the dbox format. I understand the same issue can actually occur on Dovecot Maildir as well, but because Maildir works without these index files, we were willing to just go with it. I will raise it again, but there has been a lot of push back about introducing a single point of failure, even though this is only a perceived one.

The biggest challenge I have at the moment if I try to sell the dbox format is providing some kind of data on the expected gains from this. If it's only a 10% reduction in NFS operations for the typical user, then it's probably not worth our while.

>>   - LDA & POP/IMAP servers are segmented into geographically split groups
>>     (so no server sees every single mailbox)
>>   - Nginx proxy used to terminate customer connections, connections are
>>     redirected to the appropriate geographic servers
>
> Can the same mailbox still be accessed via multiple geographic servers?
> I've had some plans for doing this kind of access/replication using
> dsync..

No, we're using the nginx proxy layer to ensure that if a user in Sydney (for example) tries to access a Perth mailbox, their connection is redirected (by nginx) to the Perth POP/IMAP servers. Postfix configuration is handling the same thing on the LMTP side.
The requirement here is for all users to have the same settings regardless of location, but still be able to locate the email servers and data close to the customer.

>> * Apache Lucene indexes will be used to accelerate IMAP search for users
>
> Dovecot's fts-solr or fts-lucene?

fts-solr. I've been using Lucene/Solr interchangeably when discussing this project with my peers :)

>> Our closest current live configuration (Qmail SMTP, Courier IMAP, Maildir)
>> has 600K mailboxes and pushes ~ 35,000 NFS operations per second at peak
>>
>> Some of the things I would like to know:
>>
>> * Are we likely to see a reduction in IOPS/User by using Maildir alone
>>   under Dovecot?
>
> If you have webmail type of clients, definitely. For Outlook/Thunderbird
> you should still see improvement, but not necessarily as much.
>
> You didn't mention POP3. That isn't Dovecot's strong point. Its
> performance should be about the same as Courier-POP3, but could be less
> than QMail-POP3. Although if many of your POP3 users keep a lot of mails
> on server it

Our existing systems run with about 21K concurrent IMAP connections at any one point in time, not counting Webmail. POP3 runs at about 3600 concurrent connections, but since those are not long lived it's not particularly indicative of customer numbers.

My vague recollection is something like 25% IMAP, 55-60% POP3, and the rest (< 20%) Webmail. I'd have to go back and check the breakdown again.

>> * If someone can give some technical reasoning behind why mdbox does less
>>   IOPS than Maildir?
>
> Maildir renames files a lot. From new/ -> to cur/ and then every time
> message flag changes. That's why sdbox is faster. Why mdbox should be
> faster than sdbox is because mdbox puts (or should put) more mail data
> physically closer in disks to make reading it faster.
>
>> I understand some of the reasons for the mdbox IOPS question, but I need
>> some more information so we can discuss internally and make a decision
>> as to whether we're comfortable going with mdbox from day one. We're very
>> familiar with Maildir, and there's just some uneasiness internally around
>> going to a new mail storage format.
>
> It's at least safer to first switch to Dovecot+Maildir to make sure that
> any problems you might find aren't related to the mailbox format..

Yep, I'm considering that. The flip side is that it's actually going to be difficult for us to change mail format once we've migrated into this system, but we have an opportunity for (literally) a month long testing phase beginning in Feb/March which will let us test as many possibilities as we can.
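The geographic redirection described earlier in this reply (a Sydney user reaching a Perth mailbox through nginx) could be sketched as the lookup behind nginx's mail proxy. This assumes the setup uses nginx's `auth_http` mechanism, where an HTTP service receives `Auth-User` and `Auth-Protocol` request headers and answers with `Auth-Status`, `Auth-Server` and `Auth-Port`. The site table, backend addresses and mailbox names are hypothetical, and password checking is left out entirely.

```python
# Hypothetical backend address per site (documentation-only IP ranges).
SITE_BACKENDS = {
    "perth": "192.0.2.10",
    "sydney": "198.51.100.10",
}

# Hypothetical mailbox-to-site table; in practice this would be a directory
# or database lookup.
MAILBOX_SITE = {
    "alice@example.com": "perth",
    "bob@example.com": "sydney",
}

PROTOCOL_PORTS = {"imap": 143, "pop3": 110}


def auth_response(request_headers):
    """Build the response headers that tell the proxy where to connect.

    Only the routing decision is modelled; credentials are NOT verified here.
    """
    user = request_headers.get("Auth-User", "")
    protocol = request_headers.get("Auth-Protocol", "")
    site = MAILBOX_SITE.get(user)
    port = PROTOCOL_PORTS.get(protocol)
    if site is None or port is None:
        # Any Auth-Status other than "OK" is treated as a failure message.
        return {"Auth-Status": "Invalid login or password"}
    return {
        "Auth-Status": "OK",
        "Auth-Server": SITE_BACKENDS[site],  # home site of the mailbox
        "Auth-Port": str(port),
    }


if __name__ == "__main__":
    # The client's own location never enters the decision: the mailbox's
    # home site alone selects the backend.
    print(auth_response({"Auth-User": "alice@example.com",
                         "Auth-Protocol": "imap"}))
```

Because the backend is chosen purely from the mailbox's home site, every user can be given the same client settings regardless of location, which is the stated requirement.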