Hi Everyone,

I hope this is the right forum for this question. A customer is using a Thumper as an NFS file server to provide the mail store for multiple email servers (Dovecot). They find that when a zpool is freshly created and populated with mailboxes, even to 80-90% capacity, performance is OK for the users, and backups and scrubs take a few hours (4TB of data). There are around 100 file systems. After running for a while (a couple of months) the zpool seems to get "fragmented": backups take 72 hours and a scrub takes about 180 hours. They are running mirrors with about 5TB usable per pool (500GB disks). Being a mail store, the writes and reads are small and random. Record size has been set to 8k (which improved performance dramatically). The backup application is Amanda. Once backups become too tedious, the remedy is to replicate the pool and start over; things get fast again for a while.

Is this expected behavior given the application (email - small, random writes/reads)? Are there recommendations for system/ZFS/NFS configurations to improve this sort of thing? Are there best practices for structuring backups to avoid a directory walk?

Thanks,
bill
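For concreteness, the per-filesystem tuning described above amounts to something like the following; the pool and dataset names are made up, only the property values come from the post:

    # hypothetical names, shown only to illustrate the settings described
    zfs create mailpool/mailstore
    zfs set recordsize=8k mailpool/mailstore          # small records for small random mail I/O
    zfs get recordsize,atime,compression mailpool/mailstore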
On Tue, 2009-12-15 at 17:28 -0800, Bill Sprouse wrote:
> After running for a while (couple of months) the zpool seems to get
> "fragmented", backups take 72 hours and a scrub takes about 180 hours.

Are there periodic snapshots being created in this pool?

Can they run with atime turned off? (File tree walks performed by backups will update the atime of all directories; this will generate extra write traffic and also cause snapshots to diverge from their parents and take longer to scrub.)

- Bill
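A sketch of the two checks suggested here, again assuming a hypothetical pool name:

    zfs set atime=off mailpool/mailstore              # stop backup tree walks from dirtying atimes
    zfs list -H -t snapshot -r mailpool | wc -l       # see how many periodic snapshots have piled up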
On Tue, 15 Dec 2009, Bill Sprouse wrote:
> Hi Everyone,
>
> I hope this is the right forum for this question. A customer is using a
> Thumper as an NFS file server to provide the mail store for multiple email
> servers (Dovecot). They find that when a zpool is freshly created and

It seems that Dovecot's speed optimizations for mbox format are practically designed to break zfs

"http://wiki.dovecot.org/MailboxFormat/mbox#Dovecot.27s_Speed_Optimizations"

which explains why using a tiny 8k recordsize temporarily "improved" performance. Tiny updates seem to be abnormal for a mail server. The many tiny updates combined with zfs COW conspire to spread the data around the disk, requiring a seek for each 8k of data. If more data was written at once, and much larger blocks were used, then the filesystem would continue to perform much better, although perhaps less well initially. If the system has sufficient RAM, or a large enough L2ARC, then Dovecot's optimizations to diminish reads become meaningless.

> Is this expected behavior given the application (email - small, random
> writes/reads)? Are there recommendations for system/ZFS/NFS configurations
> to improve this sort of thing? Are there best practices for structuring
> backups to avoid a directory walk?

Zfs works best when whole files are re-written rather than updated in place, as Dovecot seems to want to do. Either the user mailboxes should be re-written entirely when they are "expunged", or else a different mail storage format which writes entire files, or much larger records, should be used.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
I have also had slow scrubbing on filesystems with lots of files, and I agree that it does seem to degrade badly. For me, it seemed to go from 24 hours to 72 hours in a matter of a few weeks.

I did these things on a pool in-place, which helped a lot (no rebuilding):

1. Reduced the number of snapshots (auto snapshots can generate a lot of files).
2. Disabled compression and rebuilt the affected datasets (is compression on?).
3. Upgraded to b129, which has metadata prefetch for scrub; seems to help by ~2x.
4. tar'd up some extremely large folders.
5. Added 50% more RAM.
6. Turned off atime.

My scrubs went from 80 hours to 12 with these changes. (4TB used, ~10M files + 10 snapshots each.)

I haven't figured out whether "disable compression" or "fewer snapshots/files and more RAM" made the bigger difference. I'm assuming that once the number of files exceeds the ARC you get dramatically lower performance, and maybe that compression has some additional overhead, but I don't know; this is just what worked.

It would be nice to have a benchmark set for features like this and general recommendations for RAM/ARC size based on number of files, etc. How does ARC usage scale with snapshots? Scrub on a huge maildir machine seems like it would make a nice benchmark.

I used "zdb -d pool" to figure out which filesystems had a lot of objects, and figured out places to trim based on that.

mike

On Tue, Dec 15, 2009 at 6:41 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> It seems that Dovecot's speed optimizations for mbox format are practically
> designed to break zfs [...]
>
> Zfs works best when whole files are re-written rather than updated in place,
> as Dovecot seems to want to do. Either the user mailboxes should be
> re-written entirely when they are "expunged", or else a different mail
> storage format which writes entire files, or much larger records, should be
> used.
>
> Bob
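A rough sketch of the diagnostics and settings from the list above, with made-up pool, dataset, and snapshot names:

    zdb -d mailpool                                   # object counts per dataset, to find the big ones
    zfs set atime=off mailpool
    zfs set compression=off mailpool/mailstore        # only if you decide compression is hurting here
    zfs destroy mailpool/mailstore@auto-2009-12-01    # prune old automatic snapshots (name is an example)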
On Tue, Dec 15, 2009 at 5:28 PM, Bill Sprouse <Bill.Sprouse at sun.com> wrote:
> Hi Everyone,
>
> I hope this is the right forum for this question. A customer is using a
> Thumper as an NFS file server to provide the mail store for multiple email
> servers (Dovecot). They find that when a zpool is freshly created and
> populated with mailboxes, even to 80-90% capacity, performance is OK for
> the users, and backups and scrubs take a few hours (4TB of data). There are
> around 100 file systems. After running for a while (a couple of months) the
> zpool seems to get "fragmented": backups take 72 hours and a scrub takes
> about 180 hours. [...]

Any reason in particular they chose to use Dovecot with the old mbox format? Mbox has been proven many times over to be painfully slow when the files get larger, and in this day and age I can't imagine anyone having smaller than a 50MB mailbox. We have about 30,000 e-mail users on various systems, and it seems the average size these days is approaching close to a GB. Though Dovecot has done a lot to improve the performance of mbox mailboxes, Maildir might be better suited for your system.

I wonder if the "soon to be released" block/parity rewrite tool will "freshen" up a pool that's heavily fragmented, without having to redo the pools.

--
Brent Jones
brent at servuhome.net
Michael Herf wrote:
> I have also had slow scrubbing on filesystems with lots of files, and I
> agree that it does seem to degrade badly. For me, it seemed to go from
> 24 hours to 72 hours in a matter of a few weeks.
>
> I did these things on a pool in-place, which helped a lot (no rebuilding):
> 2. Disabled compression and rebuilt the affected datasets (is compression on?).

That one shouldn't have made any difference, because if the data is only being read for the purposes of a scrub it won't be uncompressed. What probably made more difference is the fact that you rebuilt some datasets; if you didn't export the pool, those would now all possibly be hot in the ARC.

--
Darren J Moffat
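One generic way to check whether the working set is actually resident in the ARC on a Solaris host (not something from the thread, just a common check):

    kstat -m zfs -n arcstats | egrep 'size|hits|misses'   # ARC size plus hit/miss counters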
On Dec 15, 2009, at 6:24 PM, Bill Sommerfeld wrote:
> On Tue, 2009-12-15 at 17:28 -0800, Bill Sprouse wrote:
>> After running for a while (couple of months) the zpool seems to get
>> "fragmented", backups take 72 hours and a scrub takes about 180 hours.
>
> Are there periodic snapshots being created in this pool?

Yes, every two hours.

> Can they run with atime turned off?

I'm not sure, but I expect they can. I'll ask.

> (File tree walks performed by backups will update the atime of all
> directories; this will generate extra write traffic and also cause
> snapshots to diverge from their parents and take longer to scrub.)
>
> - Bill

Thanks!
Hi Bob,

On Dec 15, 2009, at 6:41 PM, Bob Friesenhahn wrote:
> It seems that Dovecot's speed optimizations for mbox format are practically
> designed to break zfs
>
> "http://wiki.dovecot.org/MailboxFormat/mbox#Dovecot.27s_Speed_Optimizations"
>
> which explains why using a tiny 8k recordsize temporarily "improved"
> performance. [...]

I think one of the reasons they went to small recordsizes was an issue where they were getting killed with reads of small messages and having to pull in 128K records each time. The smaller recordsizes seem to have improved that aspect at least. Thanks for the pointer to the Dovecot notes.

> Zfs works best when whole files are re-written rather than updated
> in place, as Dovecot seems to want to do. Either the user mailboxes
> should be re-written entirely when they are "expunged", or else a
> different mail storage format which writes entire files, or much
> larger records, should be used.
>
> Bob
Thanks Michael,

Useful stuff to try. I wish we could add more memory, but the x4500 is limited to 16GB. Compression was a question; it's currently off, but they were thinking of turning it on.

bill

On Dec 15, 2009, at 7:02 PM, Michael Herf wrote:
> I have also had slow scrubbing on filesystems with lots of files,
> and I agree that it does seem to degrade badly. For me, it seemed to
> go from 24 hours to 72 hours in a matter of a few weeks.
>
> I did these things on a pool in-place, which helped a lot (no rebuilding):
> 1. Reduced the number of snapshots (auto snapshots can generate a lot of files).
> 2. Disabled compression and rebuilt the affected datasets (is compression on?).
> 3. Upgraded to b129, which has metadata prefetch for scrub; seems to help by ~2x.
> 4. tar'd up some extremely large folders.
> 5. Added 50% more RAM.
> 6. Turned off atime.
>
> My scrubs went from 80 hours to 12 with these changes. (4TB used,
> ~10M files + 10 snapshots each.)
> [...]
Hi Brent,

I'm not sure why Dovecot was chosen. It was most likely a recommendation by a fellow University. I agree that it is lacking in efficiency in a lot of areas. I don't think I would be successful in suggesting a change at this point, as I have already suggested a couple of alternatives without success.

Do you have a pointer to the "block/parity rewrite" tool mentioned below?

bill

On Dec 15, 2009, at 9:38 PM, Brent Jones wrote:
> Any reason in particular they chose to use Dovecot with the old
> mbox format? Mbox has been proven many times over to be painfully slow
> when the files get larger [...]
>
> I wonder if the "soon to be released" block/parity rewrite tool will
> "freshen" up a pool that's heavily fragmented, without having to redo
> the pools.
>
> --
> Brent Jones
> brent at servuhome.net
Any small updates to a file cause the file to become fragmented. The best mailbox format to use under Dovecot for ZFS is Maildir, where each email is stored as an individual file. A fair bit of status info is kept in an index file, but the filename itself is also used. The only problem with it is that backups take longer, as there are more, smaller files (but this may still be better than what you are getting at the moment if the pool is badly fragmented).

See
http://wiki.dovecot.org/MailboxFormat/Maildir
http://www.linuxmail.info/mbox-maildir-mail-storage-formats/

The Maildir format will also work better with snapshots.

Cheers
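For reference, a minimal sketch of selecting Maildir in Dovecot; the setting is Dovecot's mail_location, and the path shown is only an example:

    # dovecot.conf - store each message as its own file (path is an example)
    mail_location = maildir:~/Maildir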
On Wed, 16 Dec 2009, Bill Sprouse wrote:
> I think one of the reasons they went to small recordsizes was an issue where
> they were getting killed with reads of small messages and having to pull in
> 128K records each time. The smaller recordsizes seem to have improved that
> aspect at least. Thanks for the pointer to the Dovecot notes.

This is likely due to insufficient RAM. Zfs performs very poorly if it is not able to cache full records in RAM but the (several/many) accesses are smaller than the record size. Dovecot is clearly optimized for a different type of file system.

Something which is rarely mentioned is that zfs pools may be less fragmented on systems with lots of memory. The reason for this is that writes may be postponed to a time when there is more data to write (up to 30 seconds), and therefore more data is written contiguously or with a better layout. Synchronous write requests tend to defeat this, but perhaps using an SSD as an intent log may help, so that synchronous writes to disk may also be deferred.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
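A sketch of what adding a separate intent log (and, optionally, an L2ARC cache device) would look like; the device names are placeholders and this assumes a pool version recent enough to support log and cache vdevs:

    zpool add mailpool log c5t0d0      # dedicated SSD slog for synchronous (NFS) writes
    zpool add mailpool cache c5t1d0    # optional SSD read cache (L2ARC)
    zpool status mailpool              # confirm the new log/cache vdevs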
On 16-Dec-09, at 10:47 AM, Bill Sprouse wrote:
> Hi Brent,
>
> I'm not sure why Dovecot was chosen. It was most likely a
> recommendation by a fellow University. I agree that it is lacking in
> efficiency in a lot of areas. I don't think I would be successful in
> suggesting a change at this point, as I have already suggested a couple
> of alternatives without success.

(As Damon pointed out) The problem seems not to be Dovecot per se but the choice of mbox format, which is rather self-evidently inefficient.

> Do you have a pointer to the "block/parity rewrite" tool mentioned below?

It headlines the informal roadmap presented by Jeff Bonwick:

http://www.snia.org/events/storage-developer2009/presentations/monday/JeffBonwick_zfs-What_Next-SDC09.pdf

--Toby

> bill
>
> On Dec 15, 2009, at 9:38 PM, Brent Jones wrote:
>> I wonder if the "soon to be released" block/parity rewrite tool will
>> "freshen" up a pool that's heavily fragmented, without having to redo
>> the pools.
>>
>> --
>> Brent Jones
>> brent at servuhome.net
On Wed, 16 Dec 2009, Toby Thain wrote:
> (As Damon pointed out) The problem seems not to be Dovecot per se but the
> choice of mbox format, which is rather self-evidently inefficient.

Note that Bill never told us what mail storage format was used. I was the one who suggested/assumed that 'mbox' format was being used, since the described behavior suggested it.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Just checked with the customer, and they are using the Maildir functionality with Dovecot.

On Dec 16, 2009, at 11:28 AM, Toby Thain wrote:
> On 16-Dec-09, at 10:47 AM, Bill Sprouse wrote:
>> I'm not sure why Dovecot was chosen. It was most likely a
>> recommendation by a fellow University. [...]
>
> (As Damon pointed out) The problem seems not to be Dovecot per se but the
> choice of mbox format, which is rather self-evidently inefficient.
>
>> Do you have a pointer to the "block/parity rewrite" tool mentioned below?
>
> It headlines the informal roadmap presented by Jeff Bonwick.
>
> http://www.snia.org/events/storage-developer2009/presentations/monday/JeffBonwick_zfs-What_Next-SDC09.pdf
>
> --Toby
Read this:
http://wiki.dovecot.org/MailLocation/SharedDisk

If you were running Dovecot on the Thumper: mmap has issues under ZFS, at least in older versions (not sure if it is fixed in Sol10), so switch it off with "mmap_disable = yes", as the URL above also recommends for mail stores over NFS. Ensure NFS is tuned to 32K reads and 32K writes (this will not help much because Dovecot does small I/O; it is the default on Solaris clients, not Linux), use jumbo frames if you can, and use NFSv4. You could also create a caching NFS file system on the clients if they are Solaris. I assume the backups are not over NFS. The other choice is to run Dovecot on the Thumper itself.

I believe the Maildir format from Dovecot is only ever written once; if a file is re-written, the mail client is updating the email (and the whole email should effectively be rewritten). At home, headers within emails are 100% < 1k, 64% of email bodies are under 16k, and 4% are > 128k. I am surprised that 8k would help: given the files are not updated, the effective record size for a file will be the file size. A small record size might help with the indexes.

Is the system running recent patches?
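A sketch of the two settings mentioned above; the Dovecot option is real, while the server name, export path, and mount point are placeholders:

    # dovecot.conf - avoid mmap on the ZFS/NFS-backed mail store
    mmap_disable = yes

    # Solaris NFS client mount with 32K transfers over NFSv4 (names are examples)
    mount -F nfs -o vers=4,rsize=32768,wsize=32768 thumper:/export/mail /var/mail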
> The best mailbox format to use under Dovecot for ZFS is Maildir,
> where each email is stored as an individual file.

I cannot agree with that. dbox is about 10x faster - at least if you have > 10000 messages in one mailbox folder. That's not because of ZFS, but because dovecot just handles dbox files (one for each message, like maildir) better in terms of indexing. The CPU stats for importing > 100000 messages via imap copy are even worse for maildir: dbox is about 100x more efficient. But anyway: it's no problem to test different formats with imaptest or offlineimap, because each user's mailbox (and even individual folders) could be stored in a different format.

Just to clarify: I'm using dovecot 1.2.x
Wilkinson, Alex wrote (2010-Jan-15 04:59 UTC), re: [zfs-discuss] zpool fragmentation issues? (dovecot) [SEC=UNCLASSIFIED]:
On Thu, Jan 14, 2010 at 08:43:06PM -0800, Michael Keller wrote:
>> The best mailbox format to use under Dovecot for ZFS is Maildir,
>> where each email is stored as an individual file.
>
> I cannot agree with that. dbox is about 10x faster - at least if you have
> > 10000 messages in one mailbox folder. That's not because of ZFS, but
> because dovecot just handles dbox files (one for each message, like
> maildir) better in terms of indexing.

Got a link to this magic dbox format?

-Alex
Michael Keller wrote (2010-Jan-15 14:55 UTC), re: [zfs-discuss] zpool fragmentation issues? (dovecot) [SEC=UNCLASSIFIED]:
> Got a link to this magic dbox format?

http://wiki.dovecot.org/MailboxFormat
http://wiki.dovecot.org/MailboxFormat/dbox
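For completeness, a sketch of selecting dbox via Dovecot's mail_location; the path is an example, and the exact syntax should be checked against the wiki pages above for the Dovecot version in use:

    # dovecot.conf - one message per file in dbox format (path is an example)
    mail_location = dbox:~/dbox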
According to the Dovecot wiki, dbox files are re-written by a secondary process: deletes do not happen immediately, they happen later as a background process, and the whole message file is re-written. You can set a size limit on message files. Some time ago I emailed Tim with a few ideas to make it more ZFS friendly, i.e. to try and prevent rewrites.

If you use dbox and keep snapshots, you will eat your disk up. Maildir is a lot friendlier to snapshots, but it will be slower for backups or for searching text within the bodies of lots of email. I.e. there are pros and cons with ZFS. Personally I will go for snapshots as being more important, as I take them about 10 times a day and keep them for 7 days. Also Maildirs make it easier to restore an individual email.

It comes down to pros and cons. Unfortunately, performance is always the most important goal.

Cheers
Damon.
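As an illustration, a snapshot regime like the one described (several snapshots a day, expired after a week) could be driven from cron with something like the following; the pool, dataset, and snapshot names are placeholders:

    # take a timestamped snapshot of the mail filesystem
    zfs snapshot mailpool/mailstore@`date +%Y%m%d-%H%M`
    # later, expire snapshots older than seven days, e.g.
    zfs destroy mailpool/mailstore@20100108-0900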
In my previous post I was referring more to mdbox (multi-dbox) rather than dbox. However, I believe the metadata is stored with the mail message in version 1.x, while in 2.x the metadata is not updated within the message, which would be better for ZFS. What I am saying is that one message per file, which is not updated after it is written, is better for snapshots. I believe the 2.x version of single-dbox should be better for snapshots (i.e. metadata is no longer stored with the message) compared with 1.x dbox.

Cheers