Hi Everyone,

I hope this is the right forum for this question. A customer is using a Thumper as an NFS file server to provide the mail store for multiple email servers (Dovecot). They find that when a zpool is freshly created and populated with mailboxes, even to 80-90% capacity, performance is OK for the users, and backups and scrubs take a few hours (4TB of data). There are around 100 file systems. After running for a while (a couple of months) the zpool seems to get "fragmented": backups take 72 hours and a scrub takes about 180 hours. They are running mirrors with about 5TB usable per pool (500GB disks). Being a mail store, the writes and reads are small and random. Record size has been set to 8k (which improved performance dramatically). The backup application is Amanda. Once backups become too tedious, the remedy is to replicate the pool and start over; things get fast again for a while.

Is this expected behavior given the application (email - small, random writes/reads)? Are there recommendations for system/ZFS/NFS configurations to improve this sort of thing? Are there best practices for structuring backups to avoid a directory walk?

Thanks,
bill
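For concreteness, the per-filesystem tuning described above amounts to something like the following; the pool and dataset names are made up, only the property values come from the post:

    # hypothetical names, shown only to illustrate the settings described
    zfs create mailpool/mailstore
    zfs set recordsize=8k mailpool/mailstore          # small records for small random mail I/O
    zfs get recordsize,atime,compression mailpool/mailstore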
On Tue, 2009-12-15 at 17:28 -0800, Bill Sprouse wrote:
> After running for a while (couple of months) the zpool seems to get
> "fragmented", backups take 72 hours and a scrub takes about 180 hours.

Are there periodic snapshots being created in this pool?

Can they run with atime turned off? (File tree walks performed by backups will update the atime of all directories; this will generate extra write traffic and also cause snapshots to diverge from their parents and take longer to scrub.)

- Bill
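A sketch of the two checks suggested here, again assuming a hypothetical pool name:

    zfs set atime=off mailpool/mailstore              # stop backup tree walks from dirtying atimes
    zfs list -H -t snapshot -r mailpool | wc -l       # see how many periodic snapshots have piled up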
On Tue, 15 Dec 2009, Bill Sprouse wrote:
> Hi Everyone,
>
> I hope this is the right forum for this question. A customer is using a
> Thumper as an NFS file server to provide the mail store for multiple email
> servers (Dovecot). They find that when a zpool is freshly created and

It seems that Dovecot's speed optimizations for mbox format are practically designed to break zfs

"http://wiki.dovecot.org/MailboxFormat/mbox#Dovecot.27s_Speed_Optimizations"

which explains why using a tiny 8k recordsize temporarily "improved" performance. Tiny updates seem to be abnormal for a mail server. The many tiny updates combined with zfs COW conspire to spread the data around the disk, requiring a seek for each 8k of data. If more data was written at once, and much larger blocks were used, then the filesystem would continue to perform much better, although perhaps less well initially. If the system has sufficient RAM, or a large enough L2ARC, then Dovecot's optimizations to diminish reads become meaningless.

> Is this expected behavior given the application (email - small, random
> writes/reads)? Are there recommendations for system/ZFS/NFS configurations
> to improve this sort of thing? Are there best practices for structuring
> backups to avoid a directory walk?

Zfs works best when whole files are re-written rather than updated in place, as Dovecot seems to want to do. Either the user mailboxes should be re-written entirely when they are "expunged", or else a different mail storage format which writes entire files, or much larger records, should be used.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
I have also had slow scrubbing on filesystems with lots of files, and I agree that it does seem to degrade badly. For me, it seemed to go from 24 hours to 72 hours in a matter of a few weeks.

I did these things on a pool in-place, which helped a lot (no rebuilding):

1. Reduced the number of snapshots (auto snapshots can generate a lot of files).
2. Disabled compression and rebuilt the affected datasets (is compression on?).
3. Upgraded to b129, which has metadata prefetch for scrub; seems to help by ~2x.
4. tar'd up some extremely large folders.
5. Added 50% more RAM.
6. Turned off atime.

My scrubs went from 80 hours to 12 with these changes. (4TB used, ~10M files + 10 snapshots each.)

I haven't figured out whether "disable compression" or "fewer snapshots/files and more RAM" made the bigger difference. I'm assuming that once the number of files exceeds the ARC you get dramatically lower performance, and maybe that compression has some additional overhead, but I don't know; this is just what worked.

It would be nice to have a benchmark set for features like this and general recommendations for RAM/ARC size based on number of files, etc. How does ARC usage scale with snapshots? Scrub on a huge maildir machine seems like it would make a nice benchmark.

I used "zdb -d pool" to figure out which filesystems had a lot of objects, and figured out places to trim based on that.

mike

On Tue, Dec 15, 2009 at 6:41 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> It seems that Dovecot's speed optimizations for mbox format are practically
> designed to break zfs [...]
>
> Zfs works best when whole files are re-written rather than updated in place,
> as Dovecot seems to want to do. Either the user mailboxes should be
> re-written entirely when they are "expunged", or else a different mail
> storage format which writes entire files, or much larger records, should be
> used.
>
> Bob
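A rough sketch of the diagnostics and settings from the list above, with made-up pool, dataset, and snapshot names:

    zdb -d mailpool                                   # object counts per dataset, to find the big ones
    zfs set atime=off mailpool
    zfs set compression=off mailpool/mailstore        # only if you decide compression is hurting here
    zfs destroy mailpool/mailstore@auto-2009-12-01    # prune old automatic snapshots (name is an example)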
On Tue, Dec 15, 2009 at 5:28 PM, Bill Sprouse <Bill.Sprouse at sun.com> wrote:
> Hi Everyone,
>
> I hope this is the right forum for this question. A customer is using a
> Thumper as an NFS file server to provide the mail store for multiple email
> servers (Dovecot). They find that when a zpool is freshly created and
> populated with mailboxes, even to 80-90% capacity, performance is OK for
> the users, and backups and scrubs take a few hours (4TB of data). There are
> around 100 file systems. After running for a while (a couple of months) the
> zpool seems to get "fragmented": backups take 72 hours and a scrub takes
> about 180 hours. [...]

Any reason in particular they chose to use Dovecot with the old mbox format? Mbox has been proven many times over to be painfully slow when the files get larger, and in this day and age I can't imagine anyone having smaller than a 50MB mailbox. We have about 30,000 e-mail users on various systems, and it seems the average size these days is approaching close to a GB. Though Dovecot has done a lot to improve the performance of mbox mailboxes, Maildir might be better suited for your system.

I wonder if the "soon to be released" block/parity rewrite tool will "freshen" up a pool that's heavily fragmented, without having to redo the pools.

--
Brent Jones
brent at servuhome.net
Michael Herf wrote:
> I have also had slow scrubbing on filesystems with lots of files, and I
> agree that it does seem to degrade badly. For me, it seemed to go from
> 24 hours to 72 hours in a matter of a few weeks.
>
> I did these things on a pool in-place, which helped a lot (no rebuilding):
> 2. Disabled compression and rebuilt the affected datasets (is compression on?).

That one shouldn't have made any difference, because if the data is only being read for the purposes of a scrub it won't be uncompressed. What probably made more difference is the fact that you rebuilt some datasets; if you didn't export the pool, those would now all possibly be hot in the ARC.

--
Darren J Moffat
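One generic way to check whether the working set is actually resident in the ARC on a Solaris host (not something from the thread, just a common check):

    kstat -m zfs -n arcstats | egrep 'size|hits|misses'   # ARC size plus hit/miss counters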
On Dec 15, 2009, at 6:24 PM, Bill Sommerfeld wrote:
> On Tue, 2009-12-15 at 17:28 -0800, Bill Sprouse wrote:
>> After running for a while (couple of months) the zpool seems to get
>> "fragmented", backups take 72 hours and a scrub takes about 180 hours.
>
> Are there periodic snapshots being created in this pool?

Yes, every two hours.

> Can they run with atime turned off?

I'm not sure, but I expect they can. I'll ask.

> (File tree walks performed by backups will update the atime of all
> directories; this will generate extra write traffic and also cause
> snapshots to diverge from their parents and take longer to scrub.)
>
> - Bill

Thanks!
Hi Bob,

On Dec 15, 2009, at 6:41 PM, Bob Friesenhahn wrote:
> It seems that Dovecot's speed optimizations for mbox format are practically
> designed to break zfs
>
> "http://wiki.dovecot.org/MailboxFormat/mbox#Dovecot.27s_Speed_Optimizations"
>
> which explains why using a tiny 8k recordsize temporarily "improved"
> performance. [...]

I think one of the reasons they went to small recordsizes was an issue where they were getting killed with reads of small messages and having to pull in 128K records each time. The smaller recordsizes seem to have improved that aspect at least. Thanks for the pointer to the Dovecot notes.

> Zfs works best when whole files are re-written rather than updated
> in place, as Dovecot seems to want to do. Either the user mailboxes
> should be re-written entirely when they are "expunged", or else a
> different mail storage format which writes entire files, or much
> larger records, should be used.
>
> Bob
Thanks Michael,

Useful stuff to try. I wish we could add more memory, but the x4500 is limited to 16GB. Compression was a question; it's currently off, but they were thinking of turning it on.

bill

On Dec 15, 2009, at 7:02 PM, Michael Herf wrote:
> I have also had slow scrubbing on filesystems with lots of files,
> and I agree that it does seem to degrade badly. For me, it seemed to
> go from 24 hours to 72 hours in a matter of a few weeks.
>
> I did these things on a pool in-place, which helped a lot (no rebuilding):
> 1. Reduced the number of snapshots (auto snapshots can generate a lot of files).
> 2. Disabled compression and rebuilt the affected datasets (is compression on?).
> 3. Upgraded to b129, which has metadata prefetch for scrub; seems to help by ~2x.
> 4. tar'd up some extremely large folders.
> 5. Added 50% more RAM.
> 6. Turned off atime.
>
> My scrubs went from 80 hours to 12 with these changes. (4TB used,
> ~10M files + 10 snapshots each.)
> [...]
Hi Brent,

I'm not sure why Dovecot was chosen. It was most likely a recommendation by a fellow University. I agree that it is lacking in efficiency in a lot of areas. I don't think I would be successful in suggesting a change at this point, as I have already suggested a couple of alternatives without success.

Do you have a pointer to the "block/parity rewrite" tool mentioned below?

bill

On Dec 15, 2009, at 9:38 PM, Brent Jones wrote:
> Any reason in particular they chose to use Dovecot with the old
> mbox format? Mbox has been proven many times over to be painfully slow
> when the files get larger [...]
>
> I wonder if the "soon to be released" block/parity rewrite tool will
> "freshen" up a pool that's heavily fragmented, without having to redo
> the pools.
>
> --
> Brent Jones
> brent at servuhome.net
Any small updates to a file cause the file to become fragmented. The best mailbox format to use under Dovecot for ZFS is Maildir, where each email is stored as an individual file. A fair bit of status info is kept in an index file, but the filename itself is also used. The only problem with it is that backups take longer, as there are more, smaller files (but this may still be better than what you are getting at the moment if the pool is badly fragmented).

See
http://wiki.dovecot.org/MailboxFormat/Maildir
http://www.linuxmail.info/mbox-maildir-mail-storage-formats/

The Maildir format will also work better with snapshots.

Cheers
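For reference, a minimal sketch of selecting Maildir in Dovecot; the setting is Dovecot's mail_location, and the path shown is only an example:

    # dovecot.conf - store each message as its own file (path is an example)
    mail_location = maildir:~/Maildir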
On Wed, 16 Dec 2009, Bill Sprouse wrote:
> I think one of the reasons they went to small recordsizes was an issue where
> they were getting killed with reads of small messages and having to pull in
> 128K records each time. The smaller recordsizes seem to have improved that
> aspect at least. Thanks for the pointer to the Dovecot notes.

This is likely due to insufficient RAM. Zfs performs very poorly if it is not able to cache full records in RAM but the (several/many) accesses are smaller than the record size. Dovecot is clearly optimized for a different type of file system.

Something which is rarely mentioned is that zfs pools may be less fragmented on systems with lots of memory. The reason for this is that writes may be postponed to a time when there is more data to write (up to 30 seconds), and therefore more data is written contiguously or with a better layout. Synchronous write requests tend to defeat this, but perhaps using an SSD as an intent log may help, so that synchronous writes to disk may also be deferred.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
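A sketch of what adding a separate intent log (and, optionally, an L2ARC cache device) would look like; the device names are placeholders and this assumes a pool version recent enough to support log and cache vdevs:

    zpool add mailpool log c5t0d0      # dedicated SSD slog for synchronous (NFS) writes
    zpool add mailpool cache c5t1d0    # optional SSD read cache (L2ARC)
    zpool status mailpool              # confirm the new log/cache vdevs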
On 16-Dec-09, at 10:47 AM, Bill Sprouse wrote:
> Hi Brent,
>
> I'm not sure why Dovecot was chosen. It was most likely a
> recommendation by a fellow University. I agree that it is lacking in
> efficiency in a lot of areas. I don't think I would be successful in
> suggesting a change at this point, as I have already suggested a couple
> of alternatives without success.

(As Damon pointed out) The problem seems not to be Dovecot per se but the choice of mbox format, which is rather self-evidently inefficient.

> Do you have a pointer to the "block/parity rewrite" tool mentioned below?

It headlines the informal roadmap presented by Jeff Bonwick:

http://www.snia.org/events/storage-developer2009/presentations/monday/JeffBonwick_zfs-What_Next-SDC09.pdf

--Toby

> bill
>
> On Dec 15, 2009, at 9:38 PM, Brent Jones wrote:
>> I wonder if the "soon to be released" block/parity rewrite tool will
>> "freshen" up a pool that's heavily fragmented, without having to redo
>> the pools.
>>
>> --
>> Brent Jones
>> brent at servuhome.net
On Wed, 16 Dec 2009, Toby Thain wrote:
> (As Damon pointed out) The problem seems not to be Dovecot per se but the
> choice of mbox format, which is rather self-evidently inefficient.

Note that Bill never told us what mail storage format was used. I was the one who suggested/assumed that 'mbox' format was being used, since the described behavior suggested it.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Just checked with the customer, and they are using the Maildir functionality with Dovecot.

On Dec 16, 2009, at 11:28 AM, Toby Thain wrote:
> On 16-Dec-09, at 10:47 AM, Bill Sprouse wrote:
>> I'm not sure why Dovecot was chosen. It was most likely a
>> recommendation by a fellow University. [...]
>
> (As Damon pointed out) The problem seems not to be Dovecot per se but the
> choice of mbox format, which is rather self-evidently inefficient.
>
>> Do you have a pointer to the "block/parity rewrite" tool mentioned below?
>
> It headlines the informal roadmap presented by Jeff Bonwick.
>
> http://www.snia.org/events/storage-developer2009/presentations/monday/JeffBonwick_zfs-What_Next-SDC09.pdf
>
> --Toby
Read this:
http://wiki.dovecot.org/MailLocation/SharedDisk

If you were running Dovecot on the Thumper: mmap has issues under ZFS, at least in older versions (not sure if it is fixed in Sol10), so switch it off with "mmap_disable = yes", as the URL above also recommends for mail stores over NFS. Ensure NFS is tuned to 32K reads and 32K writes (this will not help much because Dovecot does small I/O; it is the default on Solaris clients, not Linux), use jumbo frames if you can, and use NFSv4. You could also create a caching NFS file system on the clients if they are Solaris. I assume the backups are not over NFS. The other choice is to run Dovecot on the Thumper itself.

I believe the Maildir format from Dovecot is only ever written once; if a file is re-written, the mail client is updating the email (and the whole email should effectively be rewritten). At home, headers within emails are 100% < 1k, 64% of email bodies are under 16k, and 4% are > 128k. I am surprised that 8k would help: given the files are not updated, the effective record size for a file will be the file size. A small record size might help with the indexes.

Is the system running recent patches?
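A sketch of the two settings mentioned above; the Dovecot option is real, while the server name, export path, and mount point are placeholders:

    # dovecot.conf - avoid mmap on the ZFS/NFS-backed mail store
    mmap_disable = yes

    # Solaris NFS client mount with 32K transfers over NFSv4 (names are examples)
    mount -F nfs -o vers=4,rsize=32768,wsize=32768 thumper:/export/mail /var/mail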
> The best mailbox format to use under Dovecot for ZFS is Maildir,
> where each email is stored as an individual file.

I cannot agree with that. dbox is about 10x faster - at least if you have > 10000 messages in one mailbox folder. That's not because of ZFS, but because dovecot just handles dbox files (one for each message, like maildir) better in terms of indexing. The CPU stats for importing > 100000 messages via imap copy are even worse for maildir: dbox is about 100x more efficient. But anyway: it's no problem to test different formats with imaptest or offlineimap, because each user's mailbox (and even individual folders) could be stored in a different format.

Just to clarify: I'm using dovecot 1.2.x
Wilkinson, Alex wrote (2010-Jan-15 04:59 UTC), re: [zfs-discuss] zpool fragmentation issues? (dovecot) [SEC=UNCLASSIFIED]:
On Thu, Jan 14, 2010 at 08:43:06PM -0800, Michael Keller wrote:
>> The best mailbox format to use under Dovecot for ZFS is Maildir,
>> where each email is stored as an individual file.
>
> I cannot agree with that. dbox is about 10x faster - at least if you have
> > 10000 messages in one mailbox folder. That's not because of ZFS, but
> because dovecot just handles dbox files (one for each message, like
> maildir) better in terms of indexing.

Got a link to this magic dbox format?

-Alex
Michael Keller wrote (2010-Jan-15 14:55 UTC), re: [zfs-discuss] zpool fragmentation issues? (dovecot) [SEC=UNCLASSIFIED]:
> Got a link to this magic dbox format?

http://wiki.dovecot.org/MailboxFormat
http://wiki.dovecot.org/MailboxFormat/dbox
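For completeness, a sketch of selecting dbox via Dovecot's mail_location; the path is an example, and the exact syntax should be checked against the wiki pages above for the Dovecot version in use:

    # dovecot.conf - one message per file in dbox format (path is an example)
    mail_location = dbox:~/dbox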
According to the Dovecot wiki, dbox files are re-written by a secondary process: deletes do not happen immediately, they happen later as a background process, and the whole message file is re-written. You can set a size limit on message files. Some time ago I emailed Tim with a few ideas to make it more ZFS friendly, i.e. to try and prevent rewrites.

If you use dbox and keep snapshots, you will eat your disk up. Maildir is a lot friendlier to snapshots, but it will be slower for backups or for searching text within the bodies of lots of email. I.e. there are pros and cons with ZFS. Personally I will go for snapshots as being more important, as I take them about 10 times a day and keep them for 7 days. Also Maildirs make it easier to restore an individual email.

It comes down to pros and cons. Unfortunately, performance is always the most important goal.

Cheers
Damon.
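As an illustration, a snapshot regime like the one described (several snapshots a day, expired after a week) could be driven from cron with something like the following; the pool, dataset, and snapshot names are placeholders:

    # take a timestamped snapshot of the mail filesystem
    zfs snapshot mailpool/mailstore@`date +%Y%m%d-%H%M`
    # later, expire snapshots older than seven days, e.g.
    zfs destroy mailpool/mailstore@20100108-0900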
In my previous post I was referring more to mdbox (multi-dbox) rather than dbox. However, I believe the metadata is stored with the mail message in version 1.x, while in 2.x the metadata is not updated within the message, which would be better for ZFS. What I am saying is that one message per file, which is not updated after it is written, is better for snapshots. I believe the 2.x version of single-dbox should be better for snapshots (i.e. metadata is no longer stored with the message) compared with 1.x dbox.

Cheers