Peter Eriksson
2008-Sep-15 11:49 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
I wonder if there exists some tool that can be used to figure out an optimal ZFS recordsize configuration? Specifically for a mail server using Maildir (one ZFS filesystem per user), i.e., lots of small files (one file per email).
Vincent Fox
2008-Sep-15 17:02 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
I'm interested in this. We are actually running the default recordsize with 10K users per server and no I/O problems, however, so it's apparently not a primary parameter to worry about.
Marcelo Leal
2008-Sep-15 18:07 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
That's an interesting question, because maildir vs. mailbox is an old discussion, and I think maildir was winning more fans. But with ZFS, a comparison would be nice: 100K users, each having ~1K emails. ZFS maildir or mailbox?
Bob Friesenhahn
2008-Sep-15 18:25 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
On Mon, 15 Sep 2008, Marcelo Leal wrote:

> That's an interesting question, because maildir vs. mailbox is an old
> discussion, and I think maildir was winning more fans. But with ZFS, a
> comparison would be nice: 100K users, each having ~1K emails. ZFS
> maildir or mailbox?

It seems that the days of 1K emails are long gone, to the benefit of big-block ZFS and the detriment of small-block legacy filesystems. Typical emails in office environments must be in the 128K to 1MB range now due to the use of bulky HTML and Word/PowerPoint attachments.

It is important to remember that ZFS is ideal for writing new files from scratch. It is not so good at updating existing files unless the updates are sufficiently large, or perfectly overwrite existing data blocks. This is because with 128K blocks, ZFS needs to read the underlying 128K block in order to apply an update to part of the block. The main salvation here is to install lots of RAM so the data block is already cached and a disk read is not required. Then ZFS flies.

Bob

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
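[To make the partial-overwrite case Bob describes concrete, a minimal sketch follows. The pool name, dataset name, and file path are placeholders invented for illustration, not anything from this thread.]

    # Hypothetical demo of a partial overwrite into a file stored as 128K records.
    zfs create -o recordsize=128k tank/demo
    dd if=/dev/urandom of=/tank/demo/mbox bs=128k count=8    # 1 MB file, stored as 128K records
    # Overwriting 4K in the middle of an existing record forces ZFS to read
    # the whole 128K record (unless it is already cached in the ARC), modify
    # it, and write a new 128K record elsewhere (copy-on-write).
    dd if=/dev/urandom of=/tank/demo/mbox bs=4k count=1 seek=37 conv=notrunc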
Vincent Fox
2008-Sep-15 18:30 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
We dumped UWash and went with Cyrus about 2 years ago when our mail system was buckling. We found that with mailbox format the user experience became chokingly bad at 2,500 users per backend mailstore, and this was with fairly small quotas. So in a side-by-side experiment on the same hardware, attempting 100K users, I expect the mailbox column will show a FAIL.
Eric Schrock
2008-Sep-15 18:31 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
On Mon, Sep 15, 2008 at 01:25:06PM -0500, Bob Friesenhahn wrote:

> It is important to remember that ZFS is ideal for writing new files
> from scratch. It is not so good at updating existing files unless the
> updates are sufficiently large, or perfectly overwrite existing data
> blocks. This is because with 128K blocks, ZFS needs to read the
> underlying 128K block in order to apply an update to part of the block.
> The main salvation here is to install lots of RAM so the data block is
> already cached and a disk read is not required. Then ZFS flies.

Or tune down the recordsize to a value that is appropriate for your workload. ZFS always uses the smallest possible block size, so this is only a problem when you do writes to a large file in small chunks (the size of the record you are trying to update); hence the property to control this behavior.

- Eric

--
Eric Schrock, Fishworks
http://blogs.sun.com/eschrock
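[For anyone wondering what that tuning looks like in practice, a minimal sketch under assumed names: the dataset and the 32K value are placeholders, not recommendations, and the property only affects files written after the change.]

    # Set a smaller per-dataset recordsize; existing files keep their old block size.
    zfs set recordsize=32k tank/mail/users
    zfs get recordsize tank/mail/users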
Marcelo Leal
2008-Sep-15 18:40 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
Is filebench's varmail workload based on mailbox or maildir? I think a performance comparison would be a good point: 100K users having 1K messages of 30K average size... I think the big difference would be the time spent delivering the mail (MTA) vs. the latency to read (maildir needs to scan all files to read the headers). How would copy-on-write behave with many syncs to the same mailbox file (tiny blocks reallocated within a big file)?
Vincent Fox
2008-Sep-15 19:31 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
I'd suggest that before making a bunch of assumptions, and then running dubious benchmarks based on those assumptions, you get some data from production mail servers of decent size.

Our finding with monolithic mailboxes was that every little read and write operation having to process a multi-megabyte mailbox became a significant bottleneck. Writes outnumbered reads, true, because mail arrives frequently and people read infrequently. However, where people NOTICE the problem with mailbox is when they go to do that read, and it takes a lot longer with a large mailbox file than it does with maildir. In maildir setups there is typically a database of some sort to pre-store all the frequently-read header info so it appears in the mail client very quickly.

I promise not to hijack this into a mailbox/maildir discussion. Returning to the point: you need more real data before wasting time with benchmarking.
Marcelo Leal
2008-Sep-15 19:42 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
> I'd suggest that before making a bunch of assumptions, and then running
> dubious benchmarks based on those assumptions, you get some data from
> production mail servers of decent size.

What assumptions? What I did was just ask questions; I would like to get answers. I did not say at any time that "it is this way" or "it is that way". You made assumptions from what I said.

> Our finding with monolithic mailboxes was that every little read and
> write operation having to process a multi-megabyte mailbox became a
> significant bottleneck. Writes outnumbered reads, true, because mail
> arrives frequently and people read infrequently. However, where people
> NOTICE the problem with mailbox is when they go to do that read, and it
> takes a lot longer with a large mailbox file than it does with maildir.
> In maildir setups there is typically a database of some sort to
> pre-store all the frequently-read header info so it appears in the mail
> client very quickly.

OK, that is your experience, based on your workload. I think many things are crucial to understanding a specific workload. You need to know if the mail server is for "gmail" users or for "work/business" users. The average size of the mailbox/maildir depends on spam policies, gmail vs. business accounts, etc. If you look on the web for average mail sizes (in general), you will see numbers a lot bigger than yours.

> I promise not to hijack this into a mailbox/maildir discussion.
> Returning to the point: you need more real data before wasting time
> with benchmarking.

Nobody is wasting time here, and trying to simulate a workload in filebench is not a waste of time. With filebench we can create a general-purpose workload for mailbox/maildir, adjust the average size of the messages to fit our environment (and its changes over time), and the number of users (directory depth, etc.).

peace.
Nils Goroll
2008-Sep-18 12:37 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
Hi,

> It is important to remember that ZFS is ideal for writing new files
> from scratch.

IIRC, maildir MTAs never overwrite mail files. But courier-imap does maintain some additional index files which will be overwritten, and I guess other IMAP servers will probably do the same.

Nils
Roch Bourbonnais
2008-Oct-18 04:46 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
Leave the default recordsize. With a 128K recordsize, files smaller than 128K are stored as a single record, tightly fitted to the smallest possible number of disk sectors. Reads and writes are then managed with fewer ops.

Not tuning the recordsize is very generally more space efficient and more performant. Large databases (fixed-size, aligned accesses to an uncacheable working set) are the exception here (tuning the recordsize helps), along with a few other corner cases.

-r

On 15 Sep 08, at 04:49, Peter Eriksson wrote:

> I wonder if there exists some tool that can be used to figure out an
> optimal ZFS recordsize configuration? Specifically for a mail server
> using Maildir (one ZFS filesystem per user), i.e., lots of small files
> (one file per email).
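[A quick way to convince yourself of the small-file behavior Roch describes is to compare a small file's apparent size with the space actually allocated for it. The dataset name and path below are assumptions for the sketch.]

    zfs get recordsize tank/mail            # still the 128K default
    echo "short message body" > /tank/mail/cur/msg1
    ls -ls /tank/mail/cur/msg1              # first column: blocks actually allocated
    du -h /tank/mail/cur/msg1               # on-disk usage, far below 128K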
Toby Thain
2008-Oct-18 17:43 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
On 18-Oct-08, at 12:46 AM, Roch Bourbonnais wrote:

> Leave the default recordsize. With a 128K recordsize, files smaller
> than 128K are stored as a single record, tightly fitted to the smallest
> possible number of disk sectors. Reads and writes are then managed with
> fewer ops.
>
> Not tuning the recordsize is very generally more space efficient and
> more performant. Large databases (fixed-size, aligned accesses to an
> uncacheable working set) are the exception here (tuning the recordsize
> helps), along with a few other corner cases.
>
> -r
>
> On 15 Sep 08, at 04:49, Peter Eriksson wrote:
>
>> I wonder if there exists some tool that can be used to figure out an
>> optimal ZFS recordsize configuration? Specifically for a mail server
>> using Maildir (one ZFS filesystem per user), i.e., lots of small files
>> (one file per email).

Emails aren't as small as they used to be. I wouldn't be surprised if the median size is a good portion of 128K anyway.

--Toby
Marcelo Leal
2008-Oct-21 12:50 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
Hello Roch!

> Leave the default recordsize. With a 128K recordsize, files smaller
> than 128K are stored as a single record, tightly fitted to the smallest
> possible number of disk sectors. Reads and writes are then managed with
> fewer ops.

ZFS is dynamic on writes, but what about reads? If I have many small files (smaller than 128K), won't I waste time reading 128K? And after ZFS has allocated a 64K filesystem block for a file, for example, if that file gets bigger, will ZFS keep using 64K blocks?

> Not tuning the recordsize is very generally more space efficient and
> more performant. Large databases (fixed-size, aligned accesses to an
> uncacheable working set) are the exception here (tuning the recordsize
> helps), along with a few other corner cases.
>
> -r
Mika Borner
2008-Oct-22 16:46 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
> Leave the default recordsize. With a 128K recordsize, files smaller
> than 128K are stored as a single record, tightly fitted to the smallest
> possible number of disk sectors.

If I turn zfs compression on, does the recordsize influence the compressratio in any way?
Bob Friesenhahn
2008-Oct-22 17:24 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
On Wed, 22 Oct 2008, Mika Borner wrote:

>> Leave the default recordsize. With a 128K recordsize, files smaller
>> than 128K are stored as a single record, tightly fitted to the
>> smallest possible number of disk sectors.
>
> If I turn zfs compression on, does the recordsize influence the
> compressratio in any way?

Yes, I believe so. ZFS is not going to try to compress a chunk of data larger than the blocksize, since the data might not be compressible. With less data in a block there is less opportunity for redundancy and therefore less opportunity for compression.

Bob

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Nicolas Williams
2008-Oct-22 17:38 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
On Tue, Oct 21, 2008 at 05:50:09AM -0700, Marcelo Leal wrote:

> If I have many small files (smaller than 128K), won't I waste time
> reading 128K? And after ZFS has allocated a 64K filesystem block for a
> file, for example, if that file gets bigger, will ZFS keep using 64K
> blocks?

ZFS uses the smallest suitable block size and will always use a single block for files whose size is less than the dataset record size. I.e., if you have a file with 500 bytes of data then ZFS will use a 512-byte block size. If you then append 400 bytes then ZFS will switch to using a single 1024-byte block. And so on, up to the record size, which defaults to 128KB. So if you have a file that's 1MB, then ZFS will use 8 128KB blocks to store that file's data.

Bottom line: small files use small blocks -- you don't need to worry about reading 128KB blocks for small files.

For the whole maildir vs. mailbox thread, I'd say that any solution that involves an index in the MUA will give you better performance. In the maildir case you'll always be reading files in toto, so the record size will be irrelevant. In the mailbox+index case you'll be doing random-access reads which won't even be aligned, but which will typically be of mean-email-size bytes, so setting the record size to be just larger than the mean e-mail size should help. And, of course, there's the index, if your MUA keeps one. We just had a thread on Evolution and its SQLite3 DB -- setting the SQLite3 page size and the ZFS record size to match helps a lot. If you have large indexes then you may want to do the same. The natural record size for one part of the application may not match the natural record size for another. We really need a way to set the recordsize on a per-file basis.

Nico
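[As a rough illustration of the page-size/record-size matching Nico mentions, something like the following could be applied to an index database. The dataset name, DB path, and the 32K figure are assumptions for the sketch, not values from this thread.]

    # Create a dataset for index databases with a matching recordsize.
    zfs create -o recordsize=32k tank/mail/indexes
    # Rebuild the (hypothetical) index DB with a 32K page size; the pragma
    # only takes effect on an empty database or via VACUUM.
    sqlite3 /tank/mail/indexes/index.db 'PRAGMA page_size = 32768; VACUUM;'
    sqlite3 /tank/mail/indexes/index.db 'PRAGMA page_size;'   # confirm it took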
Bill Sommerfeld
2008-Oct-22 19:02 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
On Wed, 2008-10-22 at 09:46 -0700, Mika Borner wrote:

> If I turn zfs compression on, does the recordsize influence the
> compressratio in any way?

ZFS conceptually chops the data into recordsize chunks, then compresses each chunk independently, allocating on disk only the space needed to store each compressed block.

On average, I'd expect to get a better compression ratio with a larger block size, since typical compression algorithms will have more chance to find redundancy in a larger block of text.

As always, your mileage may vary.

- Bill
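[One empirical way to check this on a given corpus is to copy the same data into two datasets that differ only in recordsize and compare the reported ratios. The dataset names and the source path are placeholders.]

    # Two compressed datasets, identical except for recordsize.
    zfs create -o compression=on -o recordsize=16k  tank/test16k
    zfs create -o compression=on -o recordsize=128k tank/test128k
    # Copy the same sample mail data into both, then compare.
    cp -r /var/mail/* /tank/test16k/
    cp -r /var/mail/* /tank/test128k/
    zfs get compressratio tank/test16k tank/test128k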
Roch Bourbonnais
2008-Nov-27 16:25 UTC
[zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
On 22 Oct 08, at 21:02, Bill Sommerfeld wrote:

> On Wed, 2008-10-22 at 09:46 -0700, Mika Borner wrote:
>> If I turn zfs compression on, does the recordsize influence the
>> compressratio in any way?
>
> ZFS conceptually chops the data into recordsize chunks, then compresses
> each chunk independently, allocating on disk only the space needed to
> store each compressed block.
>
> On average, I'd expect to get a better compression ratio with a larger
> block size, since typical compression algorithms will have more chance
> to find redundancy in a larger block of text.

With gzip, yes, but with the default compression, I believe lzjb looks for short 'repeating byte patterns'. It should not depend much on recordsize.

-r

> As always, your mileage may vary.
>
> - Bill