We run our IMAP spool on ZFS that's derived from LUNs on a Netapp
filer. There's a great deal of churn in e-mail folders, with messages
appearing and being deleted frequently. I know that ZFS uses
copy-on-write, so that blocks in use are never overwritten, and that
deleted blocks are added to a free list. This behavior would spread
the free list all over the zpool. As well, the Netapp uses WAFL, also
a variety of copy-on-write. The LUNs appear as large files on the
filer. It won't know which blocks are in use by ZFS. It would have
to do copy-on-write each time, I suppose. Do we have a problem here?

The Netapp has a utility that will defragment files on a volume. It
must put them back into sequential order. Does ZFS have any concept
of the geometry of its disks? If so, regular defragmentation on the
Netapp might be a good thing.

Should ZFS and the Netapp be using the same blocksize, so that they
cooperate to some extent?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
On Sun, Apr 26, 2009 at 3:52 PM, Gary Mills <mills at cc.umanitoba.ca> wrote:

> We run our IMAP spool on ZFS that's derived from LUNs on a Netapp
> filer. There's a great deal of churn in e-mail folders, with messages
> appearing and being deleted frequently. I know that ZFS uses
> copy-on-write, so that blocks in use are never overwritten, and that
> deleted blocks are added to a free list. This behavior would spread
> the free list all over the zpool. As well, the Netapp uses WAFL, also
> a variety of copy-on-write. The LUNs appear as large files on the
> filer. It won't know which blocks are in use by ZFS. It would have
> to do copy-on-write each time, I suppose. Do we have a problem here?

Not at all.

> The Netapp has a utility that will defragment files on a volume. It
> must put them back into sequential order. Does ZFS have any concept
> of the geometry of its disks? If so, regular defragmentation on the
> Netapp might be a good thing.

I assume you mean reallocate on the filer? This is run automatically
as part of weekly maintenance. There are flags to run it more
aggressively, but unless you're actually seeing problems, I would
suggest avoiding doing so.

> Should ZFS and the Netapp be using the same blocksize, so that they
> cooperate to some extent?

Just make sure ZFS is using a block size that is a multiple of 4k,
which I believe it does by default.

I have to ask though... why not just serve NFS off the filer to the
Solaris box? ZFS on a LUN served off a filer seems to make about as
much sense as sticking a ZFS-based LUN behind a v-filer (although the
latter might actually make sense in a world where it were supported
*cough*neverhappen*cough*, since you could buy the "cheap" newegg
disk).

--Tim
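For reference, the recordsize Tim refers to can be checked and set
per dataset; a minimal sketch (the dataset name "imappool/spool" is
hypothetical):

    # Show the current maximum block size for the dataset;
    # the default is 128K, which is a multiple of 4k:
    zfs get recordsize imappool/spool

    # Set it explicitly if desired; this only affects files
    # written after the change, not existing blocks:
    zfs set recordsize=128K imappool/spool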
>>>>> "t" == Tim <tim at tcsac.net> writes:t> why not just serve NFS off the filer there can be some benefit to the lossless FC fabric through eliminating TCP RTO''s and applying backpressure so the initiator has more control over I/O scheduling. As discussed here, block-based storage can produce fewer synchronous calls / rtt waits than NFS for workloads involving opening and closing lots of small files when you are not calling fsync on them. I state both based on theory not experience, and I''m not saying that''s Gary''s workload falls in the second category, nor that NFS is necessarily the wrong approach, but here are two reasons a sane person might plausibly decide to use the LUN interface instead. I''m sure there are more arguments for and against. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090426/b49bacf3/attachment.bin>
On Sun, Apr 26, 2009 at 05:19:18PM -0400, Ellis, Mike wrote:

> As soon as you put those zfs blocks on top of iscsi, the netapp won't
> have a clue as far as how to defrag those "iscsi files" from the
> filer's perspective. (It might do some fancy stuff based on
> read/write patterns, but that's unlikely)

Since the LUN is just a large file on the Netapp, I assume that all
it can do is to put the blocks back into sequential order. That might
have some benefit overall.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
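For what it's worth, a sketch of the filer-side reallocate commands
Tim mentioned above, assuming ONTAP 7-mode syntax of that era; the
volume path "/vol/imap_luns" is hypothetical, and the exact flags
should be confirmed against the reallocate man page on your release:

    # Report how well laid out the volume holding the LUN file
    # currently is:
    reallocate measure /vol/imap_luns

    # Start a one-time reallocation scan of the volume; -f forces
    # it to run even if the measured threshold hasn't been crossed:
    reallocate start -f /vol/imap_luns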
On Sun, Apr 26, 2009 at 05:02:38PM -0500, Tim wrote:

> On Sun, Apr 26, 2009 at 3:52 PM, Gary Mills <mills at cc.umanitoba.ca> wrote:
>
> > We run our IMAP spool on ZFS that's derived from LUNs on a Netapp
> > filer. There's a great deal of churn in e-mail folders, with
> > messages appearing and being deleted frequently.
>
> > Should ZFS and the Netapp be using the same blocksize, so that they
> > cooperate to some extent?
>
> Just make sure ZFS is using a block size that is a multiple of 4k,
> which I believe it does by default.

Okay, that's good.

> I have to ask though... why not just serve NFS off the filer to the
> Solaris box? ZFS on a LUN served off a filer seems to make about as
> much sense as sticking a ZFS-based LUN behind a v-filer (although the
> latter might actually make sense in a world where it were supported
> *cough*neverhappen*cough*, since you could buy the "cheap" newegg
> disk).

I prefer NFS too, but the IMAP server requires POSIX semantics.
I believe that NFS doesn't support that, at least NFS version 3.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Gary Mills wrote:

> We run our IMAP spool on ZFS that's derived from LUNs on a Netapp
> filer. There's a great deal of churn in e-mail folders, with messages
> appearing and being deleted frequently. I know that ZFS uses
> copy-on-write, so that blocks in use are never overwritten, and that
> deleted blocks are added to a free list. This behavior would spread
> the free list all over the zpool. As well, the Netapp uses WAFL, also
> a variety of copy-on-write. The LUNs appear as large files on the
> filer. It won't know which blocks are in use by ZFS. It would have
> to do copy-on-write each time, I suppose. Do we have a problem here?
>
> The Netapp has a utility that will defragment files on a volume. It
> must put them back into sequential order. Does ZFS have any concept
> of the geometry of its disks? If so, regular defragmentation on the
> Netapp might be a good thing.

If you measure this, then please share your results. There is much
speculation, but little characterization, of the "ills of COW
performance."

> Should ZFS and the Netapp be using the same blocksize, so that they
> cooperate to some extent?

ZFS blocksize is dynamic, power of 2, with a max size == recordsize.
Writes can also be coalesced. If you want to measure the distribution,
there are a few DTrace scripts which will measure it (eg. iosnoop).

I did a large e-mail-server-over-ZFS POC earlier this year. We could
handle more than 250,000 users on a T5120 message store server using
decent storage (lots of spindles). Since IMAP is quite a unique and
demanding I/O workload, we were very pleased with how well ZFS worked.
But low-latency storage is key to maintaining such large workloads.
 -- richard
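As one illustration of measuring that distribution, a minimal DTrace
one-liner using the stable io provider (run as root; this is a generic
sketch, not one of the scripts Richard refers to):

    # Histogram of physical I/O sizes per device; prints a power-of-2
    # distribution for each device when you hit Ctrl-C:
    dtrace -n 'io:::start { @sizes[args[1]->dev_statname] = quantize(args[0]->b_bcount); }'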
On 26 April, 2009 - Gary Mills sent me these 1,3K bytes:

> On Sun, Apr 26, 2009 at 05:02:38PM -0500, Tim wrote:
> > I have to ask though... why not just serve NFS off the filer to the
> > Solaris box? ZFS on a LUN served off a filer seems to make about as
> > much sense as sticking a ZFS-based LUN behind a v-filer (although the
> > latter might actually make sense in a world where it were supported
> > *cough*neverhappen*cough*, since you could buy the "cheap" newegg
> > disk).
>
> I prefer NFS too, but the IMAP server requires POSIX semantics.
> I believe that NFS doesn't support that, at least NFS version 3.

What non-POSIXness are you referring to, or is it just random old
thoughts that actually don't apply?

Lots of people (me for instance) are using IMAP servers with data
served over NFSv3..

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
> ZFS blocksize is dynamic, power of 2, with a max size == recordsize.

Minor clarification: recordsize is restricted to powers of 2, but
blocksize is not -- it can be any multiple of the sector size (512
bytes). For small files, this matters: a 37k file is stored in a 37k
block. For larger, multi-block files, the size of each block is indeed
a power of 2 (simplifies the math a bit).

Jeff
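One way to see this in practice is to inspect a small file's object
with zdb; a sketch, assuming a hypothetical dataset "tank/fs" (the
<object-number> placeholder comes from the ls output, and the exact
zdb output format varies by release):

    # Write a 37k file and force it out to disk:
    dd if=/dev/urandom of=/tank/fs/smallfile bs=1k count=37
    sync

    # In ZFS the inode number doubles as the object number:
    ls -i /tank/fs/smallfile

    # Dump that object's metadata; the dblk field should show a
    # single 37K data block rather than a power of 2:
    zdb -dddd tank/fs <object-number>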
On Mon, April 27, 2009 02:13, Tomas Ögren wrote:

> On 26 April, 2009 - Gary Mills sent me these 1,3K bytes:
>
>> I prefer NFS too, but the IMAP server requires POSIX semantics.
>> I believe that NFS doesn't support that, at least NFS version 3.
>
> What non-POSIXness are you referring to, or is it just random old
> thoughts that actually don't apply?
>
> Lots of people (me for instance) are using IMAP servers with data
> served over NFSv3..

Depends on the IMAP server. Cyrus for example doesn't
recommend/support it:

> Using NFS: We don't recommend it. If you want to do it, it may
> possibly work but you may also lose your email or have corrupted
> cyrus.* files. You can look at the mailing list archives for more
> information.

http://cyrusimap.web.cmu.edu/imapd/faq.html

As for non-POSIXness:

> In fact, because XNFS provides transparent access to remote files, it
> is not possible for a process to distinguish between local and remote
> files before they are used. Due to the nature of the way XNFS works,
> there are some semantic differences between operations on local files
> and equivalent operations on remote files.
>
> This appendix gives a summary of these semantic differences. Together
> with "Open-System Interface Semantics over XNFS" and "Open System
> Utilities Semantics over XNFS" this appendix specifies differences
> that can occur when using a given utility or function with a file on
> a remote file system.

http://www.opengroup.org/onlinepubs/9629799/apdxa.htm

It's copyright 1998, and only refers to NFSv2 and v3, so it may be out
of date (especially with NFSv4[.1]).
Hello Jeff,

Monday, April 27, 2009, 9:12:26 AM, you wrote:

>> ZFS blocksize is dynamic, power of 2, with a max size == recordsize.

JB> Minor clarification: recordsize is restricted to powers of 2, but
JB> blocksize is not -- it can be any multiple of the sector size (512
JB> bytes). For small files, this matters: a 37k file is stored in a
JB> 37k block. For larger, multi-block files, the size of each block
JB> is indeed a power of 2 (simplifies the math a bit).

which is a consequence of recordsize being a power of 2; multi-block
files will usually (always?) have a block size equivalent to the
recordsize value.

Has the issue with the tail block been fixed yet?

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com