hi all,

I am hoping to move roughly 1TB of maildir format email to ZFS, but I am unsure of what the most appropriate disk configuration on a 3510 would be.

based on the desired level of redundancy and usable space, my thought was to create a pool consisting of 2x RAID-Z vdevs (either double parity, or single parity with two hot spares). using 300GB drives this would give roughly 2.4TB of usable space.

I am presuming I will want the RAID module purely for the additional caching, and will create a single LUN for each disk and present those to ZFS. the disks will most likely be directly attached to an X4100/X4200 using MPxIO, and exported via NFS (some Linux NFSv3 clients, some Solaris 9, 10 and Nevada).

I would like to get a feel for what others are doing in similar configurations. is the 3510 RAID module cache effective in such a configuration? I wasn't able to find any definitive answer to this in the documentation. RAID module or no RAID module? is it worth the extra cost?

any insight other ZFS users could provide would be appreciated.

grant.
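[for reference, the two layouts grant describes would look roughly like this; the cXtYdZ device names are hypothetical placeholders for the per-disk 3510 LUNs, and raidz2 requires a ZFS build with double-parity support:]

    # 2x 6-disk double-parity vdevs: 8 data disks, ~2.4TB usable
    zpool create mailpool \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
        raidz2 c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0

    # alternative: 2x 5-disk single-parity vdevs plus two shared hot
    # spares: also 8 data disks, ~2.4TB usable
    zpool create mailpool \
        raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
        raidz c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0 \
        spare c2t10d0 c2t11d0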
Hello grant,

Wednesday, May 31, 2006, 4:11:09 AM, you wrote:

gb> hi all,
gb>
gb> I am hoping to move roughly 1TB of maildir format email to ZFS, but
gb> I am unsure of what the most appropriate disk configuration on a 3510
gb> would be.
gb>
gb> based on the desired level of redundancy and usable space, my thought
gb> was to create a pool consisting of 2x RAID-Z vdevs (either double
gb> parity, or single parity with two hot spares). using 300GB drives
gb> this would give roughly 2.4TB of usable space.
gb>
gb> I am presuming I will want the RAID module purely for the additional
gb> caching, and will create a single LUN for each disk and present those to
gb> ZFS. the disks will most likely be directly attached to an
gb> X4100/X4200 using MPxIO, and exported via NFS (some Linux NFSv3 clients,
gb> some Solaris 9, 10 and Nevada).
gb>
gb> I would like to get a feel for what others are doing in similar
gb> configurations. is the 3510 RAID module cache effective in such a
gb> configuration? I wasn't able to find any definitive answer to this in
gb> the documentation. RAID module or no RAID module? is it worth the
gb> extra cost?
gb>
gb> any insight other ZFS users could provide would be appreciated.

IMHO if you just want to map disk->LUN, it doesn't make sense to buy RAID controllers at all.

I do have a config in which raid-5 is done in HW RAID and raid-z is then built from such devices - that way, despite the lack of raidz2, you get better redundancy - but write performance isn't stellar.

I also have a config with 3510 JBODs directly connected to a host using two links with MPxIO and raidz - it works almost OK. By almost I mean it works, but I can see many more IOs to the disks than I should - see the 'BIG IOs overhead due to ZFS' thread. Everything else works OK.

-- 
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
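[a minimal sketch of the layered config Robert describes, assuming four hypothetical LUNs that are each a hardware RAID-5 volume on the array:]

    # raidz across LUNs that are themselves HW RAID-5 volumes; the pool
    # then survives the loss of an entire LUN on top of the per-LUN disk
    # parity, at the cost of write performance
    zpool create tank raidz c4t0d0 c4t1d0 c4t2d0 c4t3d0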
Roch Bourbonnais - Performance Engineering
2006-May-31 13:28 UTC
[zfs-discuss] 3510 configuration for ZFS
Hi Grant,

this may provide some guidance for your setup; it's somewhat theoretical (take it for what it's worth) but it spells out some of the tradeoffs in the RAID-Z vs Mirror battle:

http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to

As for serving NFS, the user experience is very much impacted by the I/O latency, so any form of 'persistent' RAM on the server side allows NFS to service frequent commit operations without having to wait for rotational latency.

-r
For things like the 3510FC, which (can) have hardware RAID, I've been hearing that ZFS is preferable to the HW RAID controller for defining arrays. I understand the rationale and logic behind these arguments.

However, most HW RAID controllers have a large amount of NVRAM, which _really_ helps write performance. Are there any pointers out there for configuring either the 3510FC or the 6020-series Sun arrays to allow for optimal use of the NVRAM for write caching, while keeping as much RAID configuration in ZFS for portability/flexibility?

-- 
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
On Wed, May 31, 2006 at 03:28:12PM +0200, Roch Bourbonnais - Performance Engineering wrote:
> Hi Grant, this may provide some guidance for your setup;
>
> it's somewhat theoretical (take it for what it's worth) but
> it spells out some of the tradeoffs in the RAID-Z vs Mirror
> battle:
>
> http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to

thanks, that is very useful information. it pretty much rules out raid-z for this workload with any reasonable configuration I can dream up with only 12 disks available. it looks like mirroring is going to provide higher write IOPS and increased redundancy, obviously at the expense of the available space.

> As for serving NFS, the user experience is very much
> impacted by the I/O latency, so any form of 'persistent' RAM
> on the server side allows NFS to service frequent commit
> operations without having to wait for rotational latency.

indeed, that is what I'm hoping for. delivery to maildir format mailboxes is rather commit intensive, so anything that can reduce latency and squeeze extra IOPS out of the storage is a big win. I'm sure there will be some NFS parameters to tweak, but that's a separate issue :)

grant.
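[for comparison with the raidz2 sketch earlier in the thread, a mirrored layout of the same 12 hypothetical drives, carved into a filesystem and exported over NFS with default options:]

    # six 2-way mirrors: 6 data disks, ~1.8TB usable, but much better
    # small random read IOPS than raid-z
    zpool create mailpool \
        mirror c2t0d0 c2t6d0  mirror c2t1d0 c2t7d0 \
        mirror c2t2d0 c2t8d0  mirror c2t3d0 c2t9d0 \
        mirror c2t4d0 c2t10d0 mirror c2t5d0 c2t11d0

    # create a filesystem for the mail store and share it over NFS
    zfs create mailpool/maildir
    zfs set sharenfs=on mailpool/maildir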
> > http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to
>
> thanks, that is very useful information. it pretty much rules out raid-z
> for this workload with any reasonable configuration I can dream up
> with only 12 disks available. it looks like mirroring is going to
> provide higher write IOPS and increased redundancy, obviously at the
> expense of the available space.

There's an important caveat I want to add to this. When you're doing sequential I/Os, or have a write-mostly workload, the issues that Roch explained so clearly won't come into play. The trade-off between space-efficient RAID-Z and IOP-efficient mirroring only exists when you're doing lots of small random reads.

If your I/Os are large, sequential, or write-mostly, then ZFS's I/O scheduler will aggregate them in such a way that you'll get very efficient use of the disks regardless of the data replication model. It's only when you're doing small random reads that the difference between RAID-Z and mirroring becomes significant. For such workloads, everything that Roch said is spot on.

Jeff
Hello grant,

Thursday, June 1, 2006, 4:01:26 AM, you wrote:

gb> On Wed, May 31, 2006 at 03:28:12PM +0200, Roch Bourbonnais - Performance Engineering wrote:
>> Hi Grant, this may provide some guidance for your setup;
>>
>> it's somewhat theoretical (take it for what it's worth) but
>> it spells out some of the tradeoffs in the RAID-Z vs Mirror
>> battle:
>>
>> http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to

gb> thanks, that is very useful information. it pretty much rules out raid-z
gb> for this workload with any reasonable configuration I can dream up
gb> with only 12 disks available. it looks like mirroring is going to
gb> provide higher write IOPS and increased redundancy, obviously at the
gb> expense of the available space.

Well, mirroring would probably actually give you fewer IO/s for writing. However, with raid-z you will get far fewer IO/s when reading many small files using multiple streams. That behaviour is what ruled out raid-z here...

-- 
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
What about small random writes? Won't those also require reading from all disks in RAID-Z to read the blocks for update, where in mirroring only one disk need be accessed? Or am I missing something?

(It seems like RAID-Z is similar to RAID-3 in its performance characteristics, since both spread a single logical block across all physical devices.)
Jeff Bonwick wrote:
>>> http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to
>>
>> thanks, that is very useful information. it pretty much rules out raid-z
>> for this workload with any reasonable configuration I can dream up
>> with only 12 disks available. it looks like mirroring is going to
>> provide higher write IOPS and increased redundancy, obviously at the
>> expense of the available space.
>
> There's an important caveat I want to add to this. When you're
> doing sequential I/Os, or have a write-mostly workload, the issues
> that Roch explained so clearly won't come into play. The trade-off
> between space-efficient RAID-Z and IOP-efficient mirroring only
> exists when you're doing lots of small random reads.
>
> If your I/Os are large, sequential, or write-mostly, then ZFS's
> I/O scheduler will aggregate them in such a way that you'll get
> very efficient use of the disks regardless of the data replication
> model. It's only when you're doing small random reads that the
> difference between RAID-Z and mirroring becomes significant.
> For such workloads, everything that Roch said is spot on.

Another data point that comes into play here, more so with enterprise customers than SMB, is, "What happens when a failure occurs?" The performance difference between, for example, a HW RAID array running R5 and a ZFS pool running RZ would be good to have tested and documented.
Hello Anton,

Thursday, June 1, 2006, 5:27:24 PM, you wrote:

ABR> What about small random writes? Won't those also require reading
ABR> from all disks in RAID-Z to read the blocks for update, where in
ABR> mirroring only one disk need be accessed? Or am I missing something?

If I understand it correctly, ZFS always writes new blocks - no in-place updates. It also means it should be able to write sequentially most of the time, even if the application is issuing random writes/updates.

Of course it would be interesting to see what happens after some time (months, years) and when the pool is mostly full - then perhaps it would be hard to write sequentially. Perhaps some kind of background re-ordering would be helpful here (a simple file rewrite should do the job - however something more clever will probably be needed).

-- 
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
Hello Robert,

On 6/1/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> Hello Anton,
>
> Thursday, June 1, 2006, 5:27:24 PM, you wrote:
>
> ABR> What about small random writes? Won't those also require reading
> ABR> from all disks in RAID-Z to read the blocks for update, where in
> ABR> mirroring only one disk need be accessed? Or am I missing something?
>
> If I understand it correctly, ZFS always writes new blocks - no updates.
> It also means it should be able to write sequentially most of the time
> even if the application is issuing random writes/updates.

When the "update write" is smaller than the original block, you have to read the rest of the original block in order to write a new block. I think that's what Anton was referring to.

Tao
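[a rough way to observe the effect Tao describes; 'tank' and the sizes here are arbitrary. write a file in whole records, then rewrite it in chunks smaller than the recordsize, and reads show up during what is nominally a pure-write workload:]

    # 1GB written in full 128K records: no reads needed
    zfs create -o recordsize=128k tank/demo
    dd if=/dev/zero of=/tank/demo/f bs=128k count=8192

    # drop the cached copy so the next step must really touch disk
    zpool export tank && zpool import tank

    # rewrite in 8K chunks: each write dirties only part of a 128K
    # record, so ZFS must read the rest of the record first -
    # watch the read column of iostat while the second dd runs
    zpool iostat tank 5 &
    dd if=/dev/zero of=/tank/demo/f bs=8k count=8192 conv=notrunc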
On Thu, Jun 01, 2006 at 06:40:15PM -0500, Tao Chen wrote:
> > ABR> What about small random writes? Won't those also require reading
> > ABR> from all disks in RAID-Z to read the blocks for update, where in
> > ABR> mirroring only one disk need be accessed? Or am I missing something?
> >
> > If I understand it correctly, ZFS always writes new blocks - no updates.
> > It also means it should be able to write sequentially most of the time
> > even if the application is issuing random writes/updates.
>
> When the "update write" is smaller than the original block, you have to
> read the rest of the original block in order to write a new block. I
> think that's what Anton was referring to.

indeed - I don't think this should be a problem for the Maildir workload, as files are never modified after they are written (only renamed, read and unlinked). this behaviour means that the writes should always be efficient, but the reads are something that I will need to benchmark.

which reminds me, I need to write up a filebench profile for typical Maildir delivery and POP3 access.

grant.
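[a rough starting point for such a profile, modeled on filebench's stock varmail personality; the fileset size, file size, and thread count are guesses, and the tmp/ -> new/ rename step of a real maildir delivery isn't represented:]

    cat > maildir.f <<'EOF'
    set $dir=/mailpool/maildir
    set $nfiles=50000
    set $meandirwidth=1000
    set $filesize=16k
    set $nthreads=16

    define fileset name=mailset,path=$dir,size=$filesize,entries=$nfiles,dirwidth=$meandirwidth,prealloc=80

    define process name=mailserver,instances=1
    {
      thread name=mailthread,memsize=10m,instances=$nthreads
      {
        # delivery: create a new message, write it, fsync, close
        flowop createfile name=deliver,filesetname=mailset,fd=1
        flowop appendfilerand name=write,iosize=$filesize,fd=1
        flowop fsync name=sync,fd=1
        flowop closefile name=closedeliver,fd=1
        # POP3: open a message, read it whole, close, then delete it
        flowop openfile name=retr,filesetname=mailset,fd=2
        flowop readwholefile name=read,fd=2,iosize=$filesize
        flowop closefile name=closeretr,fd=2
        flowop deletefile name=dele,filesetname=mailset
      }
    }

    run 60
    EOF
    filebench -f maildir.f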
Roch Bourbonnais - Performance Engineering
2006-Jun-02 09:51 UTC
[zfs-discuss] Re: 3510 configuration for ZFS
Tao Chen writes:

> Hello Robert,
>
> On 6/1/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> > Hello Anton,
> >
> > Thursday, June 1, 2006, 5:27:24 PM, you wrote:
> >
> > ABR> What about small random writes? Won't those also require reading
> > ABR> from all disks in RAID-Z to read the blocks for update, where in
> > ABR> mirroring only one disk need be accessed? Or am I missing something?
> >
> > If I understand it correctly, ZFS always writes new blocks - no updates.
> > It also means it should be able to write sequentially most of the time
> > even if the application is issuing random writes/updates.
>
> When the "update write" is smaller than the original block, you have to
> read the rest of the original block in order to write a new block. I
> think that's what Anton was referring to.

One potential optimization in this space could cover the situation of small writes that update a file in sequential order. As it stands, on the first such write to a block we have to input the block from disk. I imagine that we could delay the input phase as long as we are write-streaming to the block (SMOP). If the load is such that the whole block gets dirty before the transaction group closes, we could save the input operation.

-r