hi all,

I am hoping to move roughly 1TB of maildir format email to ZFS, but I am unsure of what the most appropriate disk configuration on a 3510 would be.

based on the desired level of redundancy and usable space, my thought was to create a pool consisting of 2x RAID-Z vdevs (either double parity, or single parity with two hot spares). using 300GB drives this would give roughly 2.4TB of usable space.

I am presuming I will want the RAID module purely for the additional caching, and will create a single LUN for each disk and present those to ZFS. the disks will most likely be directly attached to an X4100/X4200 using MPxIO, and exported via NFS (some Linux NFSv3 clients, some Solaris 9, 10 and Nevada).

I would like to get a feel for what others are doing in similar configurations. is the 3510 RAID module cache effective in such a configuration? I wasn't able to find any definitive answer to this in the documentation. RAID module or no RAID module? is it worth the extra cost?

any insight other ZFS users could provide would be appreciated.

grant.
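[for reference, the two layouts grant describes would look roughly like this; the cXtYdZ device names are hypothetical placeholders for the per-disk 3510 LUNs, and raidz2 requires a ZFS build with double-parity support:]

    # 2x 6-disk double-parity vdevs: 8 data disks, ~2.4TB usable
    zpool create mailpool \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
        raidz2 c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0

    # alternative: 2x 5-disk single-parity vdevs plus two shared hot
    # spares: also 8 data disks, ~2.4TB usable
    zpool create mailpool \
        raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
        raidz c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0 \
        spare c2t10d0 c2t11d0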
Hello grant,

Wednesday, May 31, 2006, 4:11:09 AM, you wrote:

gb> hi all,
gb>
gb> I am hoping to move roughly 1TB of maildir format email to ZFS, but
gb> I am unsure of what the most appropriate disk configuration on a 3510
gb> would be.
gb>
gb> based on the desired level of redundancy and usable space, my thought
gb> was to create a pool consisting of 2x RAID-Z vdevs (either double
gb> parity, or single parity with two hot spares). using 300GB drives
gb> this would give roughly 2.4TB of usable space.
gb>
gb> I am presuming I will want the RAID module purely for the additional
gb> caching, and will create a single LUN for each disk and present those to
gb> ZFS. the disks will most likely be directly attached to an
gb> X4100/X4200 using MPxIO, and exported via NFS (some Linux NFSv3 clients,
gb> some Solaris 9, 10 and Nevada).
gb>
gb> I would like to get a feel for what others are doing in similar
gb> configurations. is the 3510 RAID module cache effective in such a
gb> configuration? I wasn't able to find any definitive answer to this in
gb> the documentation. RAID module or no RAID module? is it worth the
gb> extra cost?
gb>
gb> any insight other ZFS users could provide would be appreciated.

IMHO if you just want to map disk->LUN, it doesn't make sense to buy RAID controllers at all.

I do have a config in which raid-5 is done in HW RAID and raid-z is then built from such devices - that way, despite the lack of raidz2, you get better redundancy - but write performance isn't stellar.

I also have a config with 3510 JBODs directly connected to a host using two links with MPxIO and raidz - it works almost OK. By almost I mean it works, but I can see many more IOs to the disks than I should - see the 'BIG IOs overhead due to ZFS' thread. Everything else works OK.

-- 
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
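[a minimal sketch of the layered config Robert describes, assuming four hypothetical LUNs that are each a hardware RAID-5 volume on the array:]

    # raidz across LUNs that are themselves HW RAID-5 volumes; the pool
    # then survives the loss of an entire LUN on top of the per-LUN disk
    # parity, at the cost of write performance
    zpool create tank raidz c4t0d0 c4t1d0 c4t2d0 c4t3d0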
Roch Bourbonnais - Performance Engineering
2006-May-31 13:28 UTC
[zfs-discuss] 3510 configuration for ZFS
Hi Grant,

this may provide some guidance for your setup; it's somewhat theoretical (take it for what it's worth) but it spells out some of the tradeoffs in the RAID-Z vs Mirror battle:

http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to

As for serving NFS, the user experience is very much impacted by the I/O latency, so any form of 'persistent' RAM on the server side allows NFS to service frequent commit operations without having to wait for rotational latency.

-r
For things like the 3510FC, which (can) have hardware RAID, I've been hearing that ZFS is preferable to the HW RAID controller for defining arrays. I understand the rationale and logic behind these arguments.

However, most HW RAID controllers have a large amount of NVRAM, which _really_ helps write performance. Are there any pointers out there for configuring either the 3510FC or the 6020-series Sun arrays to allow for optimal use of the NVRAM for write caching, while keeping as much RAID configuration in ZFS for portability/flexibility?

-- 
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
On Wed, May 31, 2006 at 03:28:12PM +0200, Roch Bourbonnais - Performance Engineering wrote:
> Hi Grant, this may provide some guidance for your setup;
>
> it's somewhat theoretical (take it for what it's worth) but
> it spells out some of the tradeoffs in the RAID-Z vs Mirror
> battle:
>
> http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to

thanks, that is very useful information. it pretty much rules out raid-z for this workload with any reasonable configuration I can dream up with only 12 disks available. it looks like mirroring is going to provide higher write IOPS and increased redundancy, obviously at the expense of the available space.

> As for serving NFS, the user experience is very much
> impacted by the I/O latency, so any form of 'persistent' RAM
> on the server side allows NFS to service frequent commit
> operations without having to wait for rotational latency.

indeed, that is what I'm hoping for. delivery to maildir format mailboxes is rather commit intensive, so anything that can reduce latency and squeeze extra IOPS out of the storage is a big win. I'm sure there will be some NFS parameters to tweak, but that's a separate issue :)

grant.
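[for comparison with the raidz2 sketch earlier in the thread, a mirrored layout of the same 12 hypothetical drives, carved into a filesystem and exported over NFS with default options:]

    # six 2-way mirrors: 6 data disks, ~1.8TB usable, but much better
    # small random read IOPS than raid-z
    zpool create mailpool \
        mirror c2t0d0 c2t6d0  mirror c2t1d0 c2t7d0 \
        mirror c2t2d0 c2t8d0  mirror c2t3d0 c2t9d0 \
        mirror c2t4d0 c2t10d0 mirror c2t5d0 c2t11d0

    # create a filesystem for the mail store and share it over NFS
    zfs create mailpool/maildir
    zfs set sharenfs=on mailpool/maildir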
> > http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to
>
> thanks, that is very useful information. it pretty much rules out raid-z
> for this workload with any reasonable configuration I can dream up
> with only 12 disks available. it looks like mirroring is going to
> provide higher write IOPS and increased redundancy, obviously at the
> expense of the available space.

There's an important caveat I want to add to this. When you're doing sequential I/Os, or have a write-mostly workload, the issues that Roch explained so clearly won't come into play. The trade-off between space-efficient RAID-Z and IOP-efficient mirroring only exists when you're doing lots of small random reads.

If your I/Os are large, sequential, or write-mostly, then ZFS's I/O scheduler will aggregate them in such a way that you'll get very efficient use of the disks regardless of the data replication model. It's only when you're doing small random reads that the difference between RAID-Z and mirroring becomes significant. For such workloads, everything that Roch said is spot on.

Jeff
Hello grant,

Thursday, June 1, 2006, 4:01:26 AM, you wrote:

gb> On Wed, May 31, 2006 at 03:28:12PM +0200, Roch Bourbonnais - Performance Engineering wrote:
>> Hi Grant, this may provide some guidance for your setup;
>>
>> it's somewhat theoretical (take it for what it's worth) but
>> it spells out some of the tradeoffs in the RAID-Z vs Mirror
>> battle:
>>
>> http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to

gb> thanks, that is very useful information. it pretty much rules out raid-z
gb> for this workload with any reasonable configuration I can dream up
gb> with only 12 disks available. it looks like mirroring is going to
gb> provide higher write IOPS and increased redundancy, obviously at the
gb> expense of the available space.

Well, mirroring would probably actually give you fewer IO/s for writing. However, with raid-z you will get far fewer IO/s when reading many small files using multiple streams. That behaviour is what ruled out raid-z here...

-- 
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
What about small random writes? Won't those also require reading from all disks in RAID-Z to read the blocks for update, where in mirroring only one disk need be accessed? Or am I missing something?

(It seems like RAID-Z is similar to RAID-3 in its performance characteristics, since both spread a single logical block across all physical devices.)
Jeff Bonwick wrote:
>>> http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to
>>
>> thanks, that is very useful information. it pretty much rules out raid-z
>> for this workload with any reasonable configuration I can dream up
>> with only 12 disks available. it looks like mirroring is going to
>> provide higher write IOPS and increased redundancy, obviously at the
>> expense of the available space.
>
> There's an important caveat I want to add to this. When you're
> doing sequential I/Os, or have a write-mostly workload, the issues
> that Roch explained so clearly won't come into play. The trade-off
> between space-efficient RAID-Z and IOP-efficient mirroring only
> exists when you're doing lots of small random reads.
>
> If your I/Os are large, sequential, or write-mostly, then ZFS's
> I/O scheduler will aggregate them in such a way that you'll get
> very efficient use of the disks regardless of the data replication
> model. It's only when you're doing small random reads that the
> difference between RAID-Z and mirroring becomes significant.
> For such workloads, everything that Roch said is spot on.

Another data point that comes into play here, more so with enterprise customers than SMB, is, "What happens when a failure occurs?" The performance difference between, for example, a HW RAID array running R5 and a ZFS pool running RZ would be good to have tested and documented.
Hello Anton,

Thursday, June 1, 2006, 5:27:24 PM, you wrote:

ABR> What about small random writes? Won't those also require reading
ABR> from all disks in RAID-Z to read the blocks for update, where in
ABR> mirroring only one disk need be accessed? Or am I missing something?

If I understand it correctly, ZFS always writes new blocks - no in-place updates. It also means it should be able to write sequentially most of the time, even if the application is issuing random writes/updates.

Of course it would be interesting to see what happens after some time (months, years) and when the pool is mostly full - then perhaps it would be hard to write sequentially. Perhaps some kind of background re-ordering would be helpful here (a simple file rewrite should do the job - however something more clever will probably be needed).

-- 
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
Hello Robert,

On 6/1/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> Hello Anton,
>
> Thursday, June 1, 2006, 5:27:24 PM, you wrote:
>
> ABR> What about small random writes? Won't those also require reading
> ABR> from all disks in RAID-Z to read the blocks for update, where in
> ABR> mirroring only one disk need be accessed? Or am I missing something?
>
> If I understand it correctly, ZFS always writes new blocks - no updates.
> It also means it should be able to write sequentially most of the time
> even if the application is issuing random writes/updates.

When the "update write" is smaller than the original block, you have to read the rest of the original block in order to write a new block. I think that's what Anton was referring to.

Tao
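[a rough way to observe the effect Tao describes; 'tank' and the sizes here are arbitrary. write a file in whole records, then rewrite it in chunks smaller than the recordsize, and reads show up during what is nominally a pure-write workload:]

    # 1GB written in full 128K records: no reads needed
    zfs create -o recordsize=128k tank/demo
    dd if=/dev/zero of=/tank/demo/f bs=128k count=8192

    # drop the cached copy so the next step must really touch disk
    zpool export tank && zpool import tank

    # rewrite in 8K chunks: each write dirties only part of a 128K
    # record, so ZFS must read the rest of the record first -
    # watch the read column of iostat while the second dd runs
    zpool iostat tank 5 &
    dd if=/dev/zero of=/tank/demo/f bs=8k count=8192 conv=notrunc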
On Thu, Jun 01, 2006 at 06:40:15PM -0500, Tao Chen wrote:
> > ABR> What about small random writes? Won't those also require reading
> > ABR> from all disks in RAID-Z to read the blocks for update, where in
> > ABR> mirroring only one disk need be accessed? Or am I missing something?
> >
> > If I understand it correctly, ZFS always writes new blocks - no updates.
> > It also means it should be able to write sequentially most of the time
> > even if the application is issuing random writes/updates.
>
> When the "update write" is smaller than the original block, you have to
> read the rest of the original block in order to write a new block. I
> think that's what Anton was referring to.

indeed - I don't think this should be a problem for the Maildir workload, as files are never modified after they are written (only renamed, read and unlinked). this behaviour means that the writes should always be efficient, but the reads are something that I will need to benchmark.

which reminds me, I need to write up a filebench profile for typical Maildir delivery and POP3 access.

grant.
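[a rough starting point for such a profile, modeled on filebench's stock varmail personality; the fileset size, file size, and thread count are guesses, and the tmp/ -> new/ rename step of a real maildir delivery isn't represented:]

    cat > maildir.f <<'EOF'
    set $dir=/mailpool/maildir
    set $nfiles=50000
    set $meandirwidth=1000
    set $filesize=16k
    set $nthreads=16

    define fileset name=mailset,path=$dir,size=$filesize,entries=$nfiles,dirwidth=$meandirwidth,prealloc=80

    define process name=mailserver,instances=1
    {
      thread name=mailthread,memsize=10m,instances=$nthreads
      {
        # delivery: create a new message, write it, fsync, close
        flowop createfile name=deliver,filesetname=mailset,fd=1
        flowop appendfilerand name=write,iosize=$filesize,fd=1
        flowop fsync name=sync,fd=1
        flowop closefile name=closedeliver,fd=1
        # POP3: open a message, read it whole, close, then delete it
        flowop openfile name=retr,filesetname=mailset,fd=2
        flowop readwholefile name=read,fd=2,iosize=$filesize
        flowop closefile name=closeretr,fd=2
        flowop deletefile name=dele,filesetname=mailset
      }
    }

    run 60
    EOF
    filebench -f maildir.f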
Roch Bourbonnais - Performance Engineering
2006-Jun-02 09:51 UTC
[zfs-discuss] Re: 3510 configuration for ZFS
Tao Chen writes:

> Hello Robert,
>
> On 6/1/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> > Hello Anton,
> >
> > Thursday, June 1, 2006, 5:27:24 PM, you wrote:
> >
> > ABR> What about small random writes? Won't those also require reading
> > ABR> from all disks in RAID-Z to read the blocks for update, where in
> > ABR> mirroring only one disk need be accessed? Or am I missing something?
> >
> > If I understand it correctly, ZFS always writes new blocks - no updates.
> > It also means it should be able to write sequentially most of the time
> > even if the application is issuing random writes/updates.
>
> When the "update write" is smaller than the original block, you have to
> read the rest of the original block in order to write a new block. I
> think that's what Anton was referring to.

One potential optimization in this space could cover the situation of small writes that update a file in sequential order. As it stands, on the first such write to a block we have to input the block from disk. I imagine that we could delay the input phase as long as we are write-streaming to the block (SMOP). If the load is such that the whole block gets dirty before the transaction group closes, we could save the input operation.

-r